Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2

Started by Masahiko Sawada over 7 years ago, 296 messages
#1 Masahiko Sawada
sawada.mshk@gmail.com
4 attachment(s)

On Tue, Jun 5, 2018 at 7:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, May 26, 2018 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, May 18, 2018 at 11:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Regarding the API design, should we use 2PC for a distributed
transaction if it involves both two or more 2PC-capable foreign
servers and a 2PC-non-capable foreign server? Or should we raise an
error? The 2PC-non-capable server might either have 2PC functionality
but have it disabled, or not have it at all.

It seems to me that this is functionality that many people will not
want to use. First, doing a PREPARE and then a COMMIT for each FDW
write transaction is bound to be more expensive than just doing a
COMMIT. Second, because the default value of
max_prepared_transactions is 0, this can only work at all if special
configuration has been done on the remote side. Because of the second
point in particular, it seems to me that the default for this new
feature must be "off". It would make no sense to ship a default configuration
of PostgreSQL that doesn't work with the default configuration of
postgres_fdw, and I do not think we want to change the default value
of max_prepared_transactions. It was changed from 5 to 0 a number of
years back for good reason.

I'm not sure that many people will not want to use this feature,
because it seems to me that many people don't want to use a database
that lacks transaction atomicity. But I agree that this feature should
not be enabled by default, since 2PC is disabled by default.

So, I think the question could be broadened a bit: how you enable this
feature if you want it, and what happens if you want it but it's not
available for your choice of FDW? One possible enabling method is a
GUC (e.g. foreign_twophase_commit). It could be true/false, with true
meaning use PREPARE for all FDW writes and fail if that's not
supported, or it could be three-valued, like require/prefer/disable,
with require throwing an error if PREPARE support is not available and
prefer using PREPARE where available but without failing when it isn't
available. Another possibility could be to make it an FDW option,
possibly capable of being set at multiple levels (e.g. server or
foreign table). If any FDW involved in the transaction demands
distributed 2PC semantics then the whole transaction must have those
semantics or it fails. I was previously leaning toward the latter
approach, but I guess now the former approach is sounding better. I'm
not totally certain I know what's best here.

I agree that the former is better. That way, we can also control the
parameter at the transaction level. If we allow the 'prefer' behavior,
we need to manage not only 2PC-capable foreign servers but also
2PC-non-capable foreign servers, which requires every FDW to call the
registration function. So I think a two-valued parameter would be
better.
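
To make the two candidate semantics concrete, here is a minimal C sketch of
the decision each setting would imply. It is only an illustration of the
discussion above, not code from the patch: the enum names, the function
names and the idea of counting capable and incapable servers are all
assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-valued setting (require/prefer/disable); not from the patch. */
typedef enum
{
	FOREIGN_TWOPHASE_DISABLE,
	FOREIGN_TWOPHASE_PREFER,
	FOREIGN_TWOPHASE_REQUIRE
} ForeignTwophaseMode;

/* Hypothetical outcome of the commit-time decision. */
typedef enum
{
	COMMIT_ONE_PHASE,	/* plain COMMIT on every foreign server */
	COMMIT_TWO_PHASE,	/* PREPARE on capable servers, plain COMMIT on the rest */
	COMMIT_ERROR		/* refuse to commit the distributed transaction */
} CommitPlan;

/*
 * n_capable / n_incapable: number of involved foreign servers that can and
 * cannot do 2PC.  Only 'prefer' can mix both kinds in one commit.
 */
static CommitPlan
choose_commit_plan(ForeignTwophaseMode mode, int n_capable, int n_incapable)
{
	assert(n_capable >= 0 && n_incapable >= 0);

	if (mode == FOREIGN_TWOPHASE_DISABLE)
		return COMMIT_ONE_PHASE;
	if (mode == FOREIGN_TWOPHASE_REQUIRE && n_incapable > 0)
		return COMMIT_ERROR;
	if (n_capable == 0)
		return COMMIT_ONE_PHASE;
	return COMMIT_TWO_PHASE;
}

/* The two-valued variant: true behaves like 'require', false like 'disable'. */
static CommitPlan
choose_commit_plan_bool(bool enabled, int n_capable, int n_incapable)
{
	return choose_commit_plan(enabled ? FOREIGN_TWOPHASE_REQUIRE
									  : FOREIGN_TWOPHASE_DISABLE,
							  n_capable, n_incapable);
}
```

In this sketch only the 'prefer' branch ever has to deal with a mix of
capable and incapable servers, which is the extra bookkeeping that the
two-valued parameter avoids.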

BTW, sorry for the late submission of the updated patch. I'll post the
updated patch this week, but I'd like to share the new API design
beforehand.

Attached updated patches.

I've changed the new APIs to 5 functions and 1 registration function,
because having the rollback API called by both the backend process and
the resolver process is not good design. The latest patches
incorporate all the comments I got, except for the user-facing
overview documentation. I'm still considering what to document
there, and I'll write it while the code is being reviewed. The
basic design of the new patches is almost the same as in the previous
mail I sent.

I introduced 5 new FDW APIs: PrepareForeignTransaction,
CommitForeignTransaction, RollbackForeignTransaction,
ResolveForeignTransaction and IsTwophaseCommitEnabled.
ResolveForeignTransaction is normally called by the resolver process,
whereas the other four functions are called by the backend process. I
also introduced a registration function,
FdwXactRegisterForeignTransaction. An FDW that wishes to support atomic
commit must call this function when a transaction opens on the foreign
server. Registered foreign transactions are controlled by the foreign
transaction manager in Postgres core, which calls the APIs at the
appropriate time. This means that the foreign transaction manager
controls only foreign servers that are capable of 2PC. For a
2PC-non-capable foreign server, the FDW must use XactCallback to
control the foreign transaction.

2PC is used at commit when the distributed transaction modified data
on two or more servers, including the local server, and the user
requested it via the foreign_twophase_commit GUC parameter. All
foreign transactions are prepared during pre-commit, and then the
transaction commits locally. After the local commit, the backend waits
for the resolver process to resolve all prepared foreign transactions.
The waiting backend is released (that is, returns the prompt to the
client) either when all foreign transactions are resolved or when the
user cancels the wait. If 2PC is not required, a foreign transaction is
committed during the pre-commit phase of the local transaction.
IsTwophaseCommitEnabled is called whenever the transaction begins to
modify data on a foreign server. This is required to track whether the
transaction modified data on a foreign server that doesn't support, or
hasn't enabled, 2PC.
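
The commit-time rule described above can be sketched in a few lines of C.
This is a simplified model under stated assumptions, not the patch's code:
the Participant struct, its fields and both function names are hypothetical,
and the prepare step is simulated by a flag instead of a real call to
PrepareForeignTransaction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one registered foreign transaction. */
typedef struct
{
	bool		modified;	/* did the transaction write on this server? */
	bool		prepare_ok;	/* simulated result of the prepare callback */
	bool		prepared;	/* set once the prepare step has succeeded */
} Participant;

/*
 * 2PC is needed only when the user asked for it and data was modified on
 * two or more servers, counting the local server as one of them.
 */
static bool
need_twophase(const Participant *p, int n, bool local_modified,
			  bool foreign_twophase_commit)
{
	int			n_modified = local_modified ? 1 : 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
	{
		if (p[i].modified)
			n_modified++;
	}
	return foreign_twophase_commit && n_modified >= 2;
}

/*
 * Pre-commit step: prepare every registered participant in order.  Returns
 * false at the first failure; the caller must then roll back instead of
 * committing locally.
 */
static bool
precommit_prepare_all(Participant *p, int n)
{
	for (int i = 0; i < n; i++)
	{
		if (!p[i].prepare_ok)
			return false;
		p[i].prepared = true;
	}
	return true;
}
```

Only when precommit_prepare_all() succeeds would the backend commit locally
and then wait for the resolver; a participant that was already prepared when
a later one failed still has to be rolled back.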

Atomic commit among multiple foreign servers is crash-safe. If the
coordinator server crashes during atomic commit, the foreign
transaction participants and their status are recovered during WAL
replay. Recovered foreign transactions are in an in-doubt state (aka
dangling transactions). If a database has such transactions, the
resolver process periodically tries to resolve them.

I'll register this patch for the next CF. Feedback is very welcome.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

0001-Keep-track-of-writing-on-non-temporary-relation_v16.patch (application/octet-stream)
From 176826d5bc0d194005e6929fc6fcf039c4367cf9 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 8 Feb 2018 11:26:46 +0900
Subject: [PATCH 1/4] Keep track of writing on non-temporary relation.

---
 src/backend/access/heap/heapam.c |   12 ++++++++++++
 src/include/access/xact.h        |    5 +++++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 72395a5..959a331 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2611,6 +2611,10 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		heap_freetuple(heaptup);
 	}
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleGetOid(tup);
 }
 
@@ -3440,6 +3444,10 @@ l1:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleMayBeUpdated;
 }
 
@@ -4390,6 +4398,10 @@ l2:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	bms_free(hot_attrs);
 	bms_free(proj_idx_attrs);
 	bms_free(key_attrs);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 083e879..c7b4144 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -98,6 +98,11 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
-- 
1.7.1

0002-Support-atomic-commit-among-multiple-foreign-servers_v16.patch (application/octet-stream)
From 411e049046427d3a0999cee94200645002468d22 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:44:42 +0900
Subject: [PATCH 2/4] Support atomic commit among multiple foreign servers.

---
 doc/src/sgml/catalogs.sgml                    |   97 +
 doc/src/sgml/config.sgml                      |  124 ++
 doc/src/sgml/fdwhandler.sgml                  |  200 ++
 doc/src/sgml/func.sgml                        |   51 +
 doc/src/sgml/monitoring.sgml                  |   56 +
 src/backend/access/rmgrdesc/Makefile          |    8 +-
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   65 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/Makefile           |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   32 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/foreigncmds.c            |   23 +
 src/backend/executor/execPartition.c          |    4 +
 src/backend/executor/nodeForeignscan.c        |    8 +
 src/backend/executor/nodeModifyTable.c        |    5 +
 src/backend/foreign/Makefile                  |    2 +-
 src/backend/foreign/fdwxact.c                 | 2762 +++++++++++++++++++++++++
 src/backend/foreign/fdwxact_launcher.c        |  587 ++++++
 src/backend/foreign/fdwxact_resolver.c        |  310 +++
 src/backend/foreign/foreign.c                 |   43 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   18 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    5 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    2 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   61 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   23 +
 src/include/foreign/fdwapi.h                  |   18 +-
 src/include/foreign/fdwxact.h                 |  147 ++
 src/include/foreign/fdwxact_launcher.h        |   31 +
 src/include/foreign/fdwxact_resolver.h        |   23 +
 src/include/foreign/fdwxact_xlog.h            |   51 +
 src/include/foreign/foreign.h                 |    2 +-
 src/include/foreign/resolver_internal.h       |   65 +
 src/include/pgstat.h                          |    8 +-
 src/include/storage/proc.h                    |   10 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    2 +
 src/test/regress/expected/rules.out           |   12 +
 57 files changed, 5052 insertions(+), 27 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100755 src/backend/foreign/fdwxact.c
 create mode 100644 src/backend/foreign/fdwxact_launcher.c
 create mode 100644 src/backend/foreign/fdwxact_resolver.c
 create mode 100644 src/include/foreign/fdwxact.h
 create mode 100644 src/include/foreign/fdwxact_launcher.h
 create mode 100644 src/include/foreign/fdwxact_resolver.h
 create mode 100644 src/include/foreign/fdwxact_xlog.h
 create mode 100644 src/include/foreign/resolver_internal.h

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3ed9021..c8edd7e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9625,6 +9625,103 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-prepared-fdw-xacts">
+  <title><structname>pg_prepared_fdw_xacts</structname></title>
+
+  <indexterm zone="view-pg-prepared-fdw-xacts">
+   <primary>pg_prepared_fdw_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_prepared_fdw_xacts</structname> displays
+   information about foreign transactions that are currently prepared on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="fdw-transaction-managements"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_prepared_fdw_xacts</structname> contains one row per prepared
+   foreign transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_prepared_fdw_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>transaction</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       ID of the local transaction that this foreign transaction is associated with
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which this foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction: <literal>prepared</literal>, <literal>committing</literal>, <literal>aborting</literal> or <literal>unknown</literal>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_prepared_fdw_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b60240e..28a1d26 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1546,6 +1546,29 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+      <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the maximum number of foreign transactions that can be prepared
+        simultaneously. A single local transaction can give rise to multiple
+        foreign transactions. If <literal>N</literal> local transactions each
+        span <literal>K</literal> foreign servers, this value needs to be set to
+        <literal>N * K</literal>, not just <literal>N</literal>.
+        This parameter can only be set at server start.
+       </para>
+       <para>
+        When running a standby server, you must set this parameter to the
+        same or higher value than on the master server. Otherwise, queries
+        will not be allowed in the standby server.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-work-mem" xreflabel="work_mem">
       <term><varname>work_mem</varname> (<type>integer</type>)
       <indexterm>
@@ -3651,6 +3674,78 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
      </variablelist>
     </sect2>
 
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+
+     <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+      <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+        resolver is responsible for foreign transaction resolution on one database.
+       </para>
+       <para>
+        Foreign transaction resolution workers are taken from the pool defined by
+        <varname>max_worker_processes</varname>.
+       </para>
+       <para>
+        The default value is 0.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+      <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies how long the foreign transaction resolver should wait after a failed
+        resolution attempt before retrying to resolve the foreign transaction. This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+       <para>
+        The default value is 10 seconds.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+      <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Terminates foreign transaction resolver processes that have had no foreign
+        transactions to resolve for longer than the specified number of milliseconds.
+        A value of zero disables the timeout mechanism.  You should set this value to
+        zero only if <varname>max_foreign_transaction_resolvers</varname> is at least
+        the number of databases you have. This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+       <para>
+        The default value is 60 seconds.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     </variablelist>
+    </sect2>
+
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -7796,6 +7891,35 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-foreign-transaction">
+    <title>Foreign Transaction Management</title>
+
+    <variablelist>
+
+     <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+      <term><varname>foreign_twophase_commit</varname> (<type>boolean</type>)
+       <indexterm>
+        <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+       </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies whether transaction commit will wait for all involved foreign transactions
+        to be resolved before the command returns a "success" indication to the client.
+        Both <varname>max_prepared_foreign_transactions</varname> and
+        <varname>max_foreign_transaction_resolvers</varname> must be set to non-zero values to
+        allow foreign two-phase commit to be used.
+       </para>
+       <para>
+        This parameter can be changed at any time; the behavior for any one transaction
+        is determined by the setting in effect when it commits.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 7b758bd..71564b3 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1386,6 +1386,109 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     If an FDW wishes to support <firstterm>atomic commit</firstterm>
+     (as described in <xref linkend="fdw-transaction-managements"/>), it must call the
+     registration function <function>FdwXactRegisterForeignTransaction</function>
+     and provide the following callback functions:
+    </para>
+
+    <para>
+<programlisting>
+bool
+PrepareForeignTransaction(ForeignTransaction *foreign_xact);
+</programlisting>
+    Prepare a foreign transaction identified by <varname>foreign_xact</varname>.
+    This function is called at the pre-commit phase of the local
+    transaction if atomic commit is
+    required. Returning <literal>true</literal> means that the
+    foreign transaction was prepared successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(ForeignTransaction *foreign_xact);
+</programlisting>
+    Commit a not-prepared foreign transaction identified by
+    <varname>foreign_xact</varname>.
+    This function is called at the pre-commit phase of the local
+    transaction if atomic commit is not required. Atomic
+    commit is not required either when data was modified on
+    only one server, including the local server, or when the user doesn't
+    request atomic commit via <xref linkend="guc-foreign-twophase-commit"/>.
+    Returning <literal>true</literal> means that the
+    foreign transaction was committed successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(ForeignTransaction *foreign_xact);
+</programlisting>
+    Roll back a not-prepared foreign transaction identified by
+    <varname>foreign_xact</varname>.
+    This function is called at the end of the local transaction, after
+    the local rollback, either when the user requested a rollback or when
+    an error occurred within the transaction. This function could
+    be called recursively if an error occurs while rolling back the
+    foreign transaction for whatever reason. You need to track
+    recursion and prevent this function from being called infinitely.
+    Returning <literal>true</literal> means that the
+    foreign transaction was rolled back successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+ResolvePreparedForeignTransaction(ForeignTransaction *foreign_xact,
+                                  bool is_commit);
+</programlisting>
+    Commit or roll back the prepared foreign transaction identified
+    by <varname>foreign_xact</varname> on a connection to the foreign server.
+    When <varname>is_commit</varname> is true, it indicates that the foreign
+    transaction should be committed.
+    This function is normally called by the foreign transaction resolver
+    process but can also be called by the <function>pg_resolve_fdw_xacts</function>
+    function. In the resolver process, this function is called either
+    when a backend requests the resolver process to resolve a distributed
+    transaction after it has been prepared or when a database has dangling
+    transactions. Returning <literal>true</literal> means that the
+    foreign transaction was resolved successfully.
+    In the abort case, please note that the prepared foreign transaction
+    having identifier <varname>foreign_xact->fx_id</varname> might not
+    exist on the foreign server. If you failed to resolve the foreign
+    transaction due to an undefined object error
+    (<literal>ERRCODE_UNDEFINED_OBJECT</literal>), you should regard
+    it as success and return <literal>true</literal>.
+    </para>
+    <para>
+<programlisting>
+bool
+IsTwoPhaseCommitEnabled(Oid serverid);
+</programlisting>
+    Return <literal>true</literal> if the foreign server identified by
+    <literal>serverid</literal> is capable of the two-phase commit protocol.
+    This function is called when the transaction begins to modify data on
+    the foreign server. Returning <literal>false</literal> indicates that
+    the current transaction cannot use atomic commit even if atomic commit
+    is requested by the user.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. So please note that
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>. To get information about FDW-related
+      objects, you can use the given <literal>ForeignTransaction</literal>
+      instead (see <filename>foreign/fdwxact.h</filename> for details).
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1831,4 +1934,101 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+    <title>Transaction Management for Foreign Data Wrappers</title>
+
+    <sect2 id="fdw-transaction-atomic-commit">
+     <title>Atomic commit among multiple foreign servers</title>
+
+     <para>
+      The <productname>PostgreSQL</productname> foreign transaction manager
+      allows FDWs to read and write data on foreign servers within a transaction while
+      maintaining atomicity of the foreign data (aka atomic commit). Atomic
+      commit guarantees that a distributed transaction is either committed
+      or rolled back on all participating foreign
+      servers.  To achieve atomic commit, <productname>PostgreSQL</productname>
+      employs the two-phase commit protocol, which is a type of atomic commitment
+      protocol. Every FDW that wishes to support atomic commit
+      is required to support transaction management callback routines
+      (see <xref linkend="fdw-callbacks-transaction-managements"/> for details)
+      and register the foreign transaction using
+      <function>FdwXactRegisterForeignTransaction</function> when starting a
+      transaction on the foreign server. Transactions on registered foreign servers
+      are managed by the foreign transaction manager.
+<programlisting>
+void
+FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, char *fx_id)
+</programlisting>
+    This function should be called when a transaction starts on the foreign server.
+    <varname>serverid</varname> and <varname>userid</varname> are <type>OID</type>s
+    which specify on which server and by which user the transaction starts. <varname>fx_id</varname>
+    is a null-terminated string that identifies the foreign transaction; it
+    is passed when the transaction management APIs are called. The length of
+    <varname>fx_id</varname> must be less than 200 bytes. Also, this identifier
+    must be unique enough that it doesn't conflict with other concurrent foreign
+    transactions. <varname>fx_id</varname> can be <literal>NULL</literal>.
+    If it's <literal>NULL</literal>, a transaction identifier is automatically
+    generated in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    Since this identifier is used per foreign transaction and the xid of an unresolved
+    distributed transaction is never reused, an auto-generated identifier is
+    sufficient to ensure uniqueness. It's recommended to generate the foreign transaction
+    identifier in the FDW if the format of the auto-generated identifier doesn't match
+    the requirements of the foreign server.
+    </para>
+
+     <para>
+      An example of such a transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be using different
+    Foreign Data Wrappers.
+     </para>
+
+     <para>
+      When a transaction starts on the foreign server, an FDW that wishes to use atomic
+      commit must register the foreign transaction as a participant by calling
+      <function>FdwXactRegisterForeignTransaction</function>. Also, during the
+      transaction, <function>IsTwoPhaseCommitEnabled</function> is called whenever
+      the transaction begins to modify data on the foreign server. If the FDW wishes to use
+      atomic commit, <function>IsTwoPhaseCommitEnabled</function> must return
+      <literal>true</literal>. All foreign transaction participants must
+      return <literal>true</literal> to achieve atomic commit.
+     </para>
+
+     <para>
+      During the pre-commit phase of the local transaction, the foreign transaction manager
+      persists the foreign transaction information to disk and WAL, and then
+      prepares all foreign transactions by calling <function>PrepareForeignTransaction</function>
+      if the two-phase commit protocol is required. Two-phase commit is required only if
+      the transaction modified data on more than one server, including the local
+      server, and the user requests atomic commit. <productname>PostgreSQL</productname>
+      can commit locally and go to the next step if and only if all foreign
+      transactions were prepared successfully. If two-phase commit is not required, the foreign
+      transaction manager commits the transaction on the foreign server by calling
+      <function>CommitForeignTransaction</function> and then
+      <productname>PostgreSQL</productname> commits locally. The foreign transaction
+      manager doesn't make any further changes to foreign transactions from this point
+      forward. If any failure happens for whatever reason, for example a network
+      failure or a user request, before <productname>PostgreSQL</productname> commits
+      locally, the foreign transaction manager switches to rollback and calls
+      <function>RollbackForeignTransaction</function> for every foreign server to
+      close the current transaction on the foreign servers.
+     </para>
+
+     <para>
+      When two-phase commit is required, after committing locally, the transaction
+      commit waits for all prepared foreign transactions to be resolved before
+      the commit completes. The foreign transaction resolver is responsible for
+      foreign transaction resolution. <function>ResolveForeignTransaction</function>
+      is called by the foreign transaction resolver process when it resolves a foreign
+      transaction. <function>ResolveForeignTransaction</function> is also called
+      when a user executes the <function>pg_resolve_fdw_xact</function> function.
+     </para>
+    </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b851fe0..f080375 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20539,6 +20539,57 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-fdw-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_fdw_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_fdw_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for foreign transactions
+        matching the arguments and resolves them. This function won't resolve
+        a foreign transaction which is in progress, or one that is locked by some
+        other backend.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_fdw_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c2adb22..cf9b236 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -332,6 +332,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_fdw_xact_resolver</structname><indexterm><primary>pg_stat_fdw_xact_resolver</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-resolver-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1194,6 +1202,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
          <entry><literal>LogicalLauncherMain</literal></entry>
          <entry>Waiting in main loop of logical launcher process.</entry>
         </row>
@@ -1405,6 +1421,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
+        <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
          <entry>Waiting during base backup when throttling activity.</entry>
@@ -2210,6 +2230,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-foreign-xact-resolver-view" xreflabel="pg_stat_fdw_xact_resolver">
+   <title><structname>pg_stat_fdw_xact_resolver</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_fdw_xact_resolver</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index 5514db1..742e825 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -8,9 +8,9 @@ subdir = src/backend/access/rmgrdesc
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o genericdesc.o \
-	   gindesc.o gistdesc.o hashdesc.o heapdesc.o logicalmsgdesc.o \
-	   mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o seqdesc.o \
-	   smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
+OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o fdwxactdesc.o \
+	genericdesc.o  gindesc.o gistdesc.o hashdesc.o heapdesc.o \
+	logicalmsgdesc.o mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o \
+	seqdesc.o smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000..3705104
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,65 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL distributed transaction manager for foreign server.
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "foreign/fdwxact_xlog.h"
+
+void
+fdw_xact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		FdwXactOnDiskData *fdw_insert_xlog = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_insert_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_insert_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_insert_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_insert_xlog->local_xid);
+		/* TODO: This should be really interpreted by each FDW */
+
+		/*
+		 * TODO: we also need to assess whether we want to add this
+		 * information
+		 */
+		appendStringInfo(buf, " foreign transaction info: %s",
+						 fdw_insert_xlog->fdw_xact_id);
+	}
+	else
+	{
+		xl_fdw_xact_remove *fdw_remove_xlog = (xl_fdw_xact_remove *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_remove_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_remove_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_remove_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_remove_xlog->xid);
+	}
+
+}
+
+const char *
+fdw_xact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDW_XACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDW_XACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7..023a7c5 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -112,14 +112,16 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_prepared_xacts=%d max_locks_per_xact=%d "
 						 "wal_level=%s wal_log_hints=%s "
-						 "track_commit_timestamp=%s",
+						 "track_commit_timestamp=%s "
+						 "max_prepared_foreign_xacts=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_prepared_xacts,
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 16fbe47..f15c83a 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -12,9 +12,9 @@ subdir = src/backend/access/transam
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = clog.o commit_ts.o generic_xlog.o multixact.o parallel.o rmgr.o slru.o \
-	subtrans.o timeline.o transam.o twophase.o twophase_rmgr.o varsup.o \
-	xact.o xlog.o xlogarchive.o xlogfuncs.o \
+OBJS = clog.o commit_ts.o generic_xlog.o multixact.o \
+	parallel.o rmgr.o slru.o subtrans.o timeline.o transam.o twophase.o \
+	twophase_rmgr.o varsup.o xact.o xlog.o xlogarchive.o xlogfuncs.o \
 	xloginsert.o xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 9368b56..b5c3502 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -24,6 +24,7 @@
 #include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
+#include "foreign/fdwxact.h"
 #include "replication/message.h"
 #include "replication/origin.h"
 #include "storage/standby.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 65194db..5389929 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -89,6 +89,7 @@
 #include "access/xlogreader.h"
 #include "catalog/pg_type.h"
 #include "catalog/storage.h"
+#include "foreign/fdwxact.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -843,6 +844,35 @@ TwoPhaseGetGXact(TransactionId xid)
 }
 
 /*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
+/*
  * TwoPhaseGetDummyProc
  *		Get the dummy backend ID for prepared transaction specified by XID
  *
@@ -2313,6 +2343,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2372,6 +2408,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index f4e5ea8..cedd359 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -36,6 +36,7 @@
 #include "commands/tablecmds.h"
 #include "commands/trigger.h"
 #include "executor/spi.h"
+#include "foreign/fdwxact.h"
 #include "libpq/be-fsstubs.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -1127,6 +1128,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_twophase_for_ac;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1135,6 +1137,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_twophase_for_ac = ForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1173,12 +1176,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_twophase_for_ac)
 			goto cleanup;
 	}
 	else
@@ -1336,6 +1340,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transactions to be resolved, if required.
+	 * We only want to wait if we prepared foreign transactions in this
+	 * transaction.
+	 */
+	if (need_twophase_for_ac && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -1974,6 +1986,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2129,6 +2144,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2216,6 +2232,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2404,6 +2422,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2608,6 +2627,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index adbd6a2..ca65cd7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "foreign/fdwxact.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -5187,6 +5188,7 @@ BootStrapXLOG(void)
 	ControlFile->MaxConnections = MaxConnections;
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6274,6 +6276,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6777,14 +6782,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdw_xact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -6959,7 +6965,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7584,6 +7593,7 @@ StartupXLOG(void)
 
 	/* Pre-scan prepared transactions to find out the range of XIDs present */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
@@ -7770,6 +7780,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9075,6 +9088,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9511,7 +9525,8 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9543,6 +9558,7 @@ XLogReportParameters(void)
 		ControlFile->MaxConnections = MaxConnections;
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9740,6 +9756,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -9938,6 +9955,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->MaxConnections = xlrec.MaxConnections;
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8cd8bf4..4ea02dd 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -291,6 +291,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_prepared_fdw_xacts AS
+       SELECT * FROM pg_prepared_fdw_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
 	l.objoid, l.classoid, l.objsubid,
@@ -773,6 +776,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_fdwxact_resolvers AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_fdwxact_resolver() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index 5c53aee..3a6dff5 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -28,6 +28,7 @@
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "parser/parse_func.h"
@@ -1093,6 +1094,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1403,6 +1416,16 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
 	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srv->serverid,
+						useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
+	/*
 	 * Do the deletion
 	 */
 	object.classId = UserMappingRelationId;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 0a003d9..6f726f5 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -19,6 +19,7 @@
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -693,7 +694,10 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+		FdwXactMarkForeignTransactionModified(partRelInfo, 0);
+	}
 
 	MemoryContextSwitchTo(oldContext);
 
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index a2a28b7..30a0b66 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,9 +22,11 @@
  */
 #include "postgres.h"
 
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -224,7 +226,13 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+
+		/* Mark that this transaction modified data on the foreign server */
+		FdwXactMarkForeignTransactionModified(estate->es_result_relation_info,
+										 eflags);
+	}
 	else
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a4dfea..37c0e98 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -44,6 +44,8 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "storage/bufmgr.h"
@@ -2288,6 +2290,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 fdw_private,
 															 i,
 															 eflags);
+
+			/* Mark that this transaction modified data on the foreign server */
+			FdwXactMarkForeignTransactionModified(resultRelInfo, eflags);
 		}
 
 		resultRelInfo++;
diff --git a/src/backend/foreign/Makefile b/src/backend/foreign/Makefile
index 85aa857..4329d3e 100644
--- a/src/backend/foreign/Makefile
+++ b/src/backend/foreign/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/foreign
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS= foreign.o
+OBJS= foreign.o fdwxact.o fdwxact_launcher.o fdwxact_resolver.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/foreign/fdwxact.c b/src/backend/foreign/fdwxact.c
new file mode 100755
index 0000000..d284861
--- /dev/null
+++ b/src/backend/foreign/fdwxact.c
@@ -0,0 +1,2762 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL distributed transaction manager for foreign servers.
+ *
+ * To commit atomically across all foreign servers, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * When a foreign data wrapper starts a transaction on a foreign server
+ * that is capable of the two-phase commit protocol, it must register
+ * the foreign transaction using FdwXactRegisterTransaction() in order
+ * to participate in the group for atomic commit. Participants are identified
+ * by the OIDs of the foreign server and user. When the foreign transaction
+ * begins to modify data, it must be marked as modified using
+ * FdwXactMarkForeignTransactionModified().
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every foreign server. After committing or rolling back locally, we
+ * notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish,
+ * and so that we're not trying to locally do work that might fail with an
+ * ERROR after the local transaction has already committed.
+ *
+ * The two-phase commit protocol is required if the transaction modified
+ * two or more servers, including the local server. Otherwise, all foreign
+ * transactions are committed during pre-commit.
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in an in-doubt state (a.k.a. dangling
+ * transactions). Dangling transactions are processed by the resolver process.
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ *	* On PREPARE redo, we add the foreign transaction to FdwXactCtl->fdw_xacts.
+ *	  We set fdw_xact->inredo to true for such entries.
+ *	* On checkpoint redo, we iterate through FdwXactCtl->fdw_xacts entries that
+ *	  have fdw_xact->inredo set to true and are behind the redo_horizon. We save
+ *	  them to disk and then set fdw_xact->ondisk to true.
+ *	* On COMMIT and ABORT, we delete the entry from FdwXactCtl->fdw_xacts.
+ *	  If fdw_xact->ondisk is true, we delete the corresponding file from
+ *	  the disk as well.
+ *	* RecoverFdwXacts loads all foreign transaction entries from disk into
+ *	  memory at server startup.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/foreign/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
+#include "foreign/fdwxact_xlog.h"
+#include "foreign/resolver_internal.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Is atomic commit requested by user? */
+#define AtomicCommitRequested() \
+	(foreign_twophase_commit == true && \
+	 max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+/* Structure to bundle the foreign transaction participant */
+typedef struct FdwXactParticipant
+{
+	Oid			serverid;
+	Oid			userid;
+
+	/*
+	 * Pointer to the FdwXact entry in the global array. NULL if
+	 * this foreign transaction is registered but not inserted
+	 * yet.
+	 */
+	FdwXact		fdw_xact;
+	char		*fdw_xact_id;
+
+	/* true if this transaction modified data on the foreign server */
+	bool		modified;
+
+	/*
+	 * This is initialized at foreign transaction registration and
+	 * passed to API functions.
+	 */
+	ForeignTransaction foreign_xact;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function	prepare_foreign_xact;
+	CommitForeignTransaction_function	commit_foreign_xact;
+	RollbackForeignTransaction_function	rollback_foreign_xact;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit.
+ * This list has only foreign servers that are capable of two-phase
+ * commit protocol.
+ */
+List *FdwXactParticipantsForAC = NIL;
+
+/*
+ * This struct tracks all participants involved with transaction 'xid'.
+ */
+typedef struct FdwXactStateCacheEntry
+{
+	/* Key -- must be first */
+	TransactionId	xid;
+
+	/* List of FdwXacts involved with the xid */
+	FdwXact	participants;
+} FdwXactStateCacheEntry;
+static HTAB	*FdwXactStateCache;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDW_XACTS_DIR "pg_fdw_xact"
+
+/*
+ * Name of foreign prepared transaction file is the database oid, xid,
+ * foreign server oid and user oid, each as 8 hex digits, separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure
+ * uniqueness.
+ */
+#define FDW_XACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDW_XACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+static FdwXact FdwXactRegisterFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static bool FdwXactResolveForeignTransaction(FdwXact fdw_xact);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactQueueInsert(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid, bool give_warnings);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool give_warnings);
+static List *get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						   bool need_lock);
+static FdwXact get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								bool need_lock);
+static FdwXact get_all_fdw_xacts(int *length);
+static FdwXact insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							   char *fdw_xact_id);
+static char *generate_fdw_xact_identifier(Oid serverid, Oid userid);
+static void remove_fdw_xact(FdwXact fdw_xact);
+
+/* Guc parameters */
+int	max_prepared_foreign_xacts = 0;
+int	max_foreign_xact_resolvers = 0;
+bool foreign_twophase_commit = false;
+
+/* Keep track of registering process exit call back. */
+static bool fdwXactExitRegistered = false;
+
+/*
+ * Register given foreign transaction identified by given arguments as
+ * a participant of the transaction.
+ *
+ * This function is meant to be called by an FDW when a foreign transaction
+ * starts. The foreign server identified by the given server id must
+ * support the atomic commit APIs. The foreign transaction is identified
+ * by the given identifier 'fx_id', which can be NULL. If it's NULL,
+ * we construct a unique identifier.
+ *
+ * Once registered, the foreign transactions of participants are managed
+ * by the foreign transaction manager until the end of the distributed
+ * transaction.
+ */
+void
+FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, char *fx_id)
+{
+	FdwXactParticipant	*fdw_part;
+	ListCell   			*lc;
+	ForeignServer 		*foreign_server;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	FdwRoutine			*fdw_routine;
+	MemoryContext		old_context;
+
+	/* Check length of foreign transaction identifier */
+	if (fx_id != NULL && strlen(fx_id) >= NAMEDATALEN)
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						fx_id),
+				 errdetail("foreign transaction identifier must be less than %d characters.",
+						   NAMEDATALEN)));
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_xact_resolvers to a nonzero value.")));
+
+	/* Duplication check */
+	foreach(lc, FdwXactParticipantsForAC)
+	{
+		fdw_part = lfirst(lc);
+
+		/* Raise an error if this connection is already registered */
+		if (fdw_part->serverid == serverid && fdw_part->userid == userid)
+			ereport(ERROR,
+					(errmsg("attempt to start transaction again on server %u user %u",
+							serverid, userid)));
+	}
+
+	/*
+	 * Participant information is needed at the end of a transaction, when
+	 * system caches are not available. So save it in TopTransactionContext
+	 * beforehand so that it can live until the end of the transaction.
+	 */
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	foreign_server = GetForeignServer(serverid);
+	fdw = GetForeignDataWrapper(foreign_server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	/* Make sure that the FDW has transaction handlers */
+	if (!fdw_routine->PrepareForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function provided for preparing foreign transaction for FDW %s",
+						fdw->fdwname)));
+	if (!fdw_routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to commit a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+	if (!fdw_routine->RollbackForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to rollback a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+
+	/* Generate foreign transaction identifier if not provided */
+	if (fx_id == NULL)
+		fx_id = generate_fdw_xact_identifier(serverid, userid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->serverid = serverid;
+	fdw_part->userid = userid;
+	fdw_part->fdw_xact_id = fx_id;
+	fdw_part->fdw_xact = NULL;
+	fdw_part->modified = false;	/* by default */
+	fdw_part->foreign_xact.server = foreign_server;
+	fdw_part->foreign_xact.usermapping = user_mapping;
+	fdw_part->foreign_xact.fx_id = fx_id;
+	fdw_part->prepare_foreign_xact = fdw_routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact = fdw_routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact = fdw_routine->RollbackForeignTransaction;
+
+	/* Add this foreign connection to the participants list */
+	FdwXactParticipantsForAC = lappend(FdwXactParticipantsForAC, fdw_part);
+
+	/* Revert back the context */
+	MemoryContextSwitchTo(old_context);
+
+	return;
+}
+
+/*
+ * Remember that the registered foreign transaction modified data. This
+ * function is called when the executor begins to modify data on a foreign
+ * server, regardless of whether the foreign server is capable of the
+ * two-phase commit protocol. The mark is used to determine whether we must
+ * use the two-phase commit protocol at commit. This function also checks
+ * whether the foreign server being modified is capable of two-phase commit.
+ * If it is not, we remember that in MyXactFlags.
+ */
+void
+FdwXactMarkForeignTransactionModified(ResultRelInfo *resultRelInfo, int flags)
+{
+	Relation			rel = resultRelInfo->ri_RelationDesc;
+	FdwXactParticipant	*fdw_part;
+	ForeignTable		*ftable;
+	ListCell   			*lc;
+	Oid					userid;
+	Oid					serverid;
+
+	bool found = false;
+
+	/* Quick return if the user did not request atomic commit */
+	if (!AtomicCommitRequested())
+		return;
+
+	/* Do nothing in EXPLAIN (no ANALYZE) case */
+	if (flags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	ftable = GetForeignTable(RelationGetRelid(rel));
+
+	/*
+	 * If the foreign server being modified doesn't support or cannot
+	 * enable the two-phase commit protocol, mark that we've written to
+	 * such a server and return.
+	 */
+	if (resultRelInfo->ri_FdwRoutine->IsTwoPhaseCommitEnabled == NULL ||
+		!resultRelInfo->ri_FdwRoutine->IsTwoPhaseCommitEnabled(ftable->serverid))
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		return;
+	}
+
+	/*
+	 * The foreign server being modified supports two-phase commit protocol,
+	 * remember that the foreign transaction modified data.
+	 */
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	serverid = ftable->serverid;
+	foreach(lc, FdwXactParticipantsForAC)
+	{
+		fdw_part = lfirst(lc);
+
+		if (fdw_part->serverid == serverid && fdw_part->userid == userid)
+		{
+			fdw_part->modified = true;
+			found = true;
+			break;
+		}
+	}
+
+	if (!found)
+		elog(ERROR, "attempt to mark unregistered foreign server %u, user %u as modified",
+			 serverid, userid);
+}
+
+/*
+ * FdwXactShmemSize
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdw_xacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	/* Size for shared cache entry */
+	size = MAXALIGN(size);
+	size = add_size(size, hash_estimate_size(max_prepared_foreign_xacts,
+											 sizeof(FdwXactStateCacheEntry)));
+
+	return size;
+}
+
+/*
+ * FdwXactShmemInit
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of
+ * FdwXactCtlData structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdw_xacts;
+		HASHCTL		info;
+		long		max_hash_size;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->freeFdwXacts = NULL;
+		FdwXactCtl->numFdwXacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdw_xacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdw_xacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdw_xacts[cnt].status = FDW_XACT_INITIAL;
+			fdw_xacts[cnt].fxact_free_next = FdwXactCtl->freeFdwXacts;
+			FdwXactCtl->freeFdwXacts = &fdw_xacts[cnt];
+		}
+
+		/* Initialize shared state cache hash table */
+		MemSet(&info, 0, sizeof(info));
+		info.keysize = sizeof(TransactionId);
+		info.entrysize = sizeof(FdwXactStateCacheEntry);
+		max_hash_size = max_prepared_foreign_xacts;
+
+		FdwXactStateCache = ShmemInitHash("FdwXact hash",
+										  max_hash_size,
+										  max_hash_size,
+										  &info,
+										  HASH_ELEM | HASH_BLOBS);
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * PreCommit_FdwXacts
+ *
+ * This function prepares all foreign transaction participants if atomic commit
+ * is required. Otherwise commits them without preparing.
+ *
+ * If atomic commit is requested by the user (that is, foreign_twophase_commit
+ * is on), every participant must enable two-phase commit. If we managed all
+ * foreign transactions involved in a transaction, we could commit the foreign
+ * transactions on servers that don't use two-phase commit here and commit the
+ * others at the post-commit phase, but we don't do that because (1) it doesn't
+ * satisfy the atomic commit semantics at all and (2) it would require all FDWs
+ * to register foreign servers anyway, which breaks backward compatibility.
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipantsForAC == NIL)
+		return;
+
+	/*
+	 * If the user requires atomic commit semantics, we don't allow COMMIT if
+	 * we've modified data on both foreign servers that can execute the
+	 * two-phase commit protocol and foreign servers that cannot.
+	 */
+	if (foreign_twophase_commit && (MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));
+
+	if (ForeignTwophaseCommitRequired())
+	{
+		/* Prepare the transactions on all the foreign servers */
+		FdwXactPrepareForeignTransactions();
+	}
+	else
+	{
+		ListCell   *lc;
+
+		Assert(list_length(FdwXactParticipantsForAC) == 1);
+
+		/* Two-phase commit is not required, commit them one by one */
+		foreach(lc, FdwXactParticipantsForAC)
+		{
+			FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+			/* Commit foreign transaction */
+			if (!fdw_part->commit_foreign_xact(&fdw_part->foreign_xact))
+				ereport(ERROR,
+						(errmsg("could not commit foreign transaction on server %s",
+								fdw_part->foreign_xact.server->servername)));
+		}
+
+		/* Forget all participants */
+		FdwXactParticipantsForAC = NIL;
+	}
+}
+
+/*
+ * FdwXactPrepareForeignTransactions
+ *
+ * Prepare all foreign transaction participants.  This function extends the
+ * prepared participants chain whenever we prepare a foreign transaction. The
+ * chain is used to access all participants of a distributed transaction
+ * quickly. If any one of them fails to prepare or raises an error, we switch
+ * over to aborting.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	ListCell   *lcell;
+	FdwXact		prev_fxact = NULL;
+
+	/* Loop over the foreign connections */
+	foreach(lcell, FdwXactParticipantsForAC)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+		FdwXact		fxact;
+
+		/*
+		 * Register the foreign transaction entry. Registration persists this
+		 * information to the disk and logs it (thereby relaying it to the
+		 * standby). Thus in case we lose connectivity to the foreign server or
+		 * crash ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepare the transaction on the foreign server before persisting
+		 * the information to the disk and crash in-between these two steps,
+		 * we will forget that we prepared the transaction on the foreign server
+		 * and will not be able to resolve it after the crash. Hence persist
+		 * first then prepare.
+		 */
+		fxact = FdwXactRegisterFdwXactEntry(GetTopTransactionId(), fdw_part);
+
+		/*
+		 * Between the FdwXactRegisterFdwXactEntry call and the moment this
+		 * backend receives an acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 * During abort processing, we might try to resolve a never-prepared
+		 * transaction, and get an error. This is fine as long as the FDW
+		 * provides us unique prepared transaction identifiers.
+		 */
+		if (!fdw_part->prepare_foreign_xact(&fdw_part->foreign_xact))
+		{
+			/* Failed to prepare, switch over to abort */
+			ereport(ERROR,
+					(errmsg("could not prepare transaction on foreign server %s",
+							fdw_part->foreign_xact.server->servername)));
+		}
+
+		/* Preparation succeeded, update its status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdw_part->fdw_xact = fxact;
+		fxact->status = FDW_XACT_PREPARED;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Create a prepared participants chain, i.e. the linked FdwXact entries
+		 * involved in this transaction. The head entry is remembered in the hash
+		 * table and each subsequent entry is linked from the previous entry.
+		 */
+		if (!prev_fxact)
+		{
+			FdwXactStateCacheEntry	*fxact_entry;
+			bool				found;
+
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fxact_entry = (FdwXactStateCacheEntry *) hash_search(FdwXactStateCache,
+																 (void *) &(fxact->local_xid),
+																 HASH_ENTER, &found);
+			Assert(!found);
+
+			/* Set the first participant while still holding the lock */
+			fxact_entry->participants = fxact;
+			LWLockRelease(FdwXactLock);
+		}
+		else
+		{
+			/* Append others to the tail */
+			Assert(fxact->fxact_next == NULL);
+			prev_fxact->fxact_next = fxact;
+		}
+
+		prev_fxact = fxact;
+	}
+}
+
+/*
+ * FdwXactRegisterFdwXactEntry
+ *
+ * This function is used to create a new foreign transaction entry before an FDW
+ * prepares and commits/rolls back. The function writes the entry to WAL; it is
+ * persisted to disk under the pg_fdw_xact directory at checkpoint.
+ */
+static FdwXact
+FdwXactRegisterFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact				fxact;
+	FdwXactOnDiskData	*fxact_file_data;
+	MemoryContext		old_context;
+	int					data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact = insert_fdw_xact(MyDatabaseId, xid, fdw_part->serverid,
+							fdw_part->userid, fdw_part->fdw_xact_id);
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->registered_backend = MyBackendId;
+	fdw_part->fdw_xact = fxact;
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdw_xact_id);
+	data_len = data_len + strlen(fdw_part->fdw_xact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fxact_file_data->dbid = MyDatabaseId;
+	fxact_file_data->local_xid = xid;
+	fxact_file_data->serverid = fdw_part->serverid;
+	fxact_file_data->userid = fdw_part->userid;
+	memcpy(fxact_file_data->fdw_xact_id, fdw_part->fdw_xact_id,
+		   strlen(fdw_part->fdw_xact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fxact_file_data, data_len);
+	fxact->insert_end_lsn = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_INSERT);
+	XLogFlush(fxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read it later at checkpoint */
+	fxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* WAL record is flushed, checkpoint can proceed with syncing */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact->valid = true;
+	LWLockRelease(FdwXactLock);
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fxact_file_data);
+	return fxact;
+}
+
+/*
+ * insert_fdw_xact
+ *
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				char *fdw_xact_id)
+{
+	int i;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		fxact = FdwXactCtl->fdw_xacts[i];
+		if (fxact->dbid == dbid &&
+			fxact->local_xid == xid &&
+			fxact->serverid == serverid &&
+			fxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
+								   xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->freeFdwXacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions (currently %d).",
+						 max_prepared_foreign_xacts)));
+	}
+	fxact = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fxact->fxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->numFdwXacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts++] = fxact;
+
+	fxact->registered_backend = InvalidBackendId;
+	fxact->dbid = dbid;
+	fxact->local_xid = xid;
+	fxact->serverid = serverid;
+	fxact->userid = userid;
+	fxact->insert_start_lsn = InvalidXLogRecPtr;
+	fxact->insert_end_lsn = InvalidXLogRecPtr;
+	fxact->valid = false;
+	fxact->ondisk = false;
+	fxact->inredo = false;
+	memcpy(fxact->fdw_xact_id, fdw_xact_id, strlen(fdw_xact_id) + 1);
+
+	return fxact;
+}
+
+/*
+ * remove_fdw_xact
+ *
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdw_xact(FdwXact fdw_xact)
+{
+	int			cnt;
+
+	Assert(fdw_xact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		if (FdwXactCtl->fdw_xacts[cnt] == fdw_xact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (cnt >= FdwXactCtl->numFdwXacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdw_xact->local_xid, fdw_xact->serverid, fdw_xact->userid)));
+
+	/* Remove the entry from active array */
+	FdwXactCtl->numFdwXacts--;
+	FdwXactCtl->fdw_xacts[cnt] = FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts];
+
+	/* Put it back into free list */
+	fdw_xact->fxact_free_next = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fdw_xact;
+
+	/* Reset information */
+	fdw_xact->status = FDW_XACT_INITIAL;
+	fdw_xact->registered_backend = InvalidBackendId;
+	fdw_xact->fxact_next = NULL;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdw_xact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdw_xact->serverid;
+		record.dbid = fdw_xact->dbid;
+		record.xid = fdw_xact->local_xid;
+		record.userid = fdw_xact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdw_xact_remove));
+		recptr = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the commit critical section: a
+		 * checkpoint starting after this will certainly see the gxact as a
+		 * candidate for fsyncing.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true if the current transaction requires foreign two-phase commit
+ * to achieve atomic commit. Foreign two-phase commit is required in either
+ * of two cases: we modified data on two or more foreign servers, or we
+ * modified both a local non-temporary relation and data on at least one
+ * foreign server.
+ */
+bool
+ForeignTwophaseCommitRequired(void)
+{
+	int	nserverswritten = list_length(FdwXactParticipantsForAC);
+	ListCell*	lc;
+	bool		modified = false;
+
+	/* Return if not requested */
+	if (!AtomicCommitRequested())
+		return false;
+
+	/* Check if we modified data on any foreign server */
+	foreach(lc, FdwXactParticipantsForAC)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->modified)
+		{
+			modified = true;
+			break;
+		}
+	}
+
+	/* We didn't modify data on any foreign server */
+	if (!modified)
+		return false;
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	return nserverswritten > 1;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int	i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * ForgetAllFdwXactParticipants
+ *
+ * Reset all the foreign transaction entries that this backend registered.
+ * If the foreign transaction has a corresponding FdwXact entry, resetting
+ * the registered_backend field leaves that entry in the unresolved state.
+ * If we leave any entries, we update the oldest xmin of unresolved transactions
+ * so that the transaction status of dangling transactions is not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell *cell;
+	int		n_left = 0;
+
+	if (FdwXactParticipantsForAC == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	foreach(cell, FdwXactParticipantsForAC)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(cell);
+
+		/* Skip if didn't register FdwXact entry yet */
+		if (fdw_part->fdw_xact == NULL)
+			continue;
+
+		/*
+		 * There is a race condition: an FdwXact entry referenced from
+		 * FdwXactParticipantsForAC could already be in use by another backend
+		 * if the resolver process removed the entry and the other backend
+		 * reused it before we forget it. So we need to check that the entry is
+		 * still associated with this backend.
+		 */
+		if (fdw_part->fdw_xact->registered_backend == MyBackendId)
+		{
+			fdw_part->fdw_xact->registered_backend = InvalidBackendId;
+			n_left++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Update the oldest local transaction of unresolved distributed
+	 * transactions if we left any FdwXact entries.
+	 */
+	if (n_left > 0)
+		FdwXactComputeRequiredXmin();
+
+	FdwXactParticipantsForAC = NIL;
+}
+
+/*
+ * AtProcExit_FdwXact
+ *
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Wait for foreign transaction to be resolved.
+ *
+ * Initially backends start in state FDW_XACT_NOT_WAITING and then change
+ * that state to FDW_XACT_WAITING before adding themselves to the wait queue.
+ * During FdwXactResolveForeignTransactions a fdwxact resolver changes the
+ * state to FDW_XACT_WAIT_COMPLETE once foreign transactions are resolved.
+ * This backend then resets its state to FDW_XACT_NOT_WAITING.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char		*new_status = NULL;
+	const char	*old_status;
+	ListCell	*lc;
+	List		*fdwxact_participants = NIL;
+
+	/* Quick exit if atomic commit is not requested */
+	if (!AtomicCommitRequested())
+		return;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDW_XACT_NOT_WAITING);
+
+	if (FdwXactParticipantsForAC != NIL)
+	{
+		/* We've just prepared these; collect the participants' FdwXact entries */
+		Assert(MyPgXact->xid == wait_xid);
+		foreach(lc, FdwXactParticipantsForAC)
+			if (((FdwXactParticipant *) lfirst(lc))->fdw_xact != NULL)
+				fdwxact_participants = lappend(fdwxact_participants,
+											   ((FdwXactParticipant *) lfirst(lc))->fdw_xact);
+	}
+	else
+	{
+		FdwXactStateCacheEntry *fdwxact_entry;
+		bool found;
+
+		/*
+		 * If we're waiting for the resolution of foreign transactions that
+		 * are part of a local prepared transaction which was prepared while
+		 * this server was running, their entries exist in the hash table,
+		 * so we construct the participants list from the hash entry.
+		 */
+		Assert(FdwXactStateCache);
+		fdwxact_entry = (FdwXactStateCacheEntry *) hash_search(FdwXactStateCache,
+															   (void *) &wait_xid,
+															   HASH_FIND, &found);
+
+		if (found)
+		{
+			FdwXact	fdwxact;
+
+			for (fdwxact = fdwxact_entry->participants;
+				 fdwxact != NULL;
+				 fdwxact = fdwxact->fxact_next)
+				fdwxact_participants = lappend(fdwxact_participants, fdwxact);
+		}
+	}
+
+	/*
+	 * Otherwise, construct the participants list by scanning the global
+	 * array. This can happen when the server restarted after a distributed
+	 * transaction was PREPARE'd and we are now trying to resolve it.
+	 */
+	if (fdwxact_participants == NIL)
+		fdwxact_participants = get_fdw_xacts(MyDatabaseId, wait_xid,
+											 InvalidOid, InvalidOid, true);
+
+	/* Exit if we found no foreign transaction to resolve */
+	if (fdwxact_participants == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	foreach(lc, fdwxact_participants)
+	{
+		FdwXact fdw_xact = (FdwXact) lfirst(lc);
+
+		/* Don't overwrite status if fate has been determined */
+		if (fdw_xact->status == FDW_XACT_PREPARED)
+			fdw_xact->status = (is_commit ?
+								FDW_XACT_COMMITTING_PREPARED :
+								FDW_XACT_ABORTING_PREPARED);
+	}
+
+	/* Set backend status and enqueue itself */
+	MyProc->fdwXactState = FDW_XACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	FdwXactQueueInsert();
+	LWLockRelease(FdwXactLock);
+
+	/* Launch a resolver process if one is not running yet, or wake it up */
+	fdwxact_maybe_launch_resolver(false);
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int len;
+
+		old_status = get_ps_display(&len);
+		/* " waiting for resolution " is 24 bytes, plus up to 10 digits */
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0';	/* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDW_XACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDW_XACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The latter
+		 * would lead the client to believe that the distributed transaction
+		 * aborted, which is not true: it's already committed locally. The
+		 * former is no good either: the client has requested committing a
+		 * distributed transaction, and is entitled to assume that an
+		 * acknowledged commit is also committed on all foreign servers, which
+		 * might not be true. So in this case we issue a WARNING (which some
+		 * clients may be able to interpret) and shut off further output. We do
+		 * NOT reset ProcDiePending, so that the process will die after the
+		 * commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign xact resolver can pick them up and try to resolve them
+		 * later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDW_XACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+
+	/*
+	 * Forget the list of locked entries. This also means that the entries
+	 * that could not be resolved remain as dangling transactions.
+	 */
+	ForgetAllFdwXactParticipants();
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Acquire FdwXactLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Insert MyProc into the tail of FdwXactQueue.
+ */
+static void
+FdwXactQueueInsert(void)
+{
+	SHMQueueInsertBefore(&(FdwXactRslvCtl->FdwXactQueue),
+						 &(MyProc->fdwXactLinks));
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Create and initialize an FdwXactResolveState which is used
+ * for resolution of foreign transactions.
+ */
+FdwXactResolveState *
+CreateFdwXactResolveState(void)
+{
+	FdwXactResolveState *frstate = palloc0(sizeof(FdwXactResolveState));
+
+	frstate->dbid = MyDatabaseId;
+	frstate->fdwxact = NULL;
+	frstate->waiter = NULL;
+
+	return frstate;
+}
+
+/*
+ * Resolve one distributed transaction. The target distributed transaction
+ * is fetched from the shmem queue and its participants are fetched from either
+ * the shmem hash table or the global array. Release the waiter and return true
+ * only if we resolved all of the foreign transaction participants. Return
+ * false if we failed to resolve any of them.
+ *
+ * To preserve the order in which distributed transactions were registered in
+ * the queue, we must not move on to the next distributed transaction until all
+ * participants are resolved. The failed foreign transactions will be retried
+ * at the next execution.
+ */
+bool
+FdwXactResolveDistributedTransaction(FdwXactResolveState *frstate)
+{
+	FdwXactStateCacheEntry	*fdwxact_entry = NULL;
+	volatile FdwXact	fdwxacts_failed_to_resolve = NULL;
+	bool				all_resolved = false;
+
+	Assert(frstate->dbid == MyDatabaseId);
+
+	/* Get a new waiter, if not exists */
+	if (frstate->waiter == NULL)
+	{
+		PGPROC	*proc;
+
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
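+		/* Fetch the first waiter in the queue that belongs to our database */
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->FdwXactQueue),
+									   &(FdwXactRslvCtl->FdwXactQueue),
+									   offsetof(PGPROC, fdwXactLinks));
+		while (proc != NULL)
+		{
+			/* Found a waiter */
+			if (proc->databaseId == frstate->dbid)
+				break;
+
+			/* Advance from the current element, not from the queue head */
+			proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->FdwXactQueue),
+										   &(proc->fdwXactLinks),
+										   offsetof(PGPROC, fdwXactLinks));
+		}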
+
+		LWLockRelease(FdwXactLock);
+
+		/* If no waiter, there is no job */
+		if (!proc)
+			return false;
+
+		Assert(TransactionIdIsValid(proc->fdwXactWaitXid));
+		frstate->waiter = proc;
+	}
+
+	/* Get foreign transaction participants */
+	if (frstate->fdwxact == NULL)
+	{
+		bool found;
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+		/* Search FdwXact entries from the hash table by the local transaction id */
+		fdwxact_entry =
+			(FdwXactStateCacheEntry *) hash_search(FdwXactStateCache,
+												   (void *) &(frstate->waiter->fdwXactWaitXid),
+												   HASH_FIND, &found);
+
+		if (found)
+			frstate->fdwxact = fdwxact_entry->participants;
+		else
+		{
+			int i;
+			FdwXact entries_to_resolve = NULL;
+			FdwXact prev_fx = NULL;
+
+			/*
+			 * The fdwxact entry doesn't exist in the hash table when a
+			 * prepared transaction is resolved after recovery. In this case,
+			 * we construct a list of fdw xact entries by scanning the
+			 * FdwXactCtl->fdw_xacts array.
+			 */
+			for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+			{
+				FdwXact fdw_xact = FdwXactCtl->fdw_xacts[i];
+
+				if (fdw_xact->dbid == frstate->dbid &&
+					fdw_xact->local_xid == frstate->waiter->fdwXactWaitXid)
+				{
+					if (!entries_to_resolve)
+						entries_to_resolve = fdw_xact;
+
+					/* Link from previous entry to this entry */
+					if (prev_fx)
+						prev_fx->fxact_next = fdw_xact;
+
+					prev_fx = fdw_xact;
+				}
+			}
+
+			frstate->fdwxact = entries_to_resolve;
+		}
+
+		LWLockRelease(FdwXactLock);
+	}
+
+	Assert(frstate->fdwxact != NULL);
+
+	/* Resolve all foreign transactions one by one */
+	while (frstate->fdwxact != NULL)
+	{
+		volatile FdwXact cur_fdwxact = frstate->fdwxact;
+		volatile FdwXact fdwxact_next = NULL;
+
+		/*
+		 * Remember the next FdwXact entry to resolve, as the current entry
+		 * will be removed from the list once it is resolved.
+		 */
+		fdwxact_next = cur_fdwxact->fxact_next;
+
+		/* Resolve a foreign transaction */
+		if (!FdwXactResolveForeignTransaction(cur_fdwxact))
+		{
+			ForeignServer *fserver;
+
+			CHECK_FOR_INTERRUPTS();
+
+			/* Failed to resolve. Remember it for the next execution */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			/* Detach the entry from the list before relinking it */
+			cur_fdwxact->fxact_next = NULL;
+			if (fdwxacts_failed_to_resolve == NULL)
+			{
+				/* The first failed entry becomes the head of the list */
+				fdwxacts_failed_to_resolve = cur_fdwxact;
+			}
+			else
+			{
+				FdwXact fx = fdwxacts_failed_to_resolve;
+
+				/* Append the entry at the tail */
+				while (fx->fxact_next != NULL)
+					fx = fx->fxact_next;
+				fx->fxact_next = cur_fdwxact;
+			}
+			LWLockRelease(FdwXactLock);
+
+			fserver = GetForeignServer(cur_fdwxact->serverid);
+			ereport(LOG,
+					(errmsg("could not resolve a foreign transaction on server \"%s\"",
+							fserver->servername),
+					 errdetail("local transaction id is %u, connected by user id %u",
+							   cur_fdwxact->local_xid, cur_fdwxact->userid)));
+		}
+		else
+		{
+			/* Resolved. Update the cache entry if it's valid */
+			if (fdwxact_entry)
+				fdwxact_entry->participants = fdwxact_next;
+
+			elog(DEBUG2, "resolved a foreign transaction xid %u, serverid %u, userid %u",
+				 cur_fdwxact->local_xid, cur_fdwxact->serverid, cur_fdwxact->userid);
+		}
+
+		/* Advance the resolution status to the next */
+		frstate->fdwxact = fdwxact_next;
+	}
+
+	all_resolved = (fdwxacts_failed_to_resolve == NULL);
+
+	if (all_resolved)
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+		/* Remove the state cache entry from shmem hash table */
+		hash_search(FdwXactStateCache, (void *) &(frstate->waiter->fdwXactWaitXid),
+					HASH_REMOVE, NULL);
+
+		/*
+		 * Remove waiter from shmem queue, if not detached yet. The waiter
+		 * could already be detached if the user canceled the wait before
+		 * resolution.
+		 */
+		if (!SHMQueueIsDetached(&(frstate->waiter->fdwXactLinks)))
+		{
+			TransactionId	wait_xid = frstate->waiter->fdwXactWaitXid;
+
+			SHMQueueDelete(&(frstate->waiter->fdwXactLinks));
+
+			pg_write_barrier();
+
+			/* Set state to complete */
+			frstate->waiter->fdwXactState = FDW_XACT_WAIT_COMPLETE;
+
+			/* Wake up the waiter only when we have set state and removed from queue */
+			SetLatch(&(frstate->waiter->procLatch));
+
+			elog(DEBUG2, "released a proc xid %u", wait_xid);
+		}
+
+		LWLockRelease(FdwXactLock);
+
+		/* Reset resolution state */
+		frstate->waiter = NULL;
+		Assert(frstate->fdwxact == NULL);
+	}
+	else
+	{
+		/*
+		 * Update the fdwxact entry we're processing so that the failed
+		 * fdwxact entries will be processed again.
+		 */
+		frstate->fdwxact = fdwxacts_failed_to_resolve;
+	}
+
+	return all_resolved;
+}
+
+/*
+ * Resolve all dangling foreign transactions on the given database. Get
+ * all dangling foreign transactions from shmem global array and resolve
+ * them one by one.
+ *
+ * Unlike FdwXactResolveDistributedTransaction, dangling transaction
+ * resolution doesn't care about the order of resolution because these
+ * entries are already out of order. So if we fail to resolve a foreign
+ * transaction, we can go on to the next foreign transaction, which might be
+ * associated with another distributed transaction.
+ */
+void
+FdwXactResolveAllDanglingTransactions(Oid dbid)
+{
+	List		*dangling_fdwxacts = NIL;
+	ListCell	*cell;
+	int			n_resolved = 0;
+	int			i;
+
+	Assert(OidIsValid(dbid));
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/*
+	 * Walk over the global array to make the list of dangling transactions
+	 * whose corresponding local transaction is on the given database.
+	 */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fxact = FdwXactCtl->fdw_xacts[i];
+
+		/*
+		 * Append the fdwxact entry on the given database to the list if
+		 * it's handled by nobody and the corresponding local transaction
+		 * is not part of the prepared transaction.
+		 */
+		if (fxact->dbid == dbid &&
+			fxact->registered_backend == InvalidBackendId &&
+			!TwoPhaseExists(fxact->local_xid))
+			dangling_fdwxacts = lappend(dangling_fdwxacts, fxact);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/* Return if there is no foreign transaction we need to resolve */
+	if (dangling_fdwxacts == NIL)
+		return;
+
+	foreach(cell, dangling_fdwxacts)
+	{
+		FdwXact fdwxact = (FdwXact) lfirst(cell);
+
+		if (!FdwXactResolveForeignTransaction(fdwxact))
+		{
+			ForeignServer *fserver = GetForeignServer(fdwxact->serverid);
+
+			/*
+			 * If failed to resolve this foreign transaction we skip it in
+			 * this resolution cycle. Try to resolve again in next cycle.
+			 */
+			ereport(LOG,
+					(errmsg("could not resolve a dangling foreign transaction on server \"%s\"",
+							fserver->servername),
+					 errdetail("local transaction id is %u, connected by user id %u",
+							   fdwxact->local_xid, fdwxact->userid)));
+			continue;
+		}
+
+		n_resolved++;
+	}
+
+	list_free(dangling_fdwxacts);
+
+	elog(DEBUG2, "resolved %d dangling foreign xacts", n_resolved);
+}
+
+/*
+ * AtEOXact_FdwXacts
+ *
+ * In the commit case, we have already prepared transactions on the foreign
+ * servers during pre-commit, and those prepared transactions will be
+ * resolved by the resolver process. So we don't do anything about the
+ * foreign transactions.
+ *
+ * In the abort case, the user requested a rollback or we switched to rollback
+ * due to an error during commit. To close the current foreign transactions
+ * anyway, we call the rollback API for every foreign transaction. If we raised
+ * an error while preparing and came here, it's possible that some entries of
+ * FdwXactParticipants have already registered their FdwXact entries. If there
+ * are any, we leave them as dangling transactions and ask the resolver process
+ * to process them.
+ */
+void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		int left_fdwxacts = 0;
+
+		foreach (lcell, FdwXactParticipantsForAC)
+		{
+			FdwXactParticipant	*fdw_part = lfirst(lcell);
+
+			/*
+			 * Count FdwXact entries that we registered to shared memory array
+			 * in this transaction.
+			 */
+			if (fdw_part->fdw_xact)
+			{
+				/*
+				 * The status of foreign transaction must be either preparing
+				 * or prepared. In any case, since we have registered FdwXact
+				 * entry we leave them to the resolver process. For the preparing
+				 * state, since the foreign transaction might not be closed yet we
+				 * fall through and call rollback API. For the prepared state,
+				 * since the foreign transaction has closed we don't need to do
+				 * anything.
+				 */
+				Assert(fdw_part->fdw_xact->status == FDW_XACT_PREPARING ||
+					   fdw_part->fdw_xact->status == FDW_XACT_PREPARED);
+
+				left_fdwxacts++;
+				if (fdw_part->fdw_xact->status == FDW_XACT_PREPARED)
+					continue;
+			}
+
+			/*
+			 * Roll back all current foreign transactions. Since we're rolling
+			 * back the transaction, it's too late to raise an error here, so
+			 * we log it as a warning.
+			 */
+			if (!fdw_part->rollback_foreign_xact(&fdw_part->foreign_xact))
+				ereport(WARNING,
+						(errmsg("could not abort transaction on server \"%s\"",
+								fdw_part->foreign_xact.server->servername)));
+		}
+
+		/* If we left some FdwXact entries, ask the resolver process */
+		if (left_fdwxacts > 0)
+		{
+			ereport(WARNING,
+					(errmsg("left %d foreign transactions in in-doubt status",
+							left_fdwxacts)));
+			fdwxact_maybe_launch_resolver(true);
+		}
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * AtPrepare_FdwXacts
+ *
+ * If there are foreign servers involved in the transaction, this function
+ * prepares transactions on those servers.
+ *
+ * Note that the transaction can abort after we have prepared only some of the
+ * participants. Since we may still switch to abort, we cannot forget
+ * FdwXactParticipantsForAC here. These are processed by the resolver process
+ * during abort, or at AtEOXact_FdwXacts.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipantsForAC == NIL)
+		return;
+
+	/*
+	 * We cannot prepare distributed transaction if any foreign server of
+	 * participants in the transaction isn't capable of two-phase commit.
+	 */
+	if ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_T_R_INTEGRITY_CONSTRAINT_VIOLATION),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in the transaction cannot prepare it")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * FdwXactResolveForeignTransaction
+ *
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * handler routine. The foreign transaction can be a dangling transaction
+ * that nobody is waiting for. If the fate of the foreign transaction is
+ * not determined yet, it's determined according to the status of the
+ * corresponding local transaction.
+ *
+ * If the resolution is successful, remove the foreign transaction entry from
+ * the shared memory and also remove the corresponding on-disk file.
+ */
+static bool
+FdwXactResolveForeignTransaction(FdwXact fdwxact)
+{
+	bool		resolved;
+	bool		is_commit;
+	ForeignServer		*fserver;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	FdwRoutine			*fdw_routine;
+	ForeignTransaction	foreign_xact;
+
+	Assert(fdwxact);
+
+	/*
+	 * Determine whether we commit or abort this foreign transaction.
+	 */
+	if (fdwxact->status == FDW_XACT_COMMITTING_PREPARED)
+		is_commit = true;
+	else if (fdwxact->status == FDW_XACT_ABORTING_PREPARED)
+		is_commit = false;
+
+	/*
+	 * If the local transaction is already committed, commit prepared
+	 * foreign transaction.
+	 */
+	else if (TransactionIdDidCommit(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_COMMITTING_PREPARED;
+		is_commit = true;
+	}
+
+	/*
+	 * If the local transaction is already aborted, abort prepared
+	 * foreign transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_ABORTING_PREPARED;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is not in progress but the foreign
+	 * transaction is not prepared on the foreign server. This
+	 * can happen when the transaction failed after registering this
+	 * entry but before actually preparing on the foreign server.
+	 * So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+		is_commit = false;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction
+	 * state is neither committing nor aborting. This should not
+	 * happen because we cannot decide whether to commit or abort a
+	 * foreign transaction associated with an in-progress local
+	 * transaction.
+	 */
+	else
+		ereport(ERROR,
+				(errmsg("cannot resolve foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid)));
+
+	/* Construct foreign server connection information for passing to API */
+	fserver = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(fserver->fdwid);
+	user_mapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+	foreign_xact.server = fserver;
+	foreign_xact.usermapping = user_mapping;
+	foreign_xact.fx_id = fdwxact->fdw_xact_id;
+
+	/* Resolve the foreign transaction */
+	Assert(fdw_routine->ResolveForeignTransaction);
+	resolved = fdw_routine->ResolveForeignTransaction(&foreign_xact,
+													  is_commit);
+
+	if (!resolved)
+	{
+		ereport(ERROR,
+				(errmsg("could not %s a prepared foreign transaction on server \"%s\"",
+						is_commit ? "commit" : "rollback", fserver->servername),
+				 errdetail("local transaction id is %u, connected by user id %u",
+						   fdwxact->local_xid, fdwxact->userid)));
+	}
+	else
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdw_xact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+
+	return resolved;
+}
+
+/*
+ * Return the one FdwXact entry that matches the given arguments, or
+ * NULL if there is none. Since this function searches for the FdwXact entry
+ * by its unique key, all arguments must be valid.
+ */
+static FdwXact
+get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				 bool need_lock)
+{
+	List	*fdw_xact_list;
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, need_lock);
+
+	/* Could not find entry */
+	if (fdw_xact_list == NIL)
+		return NULL;
+
+	/* Must be one entry since we search it by the unique key */
+	Assert(list_length(fdw_xact_list) == 1);
+
+	return (FdwXact) linitial(fdw_xact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+fdw_xact_exists(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	*fdw_xact_list;
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, true);
+
+	return fdw_xact_list != NIL;
+}
+
+/*
+ * Returns an array of all foreign prepared transactions for the user-level
+ * function pg_prepared_fdw_xacts.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if it doesn't want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdw_xacts(int *length)
+{
+	List		*all_fdw_xacts;
+	ListCell	*lc;
+	FdwXact		fdw_xacts;
+	int			num_fdw_xacts = 0;
+
+	Assert(length != NULL);
+
+	/* Get all entries */
+	all_fdw_xacts = get_fdw_xacts(InvalidOid, InvalidTransactionId,
+								  InvalidOid, InvalidOid, true);
+
+	if (all_fdw_xacts == NIL)
+	{
+		*length = 0;
+		return NULL;
+	}
+
+	fdw_xacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdw_xacts));
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdw_xacts)
+	{
+		FdwXact fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdw_xacts + num_fdw_xacts, fx,
+			   sizeof(FdwXactData));
+		num_fdw_xacts++;
+	}
+
+	*length = num_fdw_xacts;
+	list_free(all_fdw_xacts);
+
+	return fdw_xacts;
+}
+
+/*
+ * Return a list of FdwXact entries matching the given arguments, or NIL
+ * if there is none. An invalid value for an argument matches any entry.
+ */
+static List*
+get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			  bool need_lock)
+{
+	int i;
+	List	*fdw_xact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact	fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool	matches = true;
+
+		/* xid */
+		if (xid != InvalidTransactionId && xid != fdw_xact->local_xid)
+			matches = false;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdw_xact->dbid != dbid)
+			matches = false;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdw_xact->serverid)
+			matches = false;
+
+		/* userid */
+		if (OidIsValid(userid) && fdw_xact->userid != userid)
+			matches = false;
+
+		/* Append it if matched */
+		if (matches)
+			fdw_xact_list = lappend(fdw_xact_list, fdw_xact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdw_xact_list;
+}
+
+/*
+ * fdw_xact_redo
+ * Apply the redo log for a foreign transaction.
+ */
+void
+fdw_xact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record
+		 * in FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDW_XACT_REMOVE)
+	{
+		/* Use a distinct name so we don't shadow the "record" parameter */
+		xl_fdw_xact_remove *xlrec = (xl_fdw_xact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. Returned string
+ * value is used to identify foreign transaction. The identifier should not
+ * be same as any other concurrent prepared transaction identifier.
+ *
+ * To make the foreign transaction id, we should ideally use something like
+ * UUID, which gives unique ids with high probability, but that may be expensive
+ * here and UUID extension which provides the function to generate UUID is
+ * not part of the core code.
+ */
+static char *
+generate_fdw_xact_identifier(Oid serverid, Oid userid)
+{
+	char*	fdw_xact_id;
+
+	fdw_xact_id = (char *)palloc(FDW_XACT_ID_MAX_LEN * sizeof(char));
+
+	/* snprintf always NUL-terminates, so no explicit terminator is needed */
+	snprintf(fdw_xact_id, FDW_XACT_ID_MAX_LEN, "%s_%ld_%u_%u",
+			 "fx", Abs(random()), serverid, userid);
+
+	return fdw_xact_id;
+}
+
+/*
+ * CheckPointFdwXacts
+ *
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an inserted LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * In order to avoid disk I/O while holding a light weight lock, the function
+ * first collects the files which need to be synced under FdwXactLock and then
+ * syncs them after releasing the lock. This approach creates a race condition:
+ * after releasing the lock, and before syncing a file, the corresponding
+ * foreign transaction entry and hence the file might get removed. The function
+ * checks whether that's true and ignores the error if so.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdw_xacts = 0;
+
+	/* Quick get-away, before taking lock */
+	if (max_prepared_foreign_xacts <= 0)
+		return;
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/* Another quick check, before we allocate memory */
+	if (FdwXactCtl->numFdwXacts <= 0)
+	{
+		LWLockRelease(FdwXactLock);
+		return;
+	}
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, if somebody
+	 * spots that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		FdwXact		fxact = FdwXactCtl->fdw_xacts[cnt];
+
+		if ((fxact->valid || fxact->inredo) &&
+			!fxact->ondisk &&
+			fxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fxact->dbid, fxact->local_xid,
+								fxact->serverid, fxact->userid,
+								buf, len);
+			fxact->ondisk = true;
+			fxact->insert_start_lsn = InvalidXLogRecPtr;
+			fxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdw_xacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDW_XACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdw_xacts > 0)
+		ereport(LOG,
+			  (errmsg_plural("%u foreign transaction state file was written "
+							 "for long-running prepared transactions",
+							 "%u foreign transaction state files were written "
+							 "for long-running prepared transactions",
+							 serialized_fdw_xacts,
+							 serialized_fdw_xacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, &read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+		   errdetail("Failed while allocating an XLog reading processor.")));
+
+	record = XLogReadRecord(xlogreader, lsn, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read foreign transaction state from xlog at %X/%X",
+			   (uint32) (lsn >> 32),
+			   (uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDW_XACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDW_XACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not recreate foreign transaction state file \"%s\": %m",
+			   path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not fsync foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * ProcessFdwXactBuffer
+ *
+ * Given a transaction id, userid and serverid, read the foreign transaction
+ * state either from its state file on disk or directly from the WAL record
+ * that starts at the provided "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId	origNextXid = ShmemVariableCache->nextXid;
+	char	*buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(insert_start_lsn != InvalidXLogRecPtr);
+
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid, true);
+		if (buf == NULL)
+		{
+			ereport(WARNING,
+					(errmsg("removing corrupt fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+			return NULL;
+		}
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a plausible size, a valid CRC and the expected
+ * identifiers), return the contents in a palloc'd buffer. Otherwise return
+ * NULL. The buffer can be freed later by the caller.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				bool give_warnings)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			   errmsg("could not open FDW transaction state file \"%s\": %m",
+					  path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+	{
+		CloseTransientFile(fd);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not stat FDW transaction state file \"%s\": %m",
+							path)));
+		return NULL;
+	}
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdw_xact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+	{
+		CloseTransientFile(fd);
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+		return NULL;
+	}
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+	{
+		CloseTransientFile(fd);
+		return NULL;
+	}
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_READ);
+	if (read(fd, buf, stat.st_size) != stat.st_size)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not read FDW transaction state file \"%s\": %m",
+					  path)));
+		return NULL;
+	}
+
+	pgstat_report_wait_end();
+	CloseTransientFile(fd);
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+	{
+		pfree(buf);
+		return NULL;
+	}
+
+	/* Check that the contents match the identifiers we were asked for */
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fxact_file_data->dbid != dbid ||
+		fxact_file_data->serverid != serverid ||
+		fxact_file_data->userid != userid ||
+		fxact_file_data->local_xid != xid)
+	{
+		ereport(WARNING,
+			(errmsg("invalid foreign transaction state file \"%s\"",
+					path)));
+		/* The file descriptor was already closed after reading the file */
+		pfree(buf);
+		return NULL;
+	}
+
+	return buf;
+}
+
+/*
+ * PrescanFdwXacts
+ *
+ * Scan the foreign transaction directory for the oldest active transaction.
+ * This is run during database startup, after we have completed reading WAL.
+ * ShmemVariableCache->nextXid has been set to one more than the highest XID
+ * for which evidence exists in WAL.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	TransactionId nextXid = ShmemVariableCache->nextXid;
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+		 strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			TransactionId local_xid;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/*
+			 * Remove a foreign prepared transaction file corresponding to an
+			 * XID, which is too new.
+			 */
+			if (TransactionIdFollowsOrEquals(local_xid, nextXid))
+			{
+				ereport(WARNING,
+						(errmsg("removing future foreign prepared transaction file \"%s\"",
+								clde->d_name)));
+				RemoveFdwXactFile(dbid, local_xid, serverid, userid, true);
+				continue;
+			}
+
+			if (TransactionIdPrecedesOrEquals(local_xid, oldestActiveXid))
+				oldestActiveXid = local_xid;
+		}
+	}
+
+	FreeDir(cldir);
+	return oldestActiveXid;
+}
+
+/*
+ * restoreFdwXactData
+ *
+ * Scan pg_fdw_xact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char		*buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid, bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * FdwXactRedoAdd
+ *
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions, using the
+	 * database, local transaction id, server, user and foreign
+	 * transaction identifier recorded in the given buffer. The buffer
+	 * comes either from a WAL record or from a state file read at the
+	 * beginning of recovery.
+	 */
+	fxact = insert_fdw_xact(fxact_data->dbid, fxact_data->local_xid,
+							fxact_data->serverid, fxact_data->userid,
+							fxact_data->fdw_xact_id);
+
+	/*
+	 * Set status as preparing, since we do not know the xact status
+	 * right now. Resolver will set it later based on the status of
+	 * local transaction that prepared this fdwxact entry.
+	 */
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->insert_start_lsn = start_lsn;
+	fxact->insert_end_lsn = end_lsn;
+	fxact->inredo = true;	/* added in redo */
+	fxact->valid = false;
+	fxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * FdwXactRedoRemove
+ *
+ * Remove the corresponding fdw_xact entry from FdwXactCtl.
+ * Also remove fdw_xact file if a foreign transaction was saved
+ * via an earlier checkpoint.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact	fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdw_xact(dbid, xid, serverid, userid,
+							   false);
+
+	if (fdwxact == NULL)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdw_xact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+		char	*buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(bool *newval, void **extra, GucSource source)
+{
+	/* Parameter check */
+	if (*newval &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_xacts\" or \"max_foreign_xact_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdw_xacts;
+	int			num_xacts;
+	int			cur_xact;
+}	WorkingStatus;
+
+Datum
+pg_prepared_fdw_xacts(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdw_xacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdw_xacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(6, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdw_xacts = get_all_fdw_xacts(&num_fdw_xacts);
+		status->num_xacts = num_fdw_xacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdw_xact = &status->fdw_xacts[status->cur_xact++];
+		Datum		values[6];
+		bool		nulls[6];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdw_xact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdw_xact->dbid);
+		values[1] = TransactionIdGetDatum(fdw_xact->local_xid);
+		values[2] = ObjectIdGetDatum(fdw_xact->serverid);
+		values[3] = ObjectIdGetDatum(fdw_xact->userid);
+		switch (fdw_xact->status)
+		{
+			case FDW_XACT_PREPARING:
+				xact_status = "prepared";
+				break;
+			case FDW_XACT_COMMITTING_PREPARED:
+				xact_status = "committing";
+				break;
+			case FDW_XACT_ABORTING_PREPARED:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		/* should this really be interpreted by the FDW? */
+		values[5] = PointerGetDatum(cstring_to_text_with_len(fdw_xact->fdw_xact_id,
+															 strlen(fdw_xact->fdw_xact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+	bool			ret;
+
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, true);
+	if (fdwxact == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("could not find foreign transaction entry"))));
+
+	ret = FdwXactResolveForeignTransaction(fdwxact);
+
+	PG_RETURN_BOOL(ret);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. The function gives a way to forget about such a prepared
+ * transaction, for example when the foreign server where it was prepared is
+ * no longer available or the user that prepared it needs to be dropped.
+ */
+Datum
+pg_remove_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, false);
+	if (fdwxact == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("could not find foreign transaction entry"))));
+
+	remove_fdw_xact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/foreign/fdwxact_launcher.c b/src/backend/foreign/fdwxact_launcher.c
new file mode 100644
index 0000000..6782c33
--- /dev/null
+++ b/src/backend/foreign/fdwxact_launcher.c
@@ -0,0 +1,587 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * There is a shared memory area where the information of resolver processes
+ * is stored. A backend process requests the start of a new resolver process
+ * via that shared memory area. Note that the launcher assumes that there is
+ * no more than one pending start request per database.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/foreign/fdwxact_launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "foreign/fdwxact.h"
+#include "foreign/fdwxact_launcher.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/resolver_internal.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid, int slot);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+Datum pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS);
+
+/*
+ * Wake up the launcher process, if it has registered a valid pid.
+ */
+void
+FdwXactLauncherWakeup(void)
+{
+	if (FdwXactRslvCtl->launcher_pid > 0)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR1);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int	slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->FdwXactQueue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz	last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid <= 0);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz	now;
+		long	wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int		rc;
+
+		CHECK_FOR_INTERRUPTS();
+
+		now = GetCurrentTimestamp();
+
+		if (TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			bool launched;
+
+			/*
+			 * Launch foreign transaction resolvers that are requested
+			 * but not running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+				last_start_time = now;
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started. This usually means that the resolver crashed, so
+			 * we retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver worker
+ * if not running yet. A foreign transaction resolver worker is responsible
+ * for resolving the foreign transactions that are registered on a database.
+ * So if a resolver worker is already running for the database, we don't need
+ * to launch a new one.
+ */
+void
+fdwxact_maybe_launch_resolver(bool ignore_error)
+{
+	FdwXactResolver *resolver;
+	bool	found = false;
+	int		i;
+
+	/*
+	 * Looking for a resolver process that is running and working on the
+	 * same database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->pid != InvalidPid &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * If we found the resolver for my database, we don't need to launch a new
+	 * one; just wake up the running worker.
+	 */
+	if (found)
+	{
+		SetLatch(resolver->latch);
+
+		elog(DEBUG1, "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		return;
+	}
+
+	/* Looking for unused worker slot */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	/*
+	 * However if there are no more free worker slots, inform user about it before
+	 * exiting.
+	 */
+	if (!found)
+	{
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_xact_resolvers.")));
+		return;
+	}
+
+	Assert(resolver->pid == InvalidPid);
+
+	/* Found a free slot, reserve it for a resolver on my database */
+	resolver->dbid = MyDatabaseId;
+	resolver->in_use = true;
+
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Wake up launcher */
+	FdwXactLauncherWakeup();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid' at 'slot' if given. If slot is negative, we find an unused slot.
+ * Note that caller must hold FdwXactResolverLock in exclusive mode.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid, int slot)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int launch_slot = slot;
+
+	/* If slot number is invalid, we find an unused slot */
+	if (launch_slot < 0)
+	{
+		int i;
+
+		for (i = 0; i < max_foreign_xact_resolvers; i++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+			if (resolver->in_use && resolver->dbid == dbid)
+				return;
+
+			if (!resolver->in_use)
+			{
+				launch_slot = i;
+				break;
+			}
+		}
+	}
+
+	/* No unused slot found */
+	if (launch_slot < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_xact_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[launch_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_main_arg = Int32GetDatum(launch_slot);
+	bgw.bgw_notify_pid = 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to wait
+	 * until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch all foreign transaction resolvers that are required by backend process
+ * but not running.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	int i, j;
+	int num_launches = 0;
+	int num_unused_slots = 0;
+	int num_dbs = 0;
+	bool launched = false;
+	Oid	*dbs_to_launch;
+	Oid *dbs_having_worker = palloc0(sizeof(Oid) * max_foreign_xact_resolvers);
+
+	/*
+	 * Launch resolver workers on the databases that are requested
+	 * by backend processes.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* Remember unused worker slots */
+		if (!resolver->in_use)
+			num_unused_slots++;
+
+		/* Remember databases that already have a resolver worker */
+		if (OidIsValid(resolver->dbid))
+			dbs_having_worker[num_dbs++] = resolver->dbid;
+
+		/* Launch new foreign transaction resolver worker on the database */
+		if (resolver->in_use &&
+			OidIsValid(resolver->dbid) &&
+			resolver->pid == InvalidPid)
+		{
+			fdwxact_launch_resolver(resolver->dbid, i);
+			launched = true;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* There is no unused slot, exit */
+	if (num_unused_slots == 0)
+		return launched;
+
+	dbs_to_launch = (Oid *) palloc(sizeof(Oid) * num_unused_slots);
+
+	/*
+	 * If there are unused slots, we can launch foreign transaction resolvers
+	 * on databases that have unresolved foreign transactions but don't
+	 * have any resolver. This usually happens when resolvers crash for
+	 * whatever reason. Scanning all FdwXact entries could take time, but
+	 * since this is a relaunch case it's not harmful.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool found = false;
+
+		if (num_launches >= num_unused_slots)
+			break;
+
+		for (j = 0; j < num_dbs; j++)
+		{
+			if (dbs_having_worker[j] == fdw_xact->dbid)
+			{
+				found = true;
+				break;
+			}
+		}
+
+		if (found)
+			continue;
+
+		dbs_to_launch[num_launches++] = fdw_xact->dbid;
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* Launch resolver process for a database at any worker slot */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < num_launches; i++)
+	{
+		fdwxact_launch_resolver(dbs_to_launch[i], -1);
+		launched = true;
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	return launched;
+}
+
+/*
+ * FdwXactLauncherRegister
+ *		Register a background worker running the foreign transaction
+ *      launcher.
+ */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+
+/*
+ * Returns activity of foreign transaction resolvers, including pids, the
+ * database OIDs and the last resolution time.
+ */
+Datum
+pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver	*resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t	pid;
+		Oid		dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
diff --git a/src/backend/foreign/fdwxact_resolver.c b/src/backend/foreign/fdwxact_resolver.c
new file mode 100644
index 0000000..7f7ff8f
--- /dev/null
+++ b/src/backend/foreign/fdwxact_resolver.c
@@ -0,0 +1,310 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. The foreign
+ * transaction launcher starts a resolver process for each database.
+ *
+ * A resolver process keeps resolving foreign transactions on its database.
+ * It resolves two types of foreign transactions: on-line and dangling. An
+ * on-line foreign transaction is one whose resolution a concurrent backend
+ * process is waiting for. A dangling foreign transaction is one whose
+ * corresponding distributed transaction ended up in an in-doubt state. A
+ * resolver process doesn't exit as long as there is at least one unresolved
+ * foreign transaction on the database, even if the timeout has been
+ * reached.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/foreign/fdwxact_resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "foreign/fdwxact.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
+#include "foreign/resolver_internal.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* GUC parameters */
+int foreign_xact_resolution_retry_interval;
+int foreign_xact_resolver_timeout = 60 * 1000;
+
+//static MemoryContext ResolveContext = NULL;
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FdwXactRslvLoop(void);
+static long FdwXactRslvComputeSleepTime(TimestampTz now);
+static void FdwXactRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and clean up the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+	FdwXactLauncherWakeup();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	TIMESTAMP_NOBEGIN(MyFdwXactResolver->last_resolved_time);
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sanish value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FdwXactRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FdwXactRslvLoop(void)
+{
+	FdwXactResolveState *fstate;
+
+	/* Create an FdwXactResolveState */
+	fstate = CreateFdwXactResolveState();
+
+	/* Enter main loop */
+	for (;;)
+	{
+		int			rc;
+		TimestampTz	now;
+		long		sleep_time;
+		bool		resolved;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Resolve a distributed transaction */
+		StartTransactionCommand();
+		resolved = FdwXactResolveDistributedTransaction(fstate);
+		CommitTransactionCommand();
+
+		now = GetCurrentTimestamp();
+
+		/* Update my state */
+		if (resolved)
+			MyFdwXactResolver->last_resolved_time = now;
+
+		/* Check for fdwxact resolver timeout */
+		FdwXactRslvCheckTimeout(now);
+
+		/*
+		 * If we resolved a distributed transaction, go straight to the next
+		 * cycle without resolving dangling transactions or sleeping, because
+		 * other on-line transactions might be waiting to be resolved.
+		 */
+		if (!resolved)
+		{
+			/* Resolve dangling transactions as much as possible */
+			StartTransactionCommand();
+			FdwXactResolveAllDanglingTransactions(MyDatabaseId);
+			CommitTransactionCommand();
+
+			sleep_time = FdwXactRslvComputeSleepTime(now);
+
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   sleep_time,
+						   WAIT_EVENT_FDW_XACT_RESOLVER_MAIN);
+
+			if (rc & WL_POSTMASTER_DEATH)
+				proc_exit(1);
+		}
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FdwXactRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/*
+	 * Reached the timeout. We exit only if there are no more pending on-line
+	 * transactions and no more dangling transactions.
+	 */
+	if (!fdw_xact_exists(InvalidTransactionId, MyDatabaseId, InvalidOid,
+						 InvalidOid))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyFdwXactResolver->dbid))));
+		CommitTransactionCommand();
+
+		fdwxact_resolver_detach();
+		proc_exit(0);
+	}
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. Returns the sleep
+ * time in milliseconds.
+ */
+static long
+FdwXactRslvComputeSleepTime(TimestampTz now)
+{
+	static TimestampTz	wakeuptime = 0;
+	long	sleeptime;
+	long	sec_to_timeout;
+	int		microsec_to_timeout;
+
+	if (now >= wakeuptime)
+		wakeuptime = TimestampTzPlusMilliseconds(now,
+												 foreign_xact_resolution_retry_interval);
+
+	/* Compute relative time until wakeup. */
+	TimestampDifference(now, wakeuptime,
+						&sec_to_timeout, &microsec_to_timeout);
+
+	sleeptime = sec_to_timeout * 1000 + microsec_to_timeout / 1000;
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index eac78a5..1873a24 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -155,6 +155,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping OID.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb4..cfd73f5 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -16,6 +16,8 @@
 
 #include "libpq/pqsignal.h"
 #include "access/parallel.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -129,6 +131,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 084573e..cd25c5f 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3492,6 +3492,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_LAUNCHER_MAIN:
 			event_name = "LogicalLauncherMain";
 			break;
@@ -3683,6 +3689,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -3898,6 +3907,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDW_XACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a4b53b3..1c9ca53 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -99,6 +99,8 @@
 #include "catalog/pg_control.h"
 #include "common/file_perm.h"
 #include "common/ip.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
 #include "lib/ilist.h"
 #include "libpq/auth.h"
 #include "libpq/libpq.h"
@@ -905,6 +907,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -980,12 +986,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 59c003d..ce09a2a 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -154,6 +154,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDW_XACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a58..5f321fe 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "foreign/fdwxact_launcher.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -150,6 +151,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, BackendRandomShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +273,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	BackendRandomShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 9db184f..1d2176c 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -90,6 +90,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -245,6 +247,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1321,6 +1324,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	volatile TransactionId replication_slot_xmin = InvalidTransactionId;
 	volatile TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	volatile TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1382,6 +1386,7 @@ GetOldestXmin(Relation rel, int flags)
 	/* fetch into volatile var while ProcArrayLock is held */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1432,6 +1437,15 @@ GetOldestXmin(Relation rel, int flags)
 		result = replication_slot_xmin;
 
 	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDW_XACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
+	/*
 	 * After locks have been released and defer_cleanup_age has been applied,
 	 * check whether we need to back up further to make logical decoding
 	 * possible. We need to do so if we're computing the global limit (rel =
@@ -3001,6 +3015,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions that are still needed
+ * to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6025ec..a42d06e 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,5 @@ OldSnapshotTimeMapLock				42
 BackendRandomLock					43
 LogicalRepWorkerLock				44
 CLogTruncationLock					45
+FdwXactLock					46
+FdwXactResolverLock			47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6f30e08..577d2ff 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -38,6 +38,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "foreign/fdwxact.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -397,6 +398,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* initialize fields for fdw xact */
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -797,6 +802,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f413395..2b3dee5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -43,6 +43,8 @@
 #include "commands/async.h"
 #include "commands/prepare.h"
 #include "executor/spi.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
 #include "jit/jit.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -2904,6 +2906,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index fa3c8a7..9b9eae8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -41,6 +41,7 @@
 #include "commands/vacuum.h"
 #include "commands/variable.h"
 #include "commands/trigger.h"
+#include "foreign/fdwxact.h"
 #include "funcapi.h"
 #include "jit/jit.h"
 #include "libpq/auth.h"
@@ -651,6 +652,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -1823,6 +1828,16 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+			gettext_noop("Sets the usage of the two-phase commit protocol for distributed transactions."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		false,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -2227,6 +2242,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+		 gettext_noop("Sets the time to wait before retrying to resolve a foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index f43086f..919736d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -121,6 +121,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -288,6 +290,20 @@
 
 
 #------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#foreign_twophase_commit = off
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
+#------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
 
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index ad06e8e..ca3eb62 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ae22e7d..079e8b0 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -208,6 +208,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdw_xact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 895a51f..5f0683d 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -306,6 +306,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_worker_processes);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_xacts setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 8cff535..2082ac0 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -730,6 +730,7 @@ GuessControlValues(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -957,6 +958,7 @@ RewriteControlFile(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* Contents are protected with a CRC */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca..15bfeb4 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -26,6 +26,7 @@
 #include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
+#include "foreign/fdwxact_xlog.h"
 #include "replication/message.h"
 #include "replication/origin.h"
 #include "rmgrdesc.h"
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 0bbe987..c15dff7 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDW_XACT_ID, "Foreign Transactions", fdw_xact_redo, fdw_xact_desc, fdw_xact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 0e932da..b199c88 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 				TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index c7b4144..7180bd1 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -105,6 +105,13 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
+ * XACT_FLAGS_FDWNOPREPARE - set when the transaction wrote data to a
+ * foreign table whose server is not capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE				(1U << 3)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 7c76683..70fa1f1 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -227,6 +227,7 @@ typedef struct xl_parameter_change
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6..3d5333a 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -178,6 +178,7 @@ typedef struct ControlFileData
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 66c6c22..1c56e16 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5199,6 +5199,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '4163', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_fdwxact_resolver', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{int4,oid,timestamptz}',
+  proargmodes => '{o,o,o}',
+  proargnames => '{pid,dbid,last_resolved_time}',
+  prosrc => 'pg_stat_get_fdwxact_resolver' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5910,6 +5917,22 @@
   proargnames => '{type,name,args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '4160', descr => 'view foreign transactions',
+  proname => 'pg_prepared_fdw_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{dbid,transaction,serverid,userid,status,identifier}',
+  prosrc => 'pg_prepared_fdw_xacts' },
+{ oid => '4161', descr => 'remove foreign transaction',
+  proname => 'pg_remove_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_remove_fdw_xact' },
+{ oid => '4162', descr => 'resolve foreign transaction',
+  proname => 'pg_resolve_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_resolve_fdw_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c14eb54..f76e83d 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "foreign/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/relation.h"
 
@@ -168,6 +169,12 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef bool (*PrepareForeignTransaction_function) (ForeignTransaction *foreign_xact);
+typedef bool (*CommitForeignTransaction_function) (ForeignTransaction *foreign_xact);
+typedef bool (*RollbackForeignTransaction_function) (ForeignTransaction *foreign_xact);
+typedef bool (*ResolveForeignTransaction_function) (ForeignTransaction *foreign_xact,
+													bool is_commit);
+typedef bool (*IsTwoPhaseCommitEnabled_function) (Oid serverid);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -235,6 +242,13 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for distributed transactions */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	ResolveForeignTransaction_function ResolveForeignTransaction;
+	IsTwoPhaseCommitEnabled_function IsTwoPhaseCommitEnabled;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
@@ -247,7 +261,6 @@ typedef struct FdwRoutine
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
 } FdwRoutine;
 
-
 /* Functions in foreign/foreign.c */
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern Oid	GetForeignServerIdByRelId(Oid relid);
@@ -258,4 +271,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 						 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in foreign/fdwxact.c */
+extern void FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, char *fx_id);
+
 #endif							/* FDWAPI_H */
diff --git a/src/include/foreign/fdwxact.h b/src/include/foreign/fdwxact.h
new file mode 100644
index 0000000..5138a2c
--- /dev/null
+++ b/src/include/foreign/fdwxact.h
@@ -0,0 +1,147 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL distributed transaction manager
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/fdwxact.h
+ */
+#ifndef FDW_XACT_H
+#define FDW_XACT_H
+
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "foreign/fdwxact_xlog.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+#define	FDW_XACT_NOT_WAITING		0
+#define	FDW_XACT_WAITING			1
+#define	FDW_XACT_WAIT_COMPLETE		2
+
+#define FdwXactEnabled() (max_prepared_foreign_xacts > 0)
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDW_XACT_ID_MAX_LEN 200
+
+/* Enum to track the status of prepared foreign transaction */
+typedef enum
+{
+	FDW_XACT_INITIAL,
+	FDW_XACT_PREPARING,					/* foreign transaction is being prepared */
+	FDW_XACT_PREPARED,					/* foreign transaction is prepared */
+	FDW_XACT_COMMITTING_PREPARED,		/* foreign prepared transaction is to
+										 * be committed */
+	FDW_XACT_ABORTING_PREPARED, /* foreign prepared transaction is to be
+								 * aborted */
+} FdwXactStatus;
+
+/* Shared memory entry for a prepared or being prepared foreign transaction */
+typedef struct FdwXactData *FdwXact;
+
+typedef struct FdwXactData
+{
+	FdwXact		fxact_free_next;	/* Next free FdwXact entry */
+	FdwXact		fxact_next;		/* Pointer to the next FdwXact entry associated
+								 * with the same transaction */
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	TransactionId local_xid;	/* XID of local transaction */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	FdwXactStatus status;		/* The state of the foreign
+								 * transaction. This doubles as the
+								 * action to be taken on this entry. */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;		/* XLOG offset of inserting this entry start */
+	XLogRecPtr	insert_end_lsn;		/* XLOG offset of inserting this entry end */
+
+	bool		valid; /* Has the entry been completed and written to file? */
+	BackendId	registered_backend;	/* Backend who registered this entry */
+	bool		ondisk;			/* TRUE if prepare state file is on disk */
+	bool		inredo;			/* TRUE if entry was added via xlog_redo */
+	char		fdw_xact_id[FDW_XACT_MAX_ID_LEN];		/* prepared transaction identifier */
+} FdwXactData;
+
+/* Shared memory layout for maintaining foreign prepared transaction entries. */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		freeFdwXacts;
+
+	/* Number of valid foreign transaction entries */
+	int			numFdwXacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdw_xacts[FLEXIBLE_ARRAY_MEMBER];		/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* Struct for foreign transaction resolution */
+typedef struct FdwXactResolveState
+{
+	Oid				dbid;		/* database oid */
+	TransactionId	wait_xid;	/* local transaction id waiting to be resolved */
+	PGPROC			*waiter;	/* backend process waiter */
+	FdwXact			fdwxact;	/* foreign transaction entries to resolve */
+} FdwXactResolveState;
+
+/* Struct for foreign transaction passed to API */
+typedef struct ForeignTransaction
+{
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+	char			*fx_id;
+} ForeignTransaction;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern bool foreign_twophase_commit;
+
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern bool fdw_xact_exists(TransactionId xid, Oid dboid, Oid serverid,
+				Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwTwoPhaseNeeded(void);
+extern void PreCommit_FdwXacts(void);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon);
+extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit);
+extern bool FdwXactResolveDistributedTransaction(FdwXactResolveState *fstate);
+extern void FdwXactResolveAllDanglingTransactions(Oid dbid);
+extern bool ForeignTwophaseCommitRequired(void);
+extern FdwXactResolveState *CreateFdwXactResolveState(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void FdwXactMarkForeignTransactionModified(ResultRelInfo *resultRelInfo,
+												  int flags);
+extern bool check_foreign_twophase_commit(bool *newval, void **extra,
+										  GucSource source);
+
+#endif   /* FDW_XACT_H */
diff --git a/src/include/foreign/fdwxact_launcher.h b/src/include/foreign/fdwxact_launcher.h
new file mode 100644
index 0000000..6ed003b
--- /dev/null
+++ b/src/include/foreign/fdwxact_launcher.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _FDWXACT_LAUNCHER_H
+#define _FDWXACT_LAUNCHER_H
+
+#include "foreign/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherWakeup(void);
+
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+
+extern bool IsFdwXactLauncher(void);
+
+extern void fdwxact_maybe_launch_resolver(bool ignore_error);
+
+
+#endif	/* _FDWXACT_LAUNCHER_H */
diff --git a/src/include/foreign/fdwxact_resolver.h b/src/include/foreign/fdwxact_resolver.h
new file mode 100644
index 0000000..5afd98c
--- /dev/null
+++ b/src/include/foreign/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "foreign/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int foreign_xact_resolver_timeout;
+
+#endif		/* FDWXACT_RESOLVER_H */
diff --git a/src/include/foreign/fdwxact_xlog.h b/src/include/foreign/fdwxact_xlog.h
new file mode 100644
index 0000000..f42725e
--- /dev/null
+++ b/src/include/foreign/fdwxact_xlog.h
@@ -0,0 +1,51 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDW_XACT_INSERT	0x00
+#define XLOG_FDW_XACT_REMOVE	0x10
+
+/* Same as GIDSIZE */
+#define FDW_XACT_MAX_ID_LEN 200
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	char		fdw_xact_id[FDW_XACT_MAX_ID_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdw_xact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+} xl_fdw_xact_remove;
+
+extern void fdw_xact_redo(XLogReaderState *record);
+extern void fdw_xact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdw_xact_identify(uint8 info);
+
+#endif	/* FDWXACT_XLOG_H */
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 3ca12e6..d030368 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -68,10 +68,10 @@ typedef struct ForeignTable
 	List	   *options;		/* ftoptions as DefElem list */
 } ForeignTable;
 
-
 extern ForeignServer *GetForeignServer(Oid serverid);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperByName(const char *name,
 							bool missing_ok);
diff --git a/src/include/foreign/resolver_internal.h b/src/include/foreign/resolver_internal.h
new file mode 100644
index 0000000..9f8676b
--- /dev/null
+++ b/src/include/foreign/resolver_internal.h
@@ -0,0 +1,65 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _RESOLVER_INTERNAL_H
+#define _RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLaunchLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t	pid;	/* this resolver's PID, or 0 if not active */
+	Oid		dbid;	/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool	in_use;
+
+	/* Stats */
+	TimestampTz	last_resolved_time;
+
+	/* Protect shared variables shown above */
+	slock_t	mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	*latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/*
+	 * Foreign transaction resolution queue. Protected by FdwXactLock.
+	 */
+	SHM_QUEUE	FdwXactQueue;
+
+	/* Supervisor process */
+	pid_t		launcher_pid;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif	/* _RESOLVER_INTERNAL_H */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f592..8a303af 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -759,6 +759,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDW_XACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -832,7 +834,8 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDW_XACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -912,6 +915,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_READ,
+	WAIT_EVENT_FDW_XACT_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 5c19a61..93953dc 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -150,6 +150,16 @@ struct PGPROC
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
 	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction
+								 * resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+
+	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
 	 * their lock.
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 75bab29..25d6a2f 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDW_XACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -124,4 +126,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 								TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 668d9ef..81560bd 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -94,6 +94,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index ae0cd25..7855225 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1413,6 +1413,13 @@ pg_policies| SELECT n.nspname AS schemaname,
    FROM ((pg_policy pol
      JOIN pg_class c ON ((c.oid = pol.polrelid)))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
+pg_prepared_fdw_xacts| SELECT f.dbid,
+    f.transaction,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.identifier
+   FROM pg_prepared_fdw_xacts() f(dbid, transaction, serverid, userid, status, identifier);
 pg_prepared_statements| SELECT p.name,
     p.statement,
     p.prepare_time,
@@ -1821,6 +1828,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_fdwxact_resolvers| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_fdwxact_resolver() r(pid, dbid, n_entries, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
-- 
1.7.1

0003-postgres_fdw-supports-atomic-commit-APIs_v16.patch (application/octet-stream)
From 6f37e267cd081bbf70e877b825a7c18b4510eee2 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:46:01 +0900
Subject: [PATCH 3/4] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/connection.c              |  534 ++++++++++++++++++------
 contrib/postgres_fdw/expected/postgres_fdw.out |  387 +++++++++++++++++-
 contrib/postgres_fdw/option.c                  |    5 +-
 contrib/postgres_fdw/postgres_fdw.c            |   60 +++-
 contrib/postgres_fdw/postgres_fdw.h            |   10 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  151 +++++++-
 doc/src/sgml/postgres-fdw.sgml                 |   37 ++
 7 files changed, 1040 insertions(+), 144 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index fe4893a..9c0fa9a 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -15,8 +15,11 @@
 #include "postgres_fdw.h"
 
 #include "access/htup_details.h"
-#include "catalog/pg_user_mapping.h"
 #include "access/xact.h"
+#include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -56,6 +59,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		am_participant_of_ac;	/* true if fdwxact code controls the transaction */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -78,7 +82,7 @@ static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_xact_callback(XactEvent event, void *arg);
 static void pgfdw_subxact_callback(SubXactEvent event,
 					   SubTransactionId mySubid,
@@ -91,20 +95,14 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 						 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 						 PGresult **result);
-
-
-/*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
- */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static bool pgfdw_commit_transaction(ConnCacheEntry *entry);
+static bool pgfdw_rollback_transaction(ConnCacheEntry *entry);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
 {
 	bool		found;
 	ConnCacheEntry *entry;
@@ -136,11 +134,8 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
 	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
+	key = umid;
 
 	/*
 	 * Find or create cached entry for requested connection.
@@ -182,6 +177,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping		*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +186,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->am_participant_of_ac = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +197,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,16 +213,46 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
 /*
  * Connect to remote server using specified server and user mapping properties.
+ * If the attempt to connect fails, and the caller can handle connection failure
+ * (connection_error_ok = true) return NULL, throw error otherwise.
  */
 static PGconn *
 connect_pg_server(ForeignServer *server, UserMapping *user)
@@ -265,11 +301,22 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 
 		conn = PQconnectdbParams(keywords, values, false);
 		if (!conn || PQstatus(conn) != CONNECTION_OK)
+		{
+			char	   *connmessage;
+			int			msglen;
+
+			/* libpq typically appends a newline, strip that */
+			connmessage = pstrdup(PQerrorMessage(conn));
+			msglen = strlen(connmessage);
+			if (msglen > 0 && connmessage[msglen - 1] == '\n')
+				connmessage[msglen - 1] = '\0';
+
 			ereport(ERROR,
 					(errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
 					 errmsg("could not connect to server \"%s\"",
 							server->servername),
 					 errdetail_internal("%s", pchomp(PQerrorMessage(conn)))));
+		}
 
 		/*
 		 * Check that non-superuser has used password to establish connection;
@@ -414,15 +461,24 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
+	ForeignServer	*server = GetForeignServer(serverid);
 
 	/* Start main transaction if we haven't yet */
 	if (entry->xact_depth <= 0)
 	{
 		const char *sql;
 
+		/* Register the new foreign server if enabled */
+		if (server_uses_twophase_commit(server))
+		{
+			/* Register foreign server with auto-generated identifier */
+			FdwXactRegisterForeignTransaction(serverid, userid, NULL);
+			entry->am_participant_of_ac = true;
+		}
+
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
@@ -650,12 +706,11 @@ static void
 pgfdw_xact_callback(XactEvent event, void *arg)
 {
 	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
+	ConnCacheEntry	*entry;
 
-	/* Quick exit if no connections were touched in this transaction. */
+	/* Quick exit if no connections were touched in this transaction */
 	if (!xact_got_connection)
 		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote transactions, and
 	 * close them.
@@ -663,17 +718,20 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 	hash_seq_init(&scan, ConnectionHash);
 	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
 	{
-		PGresult   *res;
-
 		/* Ignore cache entry if no open connection right now */
 		if (entry->conn == NULL)
 			continue;
 
+		/*
+		 * Foreign transactions participating in atomic commit are ended
+		 * by two-phase commit APIs. Ignore them.
+		 */
+		if (entry->am_participant_of_ac)
+			continue;
+
 		/* If it has an open remote transaction, try to close it */
 		if (entry->xact_depth > 0)
 		{
-			bool		abort_cleanup_failure = false;
-
 			elog(DEBUG3, "closing remote transaction on connection %p",
 				 entry->conn);
 
@@ -681,40 +739,7 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 			{
 				case XACT_EVENT_PARALLEL_PRE_COMMIT:
 				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
+					pgfdw_commit_transaction(entry);
 					break;
 				case XACT_EVENT_PRE_PREPARE:
 
@@ -739,66 +764,7 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 					break;
 				case XACT_EVENT_PARALLEL_ABORT:
 				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
+					pgfdw_rollback_transaction(entry);
 					break;
 			}
 		}
@@ -1193,3 +1159,325 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare the transaction on the foreign server. This function
+ * is called only at the pre-commit phase of the local transaction. Since
+ * we should already have the connection to the server we are interested in,
+ * we don't use the serverid and userid that are necessary to get the user
+ * mapping that is the key of the connection cache.
+ */
+bool
+postgresPrepareForeignTransaction(ForeignTransaction *foreign_xact)
+{
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+	PGresult	*res;
+	StringInfo	command;
+
+	entry = hash_search(ConnectionHash, &(foreign_xact->usermapping->umid),
+						HASH_FIND, NULL);
+
+	if (!entry || !entry->conn)
+		return false;
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", foreign_xact->fx_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		result = true;
+
+	if (result)
+		elog(DEBUG1, "prepared foreign transaction on server %u with ID %s",
+			 foreign_xact->server->serverid, foreign_xact->fx_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+
+/*
+ * Commit the transaction on the foreign server. This
+ * function is called both at the pre-commit phase of the local transaction
+ * when committing and at the end of the local transaction when aborting.
+ * Since we should already have the connection to the server involved in the
+ * local transaction, we don't use the serverid and userid that are necessary
+ * to get the user mapping that is the key of the connection cache.
+ */
+bool
+postgresCommitForeignTransaction(ForeignTransaction *foreign_xact)
+{
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+
+	entry = hash_search(ConnectionHash, &(foreign_xact->usermapping->umid),
+						HASH_FIND, NULL);
+
+	result = pgfdw_commit_transaction(entry);
+
+	return result;
+}
+
+/*
+ * Roll back the transaction on the foreign server. This
+ * function is called both at the pre-commit phase of the local transaction
+ * when committing and at the end of the local transaction when aborting.
+ * Since we should already have the connection to the server involved in the
+ * local transaction, we don't use the serverid and userid that are necessary
+ * to get the user mapping that is the key of the connection cache.
+ */
+bool
+postgresRollbackForeignTransaction(ForeignTransaction *foreign_xact)
+{
+	ConnCacheEntry *entry = NULL;
+	bool ret;
+
+	entry = hash_search(ConnectionHash, &(foreign_xact->usermapping->umid),
+						HASH_FIND, NULL);
+
+	/* Rollback a remote transaction */
+	ret = pgfdw_rollback_transaction(entry);
+
+	return ret;
+}
+
+bool
+postgresResolveForeignTransaction(ForeignTransaction *foreign_xact, bool is_commit)
+{
+	ConnCacheEntry *entry = NULL;
+	StringInfo	command;
+	bool result;
+	PGresult	*res;
+
+	entry = GetConnectionState(foreign_xact->usermapping->umid,
+							   false, false);
+
+	if (!entry->conn)
+		return false;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 foreign_xact->fx_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		/*
+		 * The command failed, raise a warning to log the reason of failure.
+		 * We may not be in a transaction here, so raising error doesn't
+		 * help. Even if we are in a transaction, it would be the resolver
+		 * transaction, which will get aborted on raising error, thus
+		 * delaying resolution of other prepared foreign transactions.
+		 */
+		pgfdw_report_error(WARNING, res, entry->conn, false, command->data);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * If we tried to COMMIT/ABORT a prepared transaction and the prepared
+		 * transaction was missing on the foreign server, it was probably
+		 * resolved by some other means. Anyway, it should be considered as resolved.
+		 */
+		result = (sqlstate == ERRCODE_UNDEFINED_OBJECT);
+	}
+	else
+		result = true;
+
+	elog(DEBUG1, "%s prepared foreign transaction on server %u with ID %s",
+		 is_commit ? "commit" : "rollback", foreign_xact->server->serverid,
+		 foreign_xact->fx_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->am_participant_of_ac = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	/*
+	 * Regardless of the event type, we can now mark ourselves as out of the
+	 * transaction.
+	 */
+	xact_got_connection = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
+
+static bool
+pgfdw_rollback_transaction(ConnCacheEntry *entry)
+{
+	bool abort_cleanup_failure = false;
+
+	/*
+	 * When rolling back the local transaction, if we don't have the
+	 * connection it means no remote transaction was started, so we can
+	 * regard it as success.
+	 */
+	if (!entry || !entry->conn)
+		return true;
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is already unsalvageable, don't touch it
+	 * further.
+	 */
+	if (entry->changing_xact_state)
+		return true;
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+	else
+	{
+		entry->have_prep_stmt = false;
+		entry->have_error = false;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return !abort_cleanup_failure;
+}
+
+static bool
+pgfdw_commit_transaction(ConnCacheEntry *entry)
+{
+	PGresult	*res;
+	bool result = false;
+
+	if (!entry || !entry->conn)
+		return false;
+
+	/*
+	 * If abort cleanup previously failed for this connection,
+	 * we can't issue any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		result = true;
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index cf4863c..b6a91a9 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,15 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -185,16 +207,19 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                                      List of foreign tables
- Schema |   Table    |  Server   |                   FDW options                    | Description 
---------+------------+-----------+--------------------------------------------------+-------------
- public | ft1        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft2        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft4        | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
- public | ft5        | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft6        | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft_pg_type | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
-(6 rows)
+                                         List of foreign tables
+ Schema |      Table       |  Server   |                   FDW options                    | Description 
+--------+------------------+-----------+--------------------------------------------------+-------------
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
+(9 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8485,3 +8510,345 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+\det+
+                                                 List of foreign tables
+ Schema |      Table       |  Server   |                            FDW options                            | Description 
+--------+------------------+-----------+-------------------------------------------------------------------+-------------
+ public | fpagg_tab_p1     | loopback  | (table_name 'pagg_tab_p1')                                        | 
+ public | fpagg_tab_p2     | loopback  | (table_name 'pagg_tab_p2')                                        | 
+ public | fpagg_tab_p3     | loopback  | (table_name 'pagg_tab_p3')                                        | 
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')                             | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1', use_remote_estimate 'true') | 
+ public | ft3              | loopback  | (table_name 'loct3', use_remote_estimate 'true')                  | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')                             | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type')                  | 
+ public | ftprt1_p1        | loopback  | (table_name 'fprt1_p1', use_remote_estimate 'true')               | 
+ public | ftprt1_p2        | loopback  | (table_name 'fprt1_p2')                                           | 
+ public | ftprt2_p1        | loopback  | (table_name 'fprt2_p1', use_remote_estimate 'true')               | 
+ public | ftprt2_p2        | loopback  | (table_name 'fprt2_p2', use_remote_estimate 'true')               | 
+ public | rem1             | loopback  | (table_name 'loc1')                                               | 
+ public | rem2             | loopback  | (table_name 'loc2')                                               | 
+(19 rows)
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+  srvname  
+-----------
+ loopback
+ loopback2
+ loopback3
+(3 rows)
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO on;
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO ft8_twophase VALUES(3);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO "S 1"."T 6" VALUES (4);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  4
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(5);
+INSERT INTO "S 1"."T 6" VALUES (5);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  4
+(1 row)
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(8);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Fails, cannot commit the distributed transaction if 2PC-non-capable
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+ERROR:  cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Disables atomic commit, and success the same case as above.
+SET foreign_twophase_commit TO off;
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+(5 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+(5 rows)
+
+-- Enable atomic commit, again.
+SET foreign_twophase_commit TO on;
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(10);
+INSERT INTO ft8_twophase VALUES(10);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+-- Fails, cannot prepare the transaction if non-supporeted
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(11);
+INSERT INTO ft9_not_twophase VALUES(11);
+PREPARE TRANSACTION 'gx1'; -- error
+ERROR:  cannot prepare a transaction that modified remote tables
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 6854f1b..1f45b1c 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -108,7 +108,8 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 		 * Validate option value, when we can do so without any context.
 		 */
 		if (strcmp(def->defname, "use_remote_estimate") == 0 ||
-			strcmp(def->defname, "updatable") == 0)
+			strcmp(def->defname, "updatable") == 0 ||
+			strcmp(def->defname, "two_phase_commit") == 0)
 		{
 			/* these accept only boolean values */
 			(void) defGetBoolean(def);
@@ -177,6 +178,8 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* two phase commit support */
+		{"two_phase_commit", ForeignServerRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 78b0f43..28bd246 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include "postgres_fdw.h"
 
+#include "access/xact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "catalog/pg_class.h"
@@ -21,6 +22,7 @@
 #include "commands/explain.h"
 #include "commands/vacuum.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -358,6 +360,7 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 							 RelOptInfo *input_rel,
 							 RelOptInfo *output_rel,
 							 void *extra);
+static bool postgresIsTwoPhaseCommitEnabled(Oid serverid);
 
 /*
  * Helper functions
@@ -451,7 +454,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 				  const PgFdwRelationInfo *fpinfo_o,
 				  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -505,10 +507,29 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->ResolveForeignTransaction = postgresResolveForeignTransaction;
+	routine->IsTwoPhaseCommitEnabled = postgresIsTwoPhaseCommitEnabled;
+
 	PG_RETURN_POINTER(routine);
 }
 
 /*
+ * postgresIsTwoPhaseCommitEnabled
+ */
+static bool
+postgresIsTwoPhaseCommitEnabled(Oid serverid)
+{
+	ForeignServer	*server = GetForeignServer(serverid);
+
+
+	return server_uses_twophase_commit(server);
+}
+
+/*
  * postgresGetForeignRelSize
  *		Estimate # of rows and width of the result of the scan
  *
@@ -1355,7 +1376,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2400,7 +2421,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2697,7 +2718,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								&retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3314,7 +3335,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4101,7 +4122,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4191,7 +4212,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4414,7 +4435,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
@@ -5795,3 +5816,26 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	/* We didn't find any suitable equivalence class expression */
 	return NULL;
 }
+
+/*
+ * server_uses_twophase_commit
+ * Returns true if the foreign server is configured to support 2PC.
+ */
+bool
+server_uses_twophase_commit(ForeignServer *server)
+{
+	ListCell		*lc;
+
+	/* Check the options for two phase compliance */
+	foreach(lc, server->options)
+	{
+		DefElem    *d = (DefElem *) lfirst(lc);
+
+		if (strcmp(d->defname, "two_phase_commit") == 0)
+		{
+			return defGetBoolean(d);
+		}
+	}
+	/* By default a server is not 2PC compliant */
+	return false;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index a5d4011..585cf3e 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "foreign/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "nodes/relation.h"
@@ -115,7 +116,8 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
+extern PGconn *GetExistingConnection(Oid umid);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -123,6 +125,11 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 				   bool clear, const char *sql);
+extern bool postgresPrepareForeignTransaction(ForeignTransaction *foreign_xact);
+extern bool postgresCommitForeignTransaction(ForeignTransaction *foreign_xact);
+extern bool postgresRollbackForeignTransaction(ForeignTransaction *foreign_xact);
+extern bool postgresResolveForeignTransaction(ForeignTransaction *foreign_xact,
+											  bool is_commit);
 
 /* in option.c */
 extern int ExtractConnectionOptions(List *defelems,
@@ -179,6 +186,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 						List *remote_conds, List *pathkeys, bool is_subquery,
 						List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index cdfd9c9..3dd82a1 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,19 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -2251,7 +2278,6 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE
 
 RESET enable_partitionwise_join;
 
-
 -- ===================================================================
 -- test partitionwise aggregates
 -- ===================================================================
@@ -2301,3 +2327,126 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+
+\det+
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO on;
+
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO ft8_twophase VALUES(3);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO "S 1"."T 6" VALUES (4);
+COMMIT;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(5);
+INSERT INTO "S 1"."T 6" VALUES (5);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(8);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Fails, cannot commit the distributed transaction if 2PC-non-capable
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Disables atomic commit, and success the same case as above.
+SET foreign_twophase_commit TO off;
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Enable atomic commit, again.
+SET foreign_twophase_commit TO on;
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(10);
+INSERT INTO ft8_twophase VALUES(10);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Fails, cannot prepare the transaction if non-supporeted
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(11);
+INSERT INTO ft9_not_twophase VALUES(11);
+PREPARE TRANSACTION 'gx1'; -- error
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 54b5e98..f4a9ff5 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -436,6 +436,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted independently.
+    Some of these transactions may fail to commit on their remote server while
+    others commit successfully. This behavior may be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is allowed
+       to use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
-- 
1.7.1

0004-Add-regression-tests-for-atomic-commit_v16.patch (application/octet-stream)
From ca5efcd3c0e32640d44a867ca06ea1e5e9c8edf8 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:48:08 +0900
Subject: [PATCH 4/4] Add regression tests for atomic commit.

---
 src/test/recovery/Makefile         |    2 +-
 src/test/recovery/t/015_fdwxact.pl |  175 ++++++++++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |   13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/015_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index daf79a0..71c8b9d 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/015_fdwxact.pl b/src/test/recovery/t/015_fdwxact.pl
new file mode 100644
index 0000000..a23f120
--- /dev/null
+++ b/src/test/recovery/t/015_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port', two_phase_commit 'on');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port', two_phase_commit 'on');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_prepared_fdw_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 2ff2acc..bfc8f53 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2286,9 +2286,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transaction
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2303,7 +2306,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
1.7.1

#2Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#1)
4 attachment(s)

On Mon, Jun 11, 2018 at 1:53 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jun 5, 2018 at 7:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, May 26, 2018 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, May 18, 2018 at 11:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Regarding to API design, should we use 2PC for a distributed
transaction if both two or more 2PC-capable foreign servers and
2PC-non-capable foreign server are involved with it? Or should we end
up with an error? the 2PC-non-capable server might be either that has
2PC functionality but just disables it or that doesn't have it.

It seems to me that this is functionality that many people will not
want to use. First, doing a PREPARE and then a COMMIT for each FDW
write transaction is bound to be more expensive than just doing a
COMMIT. Second, because the default value of
max_prepared_transactions is 0, this can only work at all if special
configuration has been done on the remote side. Because of the second
point in particular, it seems to me that the default for this new
feature must be "off". It would make no sense to ship a default configuration
of PostgreSQL that doesn't work with the default configuration of
postgres_fdw, and I do not think we want to change the default value
of max_prepared_transactions. It was changed from 5 to 0 a number of
years back for good reason.

I'm not sure that many people will not want to use this feature
because it seems to me that there are many people who don't want to
use the database that is missing transaction atomicity. But I agree
that this feature should not be enabled by default as we disable 2PC
by default.

So, I think the question could be broadened a bit: how you enable this
feature if you want it, and what happens if you want it but it's not
available for your choice of FDW? One possible enabling method is a
GUC (e.g. foreign_twophase_commit). It could be true/false, with true
meaning use PREPARE for all FDW writes and fail if that's not
supported, or it could be three-valued, like require/prefer/disable,
with require throwing an error if PREPARE support is not available and
prefer using PREPARE where available but without failing when it isn't
available. Another possibility could be to make it an FDW option,
possibly capable of being set at multiple levels (e.g. server or
foreign table). If any FDW involved in the transaction demands
distributed 2PC semantics then the whole transaction must have those
semantics or it fails. I was previously leaning toward the latter
approach, but I guess now the former approach is sounding better. I'm
not totally certain I know what's best here.

I agree that the former is better. That way, we also can control that
parameter at transaction level. If we allow the 'prefer' behavior we
need to manage not only 2PC-capable foreign server but also
2PC-non-capable foreign server. It requires all FDW to call the
registration function. So I think two-values parameter would be
better.

BTW, sorry for late submitting the updated patch. I'll post the
updated patch in this week but I'd like to share the new APIs design
beforehand.

Attached updated patches.

I've changed the new APIs to 5 functions and 1 registration function
because in the previous design the rollback API could be called by both
the backend process and the resolver process, which is not a good design.
The latest patches incorporate all the comments I got, except for the
documentation giving users an overall picture of the feature. I'm still
considering what that documentation should contain, and I'll write it
while the code is being reviewed. The basic design of the new patches is
almost the same as in the previous mail I sent.

I introduced 5 new FDW APIs: PrepareForeignTransaction,
CommitForeignTransaction, RollbackForeignTransaction,
ResolveForeignTransaction and IsTwophaseCommitEnabled.
ResolveForeignTransaction is normally called by the resolver process,
whereas the other four functions are called by the backend process. I
also introduced a registration function,
FdwXactRegisterForeignTransaction. An FDW that wishes to support atomic
commit must call this function when a transaction opens on the foreign
server. Registered foreign transactions are controlled by the foreign
transaction manager in Postgres core, which calls the APIs at the
appropriate times. This means that the foreign transaction manager
controls only foreign servers that are capable of 2PC. For a
2PC-non-capable foreign server, the FDW must use XactCallback to control
the foreign transaction.

2PC is used at commit when the distributed transaction modified data on
two or more servers, including the local server, and the user requested
it via the foreign_twophase_commit GUC parameter. All foreign
transactions are prepared during pre-commit, and then the transaction
commits locally. After the local commit, the backend waits for the
resolver process to resolve all prepared foreign transactions. The
waiting backend is released (that is, returns the prompt to the client)
either when all foreign transactions are resolved or when the user asks
to stop waiting. If 2PC is not required, a foreign transaction is
committed during the pre-commit phase of the local transaction.
IsTwophaseCommitEnabled is called whenever the transaction begins to
modify data on a foreign server. This is required to track whether the
transaction modified data on a foreign server that doesn't support or
enable 2PC.
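
To make the commit-time rule above concrete, here is a minimal
standalone sketch in C. It is not code from the patch: CommitMode,
choose_commit_mode and its parameters are names invented for this
illustration. The error case follows the behavior exercised by the
postgres_fdw regression tests (COMMIT fails when a 2PC-non-capable
server is involved while foreign_twophase_commit is on). How the patch
treats a transaction with a single participant that cannot do 2PC is an
assumption here.

```c
#include <stdbool.h>

/* Hypothetical outcome codes for this sketch; not taken from the patch. */
typedef enum
{
	COMMIT_ONE_PHASE,	/* plain COMMIT on each participant */
	COMMIT_TWO_PHASE,	/* PREPARE on all, commit locally, then resolve */
	COMMIT_ERROR		/* atomic commit requested but not possible */
} CommitMode;

/*
 * Decide how to commit, given the foreign_twophase_commit setting and
 * which servers the transaction modified.  The local server counts as
 * one participant when local data was written.
 */
static CommitMode
choose_commit_mode(bool twophase_requested, bool wrote_local,
				   int n_2pc_servers, int n_non_2pc_servers)
{
	int			n_participants;

	n_participants = n_2pc_servers + n_non_2pc_servers + (wrote_local ? 1 : 0);

	/* 2PC is only needed when requested and two or more servers wrote. */
	if (!twophase_requested || n_participants < 2)
		return COMMIT_ONE_PHASE;

	/* A participant that cannot prepare makes atomic commit impossible. */
	if (n_non_2pc_servers > 0)
		return COMMIT_ERROR;

	return COMMIT_TWO_PHASE;
}
```

For example, under these assumptions a transaction that writes to one
2PC-capable foreign server and to a local table would use 2PC, while
one that writes to a single foreign server only would not.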

Atomic commit among multiple foreign servers is crash-safe. If the
coordinator server crashes during atomic commit, the foreign
transaction participants and their status are recovered during WAL
replay. Recovered foreign transactions are in an in-doubt state, aka
dangling transactions. If the database has such transactions, the
resolver process periodically tries to resolve them.
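
As a rough illustration of what resolving one in-doubt transaction
could look like for a PostgreSQL foreign server, the sketch below
builds the remote command from an is_commit flag, mirroring the
is_commit argument of postgresResolveForeignTransaction in the patch.
build_resolve_command is a hypothetical helper, not a function from the
patch, and it assumes the prepared-transaction identifier contains no
single quotes. The real postgres_fdw code may build and send the
command differently.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the SQL command that finishes a prepared
 * transaction on the remote server.  Returns the value of snprintf(),
 * i.e. the length the full command needs (excluding the terminator).
 *
 * Assumption: prep_id needs no quoting or escaping.
 */
static int
build_resolve_command(char *buf, size_t buflen,
					  const char *prep_id, bool is_commit)
{
	return snprintf(buf, buflen, "%s PREPARED '%s'",
					is_commit ? "COMMIT" : "ROLLBACK", prep_id);
}
```

A resolver would presumably choose is_commit from the outcome of the
local transaction and retry later if the remote server is unreachable.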

I'll register this patch for the next CF. Feedback is very welcome.

I attached the updated version patch as the previous versions conflict
with the current HEAD.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

v17-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From 7e9ced2ad44e5bc1dd651f18083b948daadbe7b8 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 8 Feb 2018 11:26:46 +0900
Subject: [PATCH v17 1/4] Keep track of writing on non-temporary relation.

---
 src/backend/access/heap/heapam.c | 12 ++++++++++++
 src/include/access/xact.h        |  5 +++++
 2 files changed, 17 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 72395a5..959a331 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2611,6 +2611,10 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		heap_freetuple(heaptup);
 	}
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleGetOid(tup);
 }
 
@@ -3440,6 +3444,10 @@ l1:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleMayBeUpdated;
 }
 
@@ -4390,6 +4398,10 @@ l2:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	bms_free(hot_attrs);
 	bms_free(proj_idx_attrs);
 	bms_free(key_attrs);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 083e879..c7b4144 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -98,6 +98,11 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data to a non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
-- 
2.10.5

v17-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From f879e1d0b31728b9a15683ba06ceb9badfa9d56f Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:44:42 +0900
Subject: [PATCH v17 2/4] Support atomic commit among multiple foreign servers.

---
 doc/src/sgml/catalogs.sgml                    |   97 +
 doc/src/sgml/config.sgml                      |  124 ++
 doc/src/sgml/fdwhandler.sgml                  |  200 ++
 doc/src/sgml/func.sgml                        |   51 +
 doc/src/sgml/monitoring.sgml                  |   56 +
 src/backend/access/rmgrdesc/Makefile          |    8 +-
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   65 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/Makefile           |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   32 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/foreigncmds.c            |   23 +
 src/backend/executor/execPartition.c          |    4 +
 src/backend/executor/nodeForeignscan.c        |    8 +
 src/backend/executor/nodeModifyTable.c        |    5 +
 src/backend/foreign/Makefile                  |    2 +-
 src/backend/foreign/fdwxact.c                 | 2762 +++++++++++++++++++++++++
 src/backend/foreign/fdwxact_launcher.c        |  587 ++++++
 src/backend/foreign/fdwxact_resolver.c        |  310 +++
 src/backend/foreign/foreign.c                 |   43 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   18 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    5 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    2 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   61 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   23 +
 src/include/foreign/fdwapi.h                  |   18 +-
 src/include/foreign/fdwxact.h                 |  147 ++
 src/include/foreign/fdwxact_launcher.h        |   31 +
 src/include/foreign/fdwxact_resolver.h        |   23 +
 src/include/foreign/fdwxact_xlog.h            |   51 +
 src/include/foreign/foreign.h                 |    2 +-
 src/include/foreign/resolver_internal.h       |   65 +
 src/include/pgstat.h                          |    8 +-
 src/include/storage/proc.h                    |   10 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    2 +
 src/test/regress/expected/rules.out           |   12 +
 57 files changed, 5052 insertions(+), 27 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100755 src/backend/foreign/fdwxact.c
 create mode 100644 src/backend/foreign/fdwxact_launcher.c
 create mode 100644 src/backend/foreign/fdwxact_resolver.c
 create mode 100644 src/include/foreign/fdwxact.h
 create mode 100644 src/include/foreign/fdwxact_launcher.h
 create mode 100644 src/include/foreign/fdwxact_resolver.h
 create mode 100644 src/include/foreign/fdwxact_xlog.h
 create mode 100644 src/include/foreign/resolver_internal.h

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index fffb79f..e0d8157 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9629,6 +9629,103 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-prepared-fdw-xacts">
+  <title><structname>pg_prepared_fdw_xacts</structname></title>
+
+  <indexterm zone="view-pg-prepared-fdw-xacts">
+   <primary>pg_prepared_fdw_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_prepared_fdw_xacts</structname> displays
+   information about foreign transactions that are currently prepared on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="fdw-transaction-managements"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_prepared_fdw_xacts</structname> contains one row per prepared
+   foreign transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_prepared_fdw_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>transaction</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Transaction ID that this foreign transaction is associated with
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which this foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction: <literal>prepared</literal>, <literal>committing</literal>, <literal>aborting</literal> or <literal>unknown</literal>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_prepared_fdw_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bee4afb..044cc7b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1546,6 +1546,29 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+      <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the maximum number of foreign transactions that can be prepared
+        simultaneously. A single local transaction can give rise to multiple
+        foreign transactions. If <literal>N</literal> local transactions each
+        span <literal>K</literal> foreign servers, this value needs to be set
+        to <literal>N * K</literal>, not just <literal>N</literal>.
+        This parameter can only be set at server start.
+       </para>
+       <para>
+        When running a standby server, you must set this parameter to the
+        same or higher value than on the master server. Otherwise, queries
+        will not be allowed in the standby server.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-work-mem" xreflabel="work_mem">
       <term><varname>work_mem</varname> (<type>integer</type>)
       <indexterm>
@@ -3607,6 +3630,78 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
      </variablelist>
     </sect2>
 
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+
+     <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+      <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the maximum number of foreign transaction resolution workers. A foreign
+        transaction resolver is responsible for foreign transaction resolution on one database.
+       </para>
+       <para>
+        Foreign transaction resolution workers are taken from the pool defined by
+        <varname>max_worker_processes</varname>.
+       </para>
+       <para>
+        The default value is 0.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+      <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies how long the foreign transaction resolver should wait after a failed
+        resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+       <para>
+        The default value is 10 seconds.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+      <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Terminate foreign transaction resolver processes that have had no foreign
+        transactions to resolve for longer than the specified number of milliseconds.
+        A value of zero disables the timeout mechanism.  You should set this value to
+        zero only if <varname>max_foreign_transaction_resolvers</varname> is at least
+        the number of databases you have. This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+       <para>
+        The default value is 60 seconds.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     </variablelist>
+    </sect2>
+
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -7822,6 +7917,35 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-foreign-transaction">
+    <title>Foreign Transaction Management</title>
+
+    <variablelist>
+
+     <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+      <term><varname>foreign_twophase_commit</varname> (<type>boolean</type>)
+       <indexterm>
+        <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+       </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies whether transaction commit will wait for all involved foreign transactions
+        to be resolved before the command returns a "success" indication to the client.
+        Both <varname>max_prepared_foreign_transactions</varname> and
+        <varname>max_foreign_transaction_resolvers</varname> must be set to non-zero values to
+        allow foreign two-phase commit to be used.
+       </para>
+       <para>
+        This parameter can be changed at any time; the behavior for any one transaction
+        is determined by the setting in effect when it commits.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 4ce88dd..24c635c 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1390,6 +1390,109 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     If an FDW wishes to support <firstterm>atomic commit</firstterm>
+     (as described in <xref linkend="fdw-transaction-managements"/>), it must call the
+     registration function <function>FdwXactRegisterForeignTransaction</function>
+     and provide the following callback functions:
+    </para>
+
+    <para>
+<programlisting>
+bool
+PrepareForeignTransaction(ForeignTransaction *foreign_xact);
+</programlisting>
+    Prepare the foreign transaction identified by <varname>foreign_xact</varname>.
+    This function is called at the pre-commit phase of the local
+    transaction if atomic commit is
+    required. Returning <literal>true</literal> means that the
+    foreign transaction was prepared successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(ForeignTransaction *foreign_xact);
+</programlisting>
+    Commit the not-prepared foreign transaction identified by
+    <varname>foreign_xact</varname>.
+    This function is called at the pre-commit phase of the local
+    transaction if atomic commit is not required. Atomic
+    commit is not required either when data was modified on
+    only one server, including the local server, or when the user has not
+    requested atomic commit via <xref linkend="guc-foreign-twophase-commit"/>.
+    Returning <literal>true</literal> means that the
+    foreign transaction was committed successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(ForeignTransaction *foreign_xact);
+</programlisting>
+    Roll back the not-prepared foreign transaction identified by
+    <varname>foreign_xact</varname>.
+    This function is called at the end of the local transaction, after it
+    has been rolled back locally, either when the user requested a rollback or
+    when an error occurred within the transaction. This function could
+    be called recursively if an error occurs while rolling back the
+    foreign transaction for whatever reason. You need to track
+    recursion and prevent this function from being called infinitely.
+    Returning <literal>true</literal> means that the
+    foreign transaction was rolled back successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+ResolvePreparedForeignTransaction(ForeignTransaction *foreign_xact,
+                                  bool is_commit);
+</programlisting>
+    Commit or roll back the prepared foreign transaction identified
+    by <varname>foreign_xact</varname> on a connection to the foreign server.
+    When <varname>is_commit</varname> is true, it indicates that the foreign
+    transaction should be committed.
+    This function is normally called by the foreign transaction resolver
+    process but can also be called by the <function>pg_resolve_fdw_xacts</function>
+    function. In the resolver process, this function is called either
+    when a backend requests the resolver process to resolve a distributed
+    transaction after it has been prepared or when a database has a dangling
+    transaction. Returning <literal>true</literal> means that the
+    foreign transaction was resolved successfully.
+    In the abort case, please note that the prepared foreign transaction
+    having identifier <varname>foreign_xact->fx_id</varname> might not
+    exist on the foreign server. If resolving the foreign
+    transaction fails with an undefined object error
+    (<literal>ERRCODE_UNDEFINED_OBJECT</literal>), you should regard
+    it as success and return <literal>true</literal>.
+    </para>
+    <para>
+<programlisting>
+bool
+IsTwoPhaseCommitEnabled(Oid serverid);
+</programlisting>
+    Return <literal>true</literal> if the foreign server identified by
+    <literal>serverid</literal> is capable of the two-phase commit protocol.
+    This function is called when the transaction begins to modify data on
+    the foreign server. Returning <literal>false</literal> indicates that
+    the current transaction cannot use atomic commit even if atomic commit
+    is requested by the user.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. So please note that
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>. To get information about FDW-related
+      objects, you can use the given <literal>ForeignTransaction</literal>
+      instead (see <filename>foreign/fdwxact.h</filename> for details).
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1835,4 +1938,101 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+    <title>Transaction Management for Foreign Data Wrappers</title>
+
+    <sect2 id="fdw-transaction-atomic-commit">
+     <title>Atomic commit among multiple foreign servers</title>
+
+     <para>
+      The <productname>PostgreSQL</productname> foreign transaction manager
+      allows FDWs to read and write data on foreign servers within a transaction while
+      maintaining atomicity of the foreign data (aka atomic commit). Atomic
+      commit guarantees that a distributed transaction is either committed
+      or rolled back on all participating foreign
+      servers.  To achieve atomic commit, <productname>PostgreSQL</productname>
+      employs the two-phase commit protocol, which is a type of atomic commitment
+      protocol. Every FDW that wishes to support atomic commit
+      is required to support the transaction management callback routines
+      (see <xref linkend="fdw-callbacks-transaction-managements"/> for details)
+      and register the foreign transaction using
+      <function>FdwXactRegisterForeignTransaction</function> when starting a
+      transaction on the foreign server. Transactions on registered foreign servers
+      are managed by the foreign transaction manager.
+<programlisting>
+void
+FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, char *fx_id)
+</programlisting>
+    This function should be called when a transaction starts on the foreign server.
+    <varname>serverid</varname> and <varname>userid</varname> are <type>OID</type>s
+    which specify on which server and by which user the transaction is started. <varname>fx_id</varname>
+    is a null-terminated string that identifies the foreign transaction; it
+    will be passed when the transaction management APIs are called. The length of
+    <varname>fx_id</varname> must be less than 200 bytes. Also, this identifier
+    must be unique enough that it doesn't conflict with other concurrent foreign
+    transactions. <varname>fx_id</varname> can be <literal>NULL</literal>.
+    If it's <literal>NULL</literal>, a transaction identifier is automatically
+    generated in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    Since this identifier is used per foreign transaction and the xid of an unresolved
+    distributed transaction is never reused, an auto-generated identifier is
+    sufficient to ensure uniqueness. It's recommended to generate the foreign transaction
+    identifier in the FDW if the format of the auto-generated identifier doesn't match
+    the requirements of the foreign server.
+    </para>
+
+     <para>
+      An example of such a transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may use different
+    Foreign Data Wrappers.
+     </para>
+
+     <para>
+      When a transaction starts on the foreign server, an FDW that wishes to use atomic
+      commit must register the foreign transaction as a participant by calling
+      <function>FdwXactRegisterForeignTransaction</function>. Also, during the
+      transaction, <function>IsTwoPhaseCommitEnabled</function> is called whenever
+      the transaction begins to modify data on the foreign server. If the FDW wishes to use
+      atomic commit, <function>IsTwoPhaseCommitEnabled</function> must return
+      <literal>true</literal>. All foreign transaction participants must
+      return <literal>true</literal> to achieve atomic commit.
+     </para>
+
+     <para>
+      During the pre-commit phase of the local transaction, the foreign transaction manager
+      persists the foreign transaction information to disk and WAL, and then
+      prepares all foreign transactions by calling <function>PrepareForeignTransaction</function>
+      if the two-phase commit protocol is required. Two-phase commit is required only if
+      the transaction modified data on more than one server, including the local
+      server, and the user requested atomic commit. <productname>PostgreSQL</productname>
+      can commit locally and go to the next step if and only if all foreign
+      transactions were prepared successfully. If two-phase commit is not required, the foreign
+      transaction manager commits the transaction on the foreign server by calling
+      <function>CommitForeignTransaction</function> and then
+      <productname>PostgreSQL</productname> commits locally. The foreign transaction
+      manager doesn't make any further changes to foreign transactions from this point
+      forward. If any failure happens for whatever reason, for example a network
+      failure or a user request, before <productname>PostgreSQL</productname> commits
+      locally, the foreign transaction manager switches to rollback and calls
+      <function>RollbackForeignTransaction</function> for every foreign server to
+      close the current transaction on the foreign servers.
+     </para>
+
+     <para>
+      When two-phase commit is required, after committing locally, the transaction
+      commit will wait for all prepared foreign transactions to be resolved before
+      the commit completes. The foreign transaction resolver is responsible for
+      foreign transaction resolution. <function>ResolvePreparedForeignTransaction</function>
+      is called by the foreign transaction resolver process when it resolves a foreign
+      transaction. <function>ResolvePreparedForeignTransaction</function> is also called
+      when a user executes the <function>pg_resolve_fdw_xact</function> function.
+     </para>
+    </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index edc9be9..7b7fe1f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20553,6 +20553,57 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-fdw-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_fdw_xacts</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_fdw_xacts</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for foreign transactions
+        matching the arguments and resolves them. This function won't resolve
+        a foreign transaction which is in progress, or one that is locked by some
+        other backend.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_fdw_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0484cfa..635a5e7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -332,6 +332,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_fdw_xact_resolver</structname><indexterm><primary>pg_stat_fdw_xact_resolver</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-resolver-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1194,6 +1202,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
          <entry><literal>LogicalLauncherMain</literal></entry>
          <entry>Waiting in main loop of logical launcher process.</entry>
         </row>
@@ -1405,6 +1421,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
+        <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
          <entry>Waiting during base backup when throttling activity.</entry>
@@ -2214,6 +2234,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-foreign-xact-resolver-view" xreflabel="pg_stat_fdw_xact_resolver">
+   <title><structname>pg_stat_fdw_xact_resolver</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_fdw_xact_resolver</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index 5514db1..742e825 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -8,9 +8,9 @@ subdir = src/backend/access/rmgrdesc
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o genericdesc.o \
-	   gindesc.o gistdesc.o hashdesc.o heapdesc.o logicalmsgdesc.o \
-	   mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o seqdesc.o \
-	   smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
+OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o fdwxactdesc.o \
+	genericdesc.o  gindesc.o gistdesc.o hashdesc.o heapdesc.o \
+	logicalmsgdesc.o mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o \
+	seqdesc.o smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000..3705104
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,65 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL distributed transaction manager for foreign server.
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "foreign/fdwxact_xlog.h"
+
+void
+fdw_xact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		FdwXactOnDiskData *fdw_insert_xlog = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_insert_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_insert_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_insert_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_insert_xlog->local_xid);
+		/* TODO: This should be really interpreted by each FDW */
+
+		/*
+		 * TODO: we also need to assess whether we want to add this
+		 * information
+		 */
+		appendStringInfo(buf, " foreign transaction info: %s",
+						 fdw_insert_xlog->fdw_xact_id);
+	}
+	else
+	{
+		xl_fdw_xact_remove *fdw_remove_xlog = (xl_fdw_xact_remove *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_remove_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_remove_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_remove_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_remove_xlog->xid);
+	}
+
+}
+
+const char *
+fdw_xact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDW_XACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDW_XACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7..023a7c5 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -112,14 +112,16 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_prepared_xacts=%d max_locks_per_xact=%d "
 						 "wal_level=%s wal_log_hints=%s "
-						 "track_commit_timestamp=%s",
+						 "track_commit_timestamp=%s "
+						 "max_prepared_foreign_xacts=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_prepared_xacts,
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 16fbe47..f15c83a 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -12,9 +12,9 @@ subdir = src/backend/access/transam
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = clog.o commit_ts.o generic_xlog.o multixact.o parallel.o rmgr.o slru.o \
-	subtrans.o timeline.o transam.o twophase.o twophase_rmgr.o varsup.o \
-	xact.o xlog.o xlogarchive.o xlogfuncs.o \
+OBJS = clog.o commit_ts.o generic_xlog.o multixact.o \
+	parallel.o rmgr.o slru.o subtrans.o timeline.o transam.o twophase.o \
+	twophase_rmgr.o varsup.o xact.o xlog.o xlogarchive.o xlogfuncs.o \
 	xloginsert.o xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 9368b56..b5c3502 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -24,6 +24,7 @@
 #include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
+#include "foreign/fdwxact.h"
 #include "replication/message.h"
 #include "replication/origin.h"
 #include "storage/standby.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 306861b..4cc01f1 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -89,6 +89,7 @@
 #include "access/xlogreader.h"
 #include "catalog/pg_type.h"
 #include "catalog/storage.h"
+#include "foreign/fdwxact.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -843,6 +844,35 @@ TwoPhaseGetGXact(TransactionId xid)
 }
 
 /*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
+/*
  * TwoPhaseGetDummyProc
  *		Get the dummy backend ID for prepared transaction specified by XID
  *
@@ -2335,6 +2365,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2394,6 +2430,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9aa63c8..d4256f4 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -36,6 +36,7 @@
 #include "commands/tablecmds.h"
 #include "commands/trigger.h"
 #include "executor/spi.h"
+#include "foreign/fdwxact.h"
 #include "libpq/be-fsstubs.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -1127,6 +1128,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_twophase_for_ac;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1135,6 +1137,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_twophase_for_ac = ForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1173,12 +1176,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_twophase_for_ac)
 			goto cleanup;
 	}
 	else
@@ -1336,6 +1340,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transaction to be resolved, if required.
+	 * We only want to wait if we prepared foreign transaction in this
+	 * transaction.
+	 */
+	if (need_twophase_for_ac && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -1974,6 +1986,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2129,6 +2144,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2216,6 +2232,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2404,6 +2422,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2608,6 +2627,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 493f1db..67ddfb5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "foreign/fdwxact.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -5246,6 +5247,7 @@ BootStrapXLOG(void)
 	ControlFile->MaxConnections = MaxConnections;
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6333,6 +6335,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6853,14 +6858,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdw_xact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7052,7 +7058,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7558,6 +7567,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7876,6 +7886,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9192,6 +9205,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9625,7 +9639,8 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9657,6 +9672,7 @@ XLogReportParameters(void)
 		ControlFile->MaxConnections = MaxConnections;
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9854,6 +9870,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10052,6 +10069,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->MaxConnections = xlrec.MaxConnections;
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7251552..5fa6065 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -291,6 +291,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_prepared_fdw_xacts AS
+       SELECT * FROM pg_prepared_fdw_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
 	l.objoid, l.classoid, l.objsubid,
@@ -773,6 +776,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_fdwxact_resolvers AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_fdwxact_resolver() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index 5c53aee..3a6dff5 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -28,6 +28,7 @@
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "parser/parse_func.h"
@@ -1093,6 +1094,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1403,6 +1416,16 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
 	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srv->serverid,
+						useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
+	/*
 	 * Do the deletion
 	 */
 	object.classId = UserMappingRelationId;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d13be41..f60804c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -19,6 +19,7 @@
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -714,7 +715,10 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+		FdwXactMarkForeignTransactionModified(partRelInfo, 0);
+	}
 
 	MemoryContextSwitchTo(oldContext);
 
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index a2a28b7..30a0b66 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,9 +22,11 @@
  */
 #include "postgres.h"
 
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -224,7 +226,13 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+
+		/* Mark the foreign transaction as having modified data */
+		FdwXactMarkForeignTransactionModified(estate->es_result_relation_info,
+										 eflags);
+	}
 	else
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d8d89c7..6554b0b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -44,6 +44,8 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "storage/bufmgr.h"
@@ -2317,6 +2319,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 fdw_private,
 															 i,
 															 eflags);
+
+			/* Mark the foreign transaction as having modified data */
+			FdwXactMarkForeignTransactionModified(resultRelInfo, eflags);
 		}
 
 		resultRelInfo++;
diff --git a/src/backend/foreign/Makefile b/src/backend/foreign/Makefile
index 85aa857..4329d3e 100644
--- a/src/backend/foreign/Makefile
+++ b/src/backend/foreign/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/foreign
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS= foreign.o
+OBJS= foreign.o fdwxact.o fdwxact_launcher.o fdwxact_resolver.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/foreign/fdwxact.c b/src/backend/foreign/fdwxact.c
new file mode 100755
index 0000000..d284861
--- /dev/null
+++ b/src/backend/foreign/fdwxact.c
@@ -0,0 +1,2762 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL distributed transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * When a foreign data wrapper starts a transaction on a foreign server
+ * that is capable of the two-phase commit protocol, it is required to
+ * register the foreign transaction using FdwXactRegisterForeignTransaction()
+ * in order to participate in a group for atomic commit. Participants are
+ * identified by the OIDs of the foreign server and user. When the foreign
+ * transaction begins to modify data, it is required to mark it as modified
+ * using FdwXactMarkForeignTransactionModified().
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every foreign server. After committing or rolling back locally, we
+ * notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so
+ * that we're not trying to do work locally that might fail with an ERROR
+ * after we have already committed.
+ *
+ * The two-phase commit protocol is required if the transaction modified
+ * two or more servers, including the local server. Otherwise, all foreign
+ * transactions are committed during pre-commit.
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in an in-doubt state (dangling
+ * transactions). Dangling transactions are processed by the resolver process.
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ *	* On PREPARE redo we add the foreign transaction to FdwXactCtl->fdw_xacts.
+ *	  We set fdw_xact->inredo to true for such entries.
+ *	* On Checkpoint redo, we iterate through FdwXactCtl->fdw_xacts entries that
+ *	  have fdw_xact->inredo set to true and are behind the redo_horizon. We save
+ *	  them to disk and then set fdw_xact->ondisk to true.
+ *	* On COMMIT and ABORT we delete the entry from FdwXactCtl->fdw_xacts.
+ *	  If fdw_xact->ondisk is true, we delete the corresponding file from
+ *	  the disk as well.
+ *	* RecoverFdwXacts loads all foreign transaction entries from disk into
+ *	  memory at server startup.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/foreign/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
+#include "foreign/fdwxact_xlog.h"
+#include "foreign/resolver_internal.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Is atomic commit requested by user? */
+#define AtomicCommitRequested() \
+	(foreign_twophase_commit == true && \
+	 max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+/* Structure to bundle the foreign transaction participant */
+typedef struct FdwXactParticipant
+{
+	Oid			serverid;
+	Oid			userid;
+
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if
+	 * this foreign transaction is registered but not inserted
+	 * yet.
+	 */
+	FdwXact		fdw_xact;
+	char		*fdw_xact_id;
+
+	/* true if this transaction modified data on the foreign server */
+	bool		modified;
+
+	/*
+	 * This is initialized at foreign transaction registration and
+	 * passed to API functions.
+	 */
+	ForeignTransaction foreign_xact;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function	prepare_foreign_xact;
+	CommitForeignTransaction_function	commit_foreign_xact;
+	RollbackForeignTransaction_function	rollback_foreign_xact;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit.
+ * This list has only foreign servers that are capable of two-phase
+ * commit protocol.
+ */
+List *FdwXactParticipantsForAC = NIL;
+
+/*
+ * This struct tracks all participants involved with transaction 'xid'.
+ */
+typedef struct FdwXactStateCacheEntry
+{
+	/* Key -- must be first */
+	TransactionId	xid;
+
+	/* List of FdwXacts involved with the xid */
+	FdwXact	participants;
+} FdwXactStateCacheEntry;
+static HTAB	*FdwXactStateCache;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDW_XACTS_DIR "pg_fdw_xact"
+
+/*
+ * The name of a foreign prepared transaction file is the database oid, xid,
+ * foreign server oid and user oid, each as 8 hex digits, separated by '_'.
+ *
+ * Since a FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure
+ * uniqueness.
+ */
+#define FDW_XACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDW_XACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+static FdwXact FdwXactRegisterFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static bool FdwXactResolveForeignTransaction(FdwXact fdw_xact);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactQueueInsert(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid, bool give_warnings);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool give_warnings);
+static List *get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						   bool need_lock);
+static FdwXact get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								bool need_lock);
+static FdwXact get_all_fdw_xacts(int *length);
+static FdwXact insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							   char *fdw_xact_id);
+static char *generate_fdw_xact_identifier(Oid serverid, Oid userid);
+static void remove_fdw_xact(FdwXact fdw_xact);
+
+/* GUC parameters */
+int	max_prepared_foreign_xacts = 0;
+int	max_foreign_xact_resolvers = 0;
+bool foreign_twophase_commit = false;
+
+/* Keep track of whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+/*
+ * Register given foreign transaction identified by given arguments as
+ * a participant of the transaction.
+ *
+ * This function is intended to be called by an FDW when a foreign
+ * transaction starts. The foreign server identified by the given server id
+ * must support the atomic commit APIs. The foreign transaction is identified
+ * by the given identifier 'fx_id', which can be NULL. If it's NULL,
+ * we construct a unique identifier.
+ *
+ * Once registered, the foreign transactions of participants are managed
+ * by the foreign transaction manager until the end of the distributed
+ * transaction.
+ */
+void
+FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, char *fx_id)
+{
+	FdwXactParticipant	*fdw_part;
+	ListCell   			*lc;
+	ForeignServer 		*foreign_server;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	FdwRoutine			*fdw_routine;
+	MemoryContext		old_context;
+
+	/* Check length of foreign transaction identifier */
+	if (fx_id != NULL && strlen(fx_id) >= NAMEDATALEN)
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						fx_id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   NAMEDATALEN)));
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_xact_resolvers to a nonzero value.")));
+
+	/* Duplication check */
+	foreach(lc, FdwXactParticipantsForAC)
+	{
+		fdw_part = lfirst(lc);
+
+		/* Raise an error if this server and user are already registered */
+		if (fdw_part->serverid == serverid && fdw_part->userid == userid)
+			ereport(ERROR,
+					(errmsg("attempt to start transaction again on server %u user %u",
+							serverid, userid)));
+	}
+
+	/*
+	 * Participant information is needed at the end of a transaction, when
+	 * system caches are not available, so save it in TopTransactionContext
+	 * beforehand so that it can live until the end of the transaction.
+	 */
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	foreign_server = GetForeignServer(serverid);
+	fdw = GetForeignDataWrapper(foreign_server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	/* Make sure that the FDW has transaction handlers */
+	if (!fdw_routine->PrepareForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function provided for preparing foreign transaction for FDW %s",
+						fdw->fdwname)));
+	if (!fdw_routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to commit a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+	if (!fdw_routine->RollbackForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to rollback a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+
+	/* Generate foreign transaction identifier if not provided */
+	if (fx_id == NULL)
+		fx_id = generate_fdw_xact_identifier(serverid, userid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->serverid = serverid;
+	fdw_part->userid = userid;
+	fdw_part->fdw_xact_id = fx_id;
+	fdw_part->fdw_xact = NULL;
+	fdw_part->modified = false;	/* by default */
+	fdw_part->foreign_xact.server = foreign_server;
+	fdw_part->foreign_xact.usermapping = user_mapping;
+	fdw_part->foreign_xact.fx_id = fx_id;
+	fdw_part->prepare_foreign_xact = fdw_routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact = fdw_routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact = fdw_routine->RollbackForeignTransaction;
+
+	/* Add this foreign connection to the participants list */
+	FdwXactParticipantsForAC = lappend(FdwXactParticipantsForAC, fdw_part);
+
+	/* Revert back the context */
+	MemoryContextSwitchTo(old_context);
+
+	return;
+}
+
+/*
+ * Remember that the registered foreign transaction modified data. This
+ * function is called when the executor begins to modify data on a foreign
+ * server, regardless of whether the foreign server is capable of the
+ * two-phase commit protocol. The mark is used to determine whether we must
+ * use the two-phase commit protocol at commit. This function also checks
+ * whether the foreign server being modified is capable of two-phase commit;
+ * if it is not, we remember that.
+ */
+void
+FdwXactMarkForeignTransactionModified(ResultRelInfo *resultRelInfo, int flags)
+{
+	Relation			rel = resultRelInfo->ri_RelationDesc;
+	FdwXactParticipant	*fdw_part;
+	ForeignTable		*ftable;
+	ListCell   			*lc;
+	Oid					userid;
+	Oid					serverid;
+
+	bool found = false;
+
+	/* Quick return if the user did not request atomic commit */
+	if (!AtomicCommitRequested())
+		return;
+
+	/* Do nothing in EXPLAIN (no ANALYZE) case */
+	if (flags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	ftable = GetForeignTable(RelationGetRelid(rel));
+
+	/*
+	 * If the foreign server being modified doesn't or cannot enable the
+	 * two-phase commit protocol, mark that we've written to such a server
+	 * and return.
+	 */
+	if (resultRelInfo->ri_FdwRoutine->IsTwoPhaseCommitEnabled == NULL ||
+		!resultRelInfo->ri_FdwRoutine->IsTwoPhaseCommitEnabled(ftable->serverid))
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		return;
+	}
+
+	/*
+	 * The foreign server being modified supports two-phase commit protocol,
+	 * remember that the foreign transaction modified data.
+	 */
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	serverid = ftable->serverid;
+	foreach(lc, FdwXactParticipantsForAC)
+	{
+		fdw_part = lfirst(lc);
+
+		if (fdw_part->serverid == serverid && fdw_part->userid == userid)
+		{
+			fdw_part->modified = true;
+			found = true;
+			break;
+		}
+	}
+
+	if (!found)
+		elog(ERROR, "attempt to mark unregistered foreign server %u, user %u as modified",
+			 serverid, userid);
+}
+
+/*
+ * FdwXactShmemSize
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdw_xacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	/* Size for shared cache entry */
+	size = MAXALIGN(size);
+	size = add_size(size, hash_estimate_size(max_prepared_foreign_xacts,
+											 sizeof(FdwXactStateCacheEntry)));
+
+	return size;
+}
+
+/*
+ * FdwXactShmemInit
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined by the FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdw_xacts;
+		HASHCTL		info;
+		long		max_hash_size;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->freeFdwXacts = NULL;
+		FdwXactCtl->numFdwXacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdw_xacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdw_xacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdw_xacts[cnt].status = FDW_XACT_INITIAL;
+			fdw_xacts[cnt].fxact_free_next = FdwXactCtl->freeFdwXacts;
+			FdwXactCtl->freeFdwXacts = &fdw_xacts[cnt];
+		}
+
+		/* Initialize shared state cache hash table */
+		MemSet(&info, 0, sizeof(info));
+		info.keysize = sizeof(TransactionId);
+		info.entrysize = sizeof(FdwXactStateCacheEntry);
+		max_hash_size = max_prepared_foreign_xacts;
+
+		FdwXactStateCache = ShmemInitHash("FdwXact hash",
+										  max_hash_size,
+										  max_hash_size,
+										  &info,
+										  HASH_ELEM | HASH_BLOBS);
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * PreCommit_FdwXacts
+ *
+ * This function prepares all foreign transaction participants if atomic commit
+ * is required. Otherwise commits them without preparing.
+ *
+ * If atomic commit is requested by the user (that is, foreign_twophase_commit
+ * is on), every participant must enable two-phase commit. If we managed all
+ * foreign transactions involved in a transaction, we could commit foreign
+ * transactions on servers that don't use two-phase commit here and commit the
+ * others at the post-commit phase, but we don't do that, because (1) it
+ * doesn't satisfy the atomic commit semantics at all and (2) it requires all
+ * FDWs to register foreign servers anyway, which breaks backward compatibility.
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipantsForAC == NIL)
+		return;
+
+	/*
+	 * If the user requires atomic commit semantics, we don't allow COMMIT if
+	 * we've modified data on both foreign servers that can execute the
+	 * two-phase commit protocol and foreign servers that cannot.
+	 */
+	if (foreign_twophase_commit && (MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on a foreign server that doesn't support atomic commit")));
+
+	if (ForeignTwophaseCommitRequired())
+	{
+		/* Prepare the transactions on all the foreign servers */
+		FdwXactPrepareForeignTransactions();
+	}
+	else
+	{
+		ListCell   *lc;
+
+		Assert(list_length(FdwXactParticipantsForAC) == 1);
+
+		/* Two-phase commit is not required, commit them one by one */
+		foreach(lc, FdwXactParticipantsForAC)
+		{
+			FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+			/* Commit foreign transaction */
+			if (!fdw_part->commit_foreign_xact(&fdw_part->foreign_xact))
+				ereport(ERROR,
+						(errmsg("could not commit foreign transaction on server %s",
+								fdw_part->foreign_xact.server->servername)));
+		}
+
+		/* Forget all participants */
+		FdwXactParticipantsForAC = NIL;
+	}
+}
+
+/*
+ * FdwXactPrepareForeignTransactions
+ *
+ * Prepare all foreign transaction participants.  This function builds a
+ * chain of prepared participants as each foreign transaction is prepared. The
+ * chain is used to access all participants of the distributed transaction
+ * quickly. If any one of them fails to prepare or raises an error, we switch
+ * over to aborting.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	ListCell   *lcell;
+	FdwXact		prev_fxact = NULL;
+
+	/* Loop over the foreign connections */
+	foreach(lcell, FdwXactParticipantsForAC)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+		FdwXact		fxact;
+
+		/*
+		 * Register the foreign transaction entry. Registration persists this
+		 * information to disk and to the log (thereby relaying it to the standby).
+		 * Thus in case we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have a prepared transaction
+		 * on the foreign server and try to resolve it when connectivity is
+		 * restored or after crash recovery.
+		 *
+		 * If we prepare the transaction on the foreign server before persisting
+		 * the information to the disk and crash in-between these two steps,
+		 * we will forget that we prepared the transaction on the foreign server
+		 * and will not be able to resolve it after the crash. Hence persist
+		 * first then prepare.
+		 */
+		fxact = FdwXactRegisterFdwXactEntry(GetTopTransactionId(), fdw_part);
+
+		/*
+		 * Between the FdwXactRegisterFdwXactEntry call and the moment this
+		 * backend receives the acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 * During abort processing, we might try to resolve a never-prepared
+		 * transaction and get an error. This is fine as long as the FDW provides
+		 * us with unique prepared transaction identifiers.
+		 */
+		if (!fdw_part->prepare_foreign_xact(&fdw_part->foreign_xact))
+		{
+			/* Failed to prepare, switch over to abort */
+			ereport(ERROR,
+					(errmsg("could not prepare transaction on foreign server %s",
+							fdw_part->foreign_xact.server->servername)));
+		}
+
+		/* Preparation succeeded, update its status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdw_part->fdw_xact = fxact;
+		fdw_part->fdw_xact->status = FDW_XACT_PREPARED;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Create a prepared participants chain, i.e. the linked FdwXact entries
+		 * involved in this transaction. The head entry is remembered in the hash
+		 * table and each subsequent entry is linked from the previous entry.
+		 */
+		if (!prev_fxact)
+		{
+			FdwXactStateCacheEntry	*fxact_entry;
+			bool				found;
+
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fxact_entry = (FdwXactStateCacheEntry *) hash_search(FdwXactStateCache,
+																 (void *) &(fxact->local_xid),
+																 HASH_ENTER, &found);
+			LWLockRelease(FdwXactLock);
+			Assert(!found);
+
+			/* Set the first participant */
+			fxact_entry->participants = fxact;
+		}
+		else
+		{
+			/* Append others to the tail */
+			Assert(fxact->fxact_next == NULL);
+			prev_fxact->fxact_next = fxact;
+		}
+
+		prev_fxact = fxact;
+	}
+}
+
+/*
+ * FdwXactRegisterFdwXactEntry
+ *
+ * This function creates a new foreign transaction entry before an FDW
+ * prepares and commits or rolls back. The function writes the entry to WAL; it
+ * is persisted to disk under the pg_fdw_xact directory at checkpoint time.
+ */
+static FdwXact
+FdwXactRegisterFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact				fxact;
+	FdwXactOnDiskData	*fxact_file_data;
+	MemoryContext		old_context;
+	int					data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact = insert_fdw_xact(MyDatabaseId, xid, fdw_part->serverid,
+							fdw_part->userid, fdw_part->fdw_xact_id);
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->registered_backend = MyBackendId;
+	fdw_part->fdw_xact = fxact;
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdw_xact_id);
+	data_len = data_len + strlen(fdw_part->fdw_xact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fxact_file_data->dbid = MyDatabaseId;
+	fxact_file_data->local_xid = xid;
+	fxact_file_data->serverid = fdw_part->serverid;
+	fxact_file_data->userid = fdw_part->userid;
+	memcpy(fxact_file_data->fdw_xact_id, fdw_part->fdw_xact_id,
+		   strlen(fdw_part->fdw_xact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fxact_file_data, data_len);
+	fxact->insert_end_lsn = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_INSERT);
+	XLogFlush(fxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact->valid = true;
+	LWLockRelease(FdwXactLock);
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fxact_file_data);
+	return fxact;
+}
+
+/*
+ * insert_fdw_xact
+ *
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				char *fdw_xact_id)
+{
+	int i;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		fxact = FdwXactCtl->fdw_xacts[i];
+		if (fxact->dbid == dbid &&
+			fxact->local_xid == xid &&
+			fxact->serverid == serverid &&
+			fxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
+								   xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->freeFdwXacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fxact = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fxact->fxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->numFdwXacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts++] = fxact;
+
+	fxact->registered_backend = InvalidBackendId;
+	fxact->dbid = dbid;
+	fxact->local_xid = xid;
+	fxact->serverid = serverid;
+	fxact->userid = userid;
+	fxact->insert_start_lsn = InvalidXLogRecPtr;
+	fxact->insert_end_lsn = InvalidXLogRecPtr;
+	fxact->valid = false;
+	fxact->ondisk = false;
+	fxact->inredo = false;
+	memcpy(fxact->fdw_xact_id, fdw_xact_id, strlen(fdw_xact_id) + 1);
+
+	return fxact;
+}
+
+/*
+ * remove_fdw_xact
+ *
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdw_xact(FdwXact fdw_xact)
+{
+	int			cnt;
+
+	Assert(fdw_xact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		if (FdwXactCtl->fdw_xacts[cnt] == fdw_xact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (cnt >= FdwXactCtl->numFdwXacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdw_xact->local_xid, fdw_xact->serverid, fdw_xact->userid)));
+
+	/* Remove the entry from active array */
+	FdwXactCtl->numFdwXacts--;
+	FdwXactCtl->fdw_xacts[cnt] = FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts];
+
+	/* Put it back into free list */
+	fdw_xact->fxact_free_next = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fdw_xact;
+
+	/* Reset information */
+	fdw_xact->status = FDW_XACT_INITIAL;
+	fdw_xact->registered_backend = InvalidBackendId;
+	fdw_xact->fxact_next = NULL;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdw_xact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdw_xact->serverid;
+		record.dbid = fdw_xact->dbid;
+		record.xid = fdw_xact->local_xid;
+		record.userid = fdw_xact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdw_xact_remove));
+		recptr = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the commit critical section: a
+		 * checkpoint starting after this will certainly see the gxact as a
+		 * candidate for fsyncing.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true if the current transaction requires foreign two-phase commit
+ * to achieve atomic commit. Foreign two-phase commit is required in either
+ * of two cases: we modified data on two or more foreign servers, or we
+ * modified both a non-temporary local relation and data on at least one
+ * foreign server.
+ */
+bool
+ForeignTwophaseCommitRequired(void)
+{
+	int	nserverswritten = list_length(FdwXactParticipantsForAC);
+	ListCell   *lc;
+	bool		modified = false;
+
+	/* Return if not requested */
+	if (!AtomicCommitRequested())
+		return false;
+
+	/* Check if we modified data on any foreign server */
+	foreach(lc, FdwXactParticipantsForAC)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->modified)
+		{
+			modified = true;
+			break;
+		}
+	}
+
+	/* We didn't modify data on any foreign server */
+	if (!modified)
+		return false;
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	return nserverswritten > 1;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int	i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * ForgetAllFdwXactParticipants
+ *
+ * Reset all the foreign transaction entries that this backend registered.
+ * If the foreign transaction has a corresponding FdwXact entry, resetting
+ * the registered_backend field leaves that entry in the unresolved state.
+ * If we leave any entries, we update the oldest xmin of unresolved transactions
+ * so that the transaction status of dangling transactions is not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell *cell;
+	int		n_left = 0;
+
+	if (FdwXactParticipantsForAC == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	foreach(cell, FdwXactParticipantsForAC)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(cell);
+
+		/* Skip if we have not registered an FdwXact entry yet */
+		if (fdw_part->fdw_xact == NULL)
+			continue;
+
+		/*
+		 * There is a race condition: the resolver process could remove an
+		 * FdwXact entry referenced by FdwXactParticipantsForAC, and another
+		 * backend could reuse that entry before we forget it. So we need to
+		 * check whether each entry is still associated with this backend
+		 * before resetting it.
+		 */
+		if (fdw_part->fdw_xact->registered_backend == MyBackendId)
+		{
+			fdw_part->fdw_xact->registered_backend = InvalidBackendId;
+			n_left++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Update the oldest local transaction of unresolved distributed
+	 * transactions if we left any FdwXact entries.
+	 */
+	if (n_left > 0)
+		FdwXactComputeRequiredXmin();
+
+	FdwXactParticipantsForAC = NIL;
+}
+
+/*
+ * AtProcExit_FdwXact
+ *
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Wait for foreign transaction to be resolved.
+ *
+ * Initially backends start in state FDW_XACT_NOT_WAITING and then change
+ * that state to FDW_XACT_WAITING before adding ourselves to the wait queue.
+ * During FdwXactResolveForeignTransactions a fdwxact resolver changes the
+ * state to FDW_XACT_WAIT_COMPLETE once foreign transactions are resolved.
+ * This backend then resets its state to FDW_XACT_NOT_WAITING.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char		*new_status = NULL;
+	const char	*old_status;
+	ListCell	*lc;
+	List		*fdwxact_participants = NIL;
+
+	/* Quick exit if atomic commit is not requested */
+	if (!AtomicCommitRequested())
+		return;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDW_XACT_NOT_WAITING);
+
+	if (FdwXactParticipantsForAC != NIL)
+	{
+		/*
+		 * If we're waiting for resolution of foreign transactions that we
+		 * have just prepared, use the participants list.
+		 */
+		Assert(MyPgXact->xid == wait_xid);
+		fdwxact_participants = FdwXactParticipantsForAC;
+	}
+	else
+	{
+		FdwXactStateCacheEntry *fdwxact_entry;
+		bool found;
+
+		/*
+		 * If we're waiting for resolution of foreign transactions that are
+		 * part of a local prepared transaction that was prepared while this
+		 * server was running, the entries exist in the hash table, so we
+		 * construct the participants list from the hash entry.
+		 */
+		Assert(FdwXactStateCache);
+		fdwxact_entry = (FdwXactStateCacheEntry *) hash_search(FdwXactStateCache,
+															   (void *) &wait_xid,
+															   HASH_FIND, &found);
+
+		if (found)
+		{
+			FdwXact	fdwxact;
+
+			for (fdwxact = fdwxact_entry->participants;
+				 fdwxact != NULL;
+				 fdwxact = fdwxact->fxact_next)
+				fdwxact_participants = lappend(fdwxact_participants, fdwxact);
+		}
+	}
+
+	/*
+	 * Otherwise, construct the participants list by scanning the global
+	 * array. This can happen when the server restarted after a distributed
+	 * transaction was PREPAREd and we are now trying to resolve it.
+	 */
+	if (fdwxact_participants == NIL)
+		fdwxact_participants = get_fdw_xacts(MyDatabaseId, wait_xid,
+											 InvalidOid, InvalidOid, true);
+
+	/* Exit if we found no foreign transaction to resolve */
+	if (fdwxact_participants == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	foreach(lc, fdwxact_participants)
+	{
+		FdwXact fdw_xact = (FdwXact) lfirst(lc);
+
+		/* Don't overwrite status if fate has been determined */
+		if (fdw_xact->status == FDW_XACT_PREPARED)
+			fdw_xact->status = (is_commit ?
+								FDW_XACT_COMMITTING_PREPARED :
+								FDW_XACT_ABORTING_PREPARED);
+	}
+
+	/* Set backend status and enqueue itself */
+	MyProc->fdwXactState = FDW_XACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	FdwXactQueueInsert();
+	LWLockRelease(FdwXactLock);
+
+	/* Launch a resolver process if one is not running yet, or wake it up */
+	fdwxact_maybe_launch_resolver(false);
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0';	/* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to FDW_XACT_WAIT_COMPLETE,
+		 * it will never update it again, so we can't be seeing a stale value
+		 * in that case.
+		 */
+		if (MyProc->fdwXactState == FDW_XACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The latter
+		 * would lead the client to believe that the distributed transaction
+		 * aborted, which is not true: it's already committed locally. The
+		 * former is no good either: the client has requested committing a
+		 * distributed transaction, and is entitled to assume that an acknowledged
+		 * commit is also committed on all foreign servers, which might not be
+		 * true. So in this case we issue a WARNING (which some clients may
+		 * be able to interpret) and shut off further output. We do NOT reset
+		 * ProcDiePending, so that the process will die after the commit is
+		 * cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions may be left orphaned,
+		 * but the foreign transaction resolver can pick them up and try to
+		 * resolve them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDW_XACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+
+	/*
+	 * Forget the list of locked entries. This also means that any entries
+	 * that could not be resolved remain as dangling transactions.
+	 */
+	ForgetAllFdwXactParticipants();
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Acquire FdwXactLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Insert MyProc into the tail of FdwXactQueue.
+ */
+static void
+FdwXactQueueInsert(void)
+{
+	SHMQueueInsertBefore(&(FdwXactRslvCtl->FdwXactQueue),
+						 &(MyProc->fdwXactLinks));
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Create and initialize an FdwXactResolveState which is used
+ * for resolution of foreign transactions.
+ */
+FdwXactResolveState *
+CreateFdwXactResolveState(void)
+{
+	FdwXactResolveState *frstate = palloc0(sizeof(FdwXactResolveState));
+
+	frstate->dbid = MyDatabaseId;
+	frstate->fdwxact = NULL;
+	frstate->waiter = NULL;
+
+	return frstate;
+}
+
+/*
+ * Resolve one distributed transaction. The target distributed transaction
+ * is fetched from the shmem queue and its participants are fetched from either
+ * the shmem hash table or the global array. Release the waiter and return true
+ * only if we resolved all of the foreign transaction participants. Return
+ * false if we failed to resolve any of them.
+ *
+ * To preserve the order in which distributed transactions were queued, we
+ * must not move on to the next distributed transaction until all participants
+ * are resolved. Failed foreign transactions are retried at the next execution.
+ */
+bool
+FdwXactResolveDistributedTransaction(FdwXactResolveState *frstate)
+{
+	FdwXactStateCacheEntry	*fdwxact_entry = NULL;
+	volatile FdwXact	fdwxacts_failed_to_resolve = NULL;
+	bool				all_resolved = false;
+
+	Assert(frstate->dbid == MyDatabaseId);
+
+	/* Get a new waiter if we don't have one */
+	if (frstate->waiter == NULL)
+	{
+		PGPROC	*proc;
+
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
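+		/* Fetch the first waiter on our database, scanning from the queue head */
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->FdwXactQueue),
+									   &(FdwXactRslvCtl->FdwXactQueue),
+									   offsetof(PGPROC, fdwXactLinks));
+		while (proc != NULL)
+		{
+			/* Found a waiter */
+			if (proc->databaseId == frstate->dbid)
+				break;
+
+			/* Advance from the current element, not from the queue head */
+			proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->FdwXactQueue),
+										   &(proc->fdwXactLinks),
+										   offsetof(PGPROC, fdwXactLinks));
+		}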
+
+		LWLockRelease(FdwXactLock);
+
+		/* If no waiter, there is no job */
+		if (!proc)
+			return false;
+
+		Assert(TransactionIdIsValid(proc->fdwXactWaitXid));
+		frstate->waiter = proc;
+	}
+
+	/* Get foreign transaction participants */
+	if (frstate->fdwxact == NULL)
+	{
+		bool found;
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+		/* Search FdwXact entries from the hash table by the local transaction id */
+		fdwxact_entry =
+			(FdwXactStateCacheEntry *) hash_search(FdwXactStateCache,
+												   (void *) &(frstate->waiter->fdwXactWaitXid),
+												   HASH_FIND, &found);
+
+		if (found)
+			frstate->fdwxact = fdwxact_entry->participants;
+		else
+		{
+			int i;
+			FdwXact entries_to_resolve = NULL;
+			FdwXact prev_fx = NULL;
+
+			/*
+			 * The fdwxact entry doesn't exist in the hash table in case where
+			 * a prepared transaction is resolved after recovery. In this case,
+			 * we construct a list of fdw xact entries by scanning over the
+			 * FdwXactCtl->fdw_xacts list.
+			 */
+			for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+			{
+				FdwXact fdw_xact = FdwXactCtl->fdw_xacts[i];
+
+				if (fdw_xact->dbid == frstate->dbid &&
+					fdw_xact->local_xid == frstate->waiter->fdwXactWaitXid)
+				{
+					if (!entries_to_resolve)
+						entries_to_resolve = fdw_xact;
+
+					/* Link from previous entry to this entry */
+					if (prev_fx)
+						prev_fx->fxact_next = fdw_xact;
+
+					prev_fx = fdw_xact;
+				}
+			}
+
+			frstate->fdwxact = entries_to_resolve;
+		}
+
+		LWLockRelease(FdwXactLock);
+	}
+
+	Assert(frstate->fdwxact != NULL);
+
+	/* Resolve all foreign transactions one by one */
+	while (frstate->fdwxact != NULL)
+	{
+		volatile FdwXact cur_fdwxact = frstate->fdwxact;
+		volatile FdwXact fdwxact_next = NULL;
+
+		/*
+		 * Remember the next FdwXact entry to resolve, because the current entry
+		 * will be removed from the list once it is resolved.
+		 */
+		fdwxact_next = cur_fdwxact->fxact_next;
+
+		/* Resolve a foreign transaction */
+		if (!FdwXactResolveForeignTransaction(cur_fdwxact))
+		{
+			ForeignServer *fserver;
+
+			CHECK_FOR_INTERRUPTS();
+
+			/*
+			 * Failed to resolve. Remember it for the next execution. Detach the
+			 * entry from its old chain first so that the list of failed entries
+			 * never links to an entry that is resolved and removed later.
+			 */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			cur_fdwxact->fxact_next = NULL;
+			if (fdwxacts_failed_to_resolve == NULL)
+			{
+				/* The first failed entry becomes the head of the list */
+				fdwxacts_failed_to_resolve = cur_fdwxact;
+			}
+			else
+			{
+				FdwXact fx = fdwxacts_failed_to_resolve;
+
+				/* Append the entry at the tail */
+				while (fx->fxact_next != NULL)
+					fx = fx->fxact_next;
+				fx->fxact_next = cur_fdwxact;
+			}
+			LWLockRelease(FdwXactLock);
+
+			fserver = GetForeignServer(cur_fdwxact->serverid);
+			ereport(LOG,
+					(errmsg("could not resolve a foreign transaction on server \"%s\"",
+							fserver->servername),
+					 errdetail("local transaction id is %u, connected by user id %u",
+							   cur_fdwxact->local_xid, cur_fdwxact->userid)));
+		}
+		else
+		{
+			/* Resolved. Update the cache entry if it's valid */
+			if (fdwxact_entry)
+				fdwxact_entry->participants = fdwxact_next;
+
+			elog(DEBUG2, "resolved a foreign transaction xid %u, serverid %u, userid %u",
+				 cur_fdwxact->local_xid, cur_fdwxact->serverid, cur_fdwxact->userid);
+		}
+
+		/* Advance the resolution status to the next */
+		frstate->fdwxact = fdwxact_next;
+	}
+
+	all_resolved = (fdwxacts_failed_to_resolve == NULL);
+
+	if (all_resolved)
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+		/* Remove the state cache entry from shmem hash table */
+		hash_search(FdwXactStateCache, (void *) &(frstate->waiter->fdwXactWaitXid),
+					HASH_REMOVE, NULL);
+
+		/*
+		 * Remove waiter from shmem queue, if not detached yet. The waiter
+		 * could already be detached if the user cancelled the wait before
+		 * resolution.
+		 */
+		if (!SHMQueueIsDetached(&(frstate->waiter->fdwXactLinks)))
+		{
+			TransactionId	wait_xid = frstate->waiter->fdwXactWaitXid;
+
+			SHMQueueDelete(&(frstate->waiter->fdwXactLinks));
+
+			pg_write_barrier();
+
+			/* Set state to complete */
+			frstate->waiter->fdwXactState = FDW_XACT_WAIT_COMPLETE;
+
+			/* Wake up the waiter only when we have set state and removed from queue */
+			SetLatch(&(frstate->waiter->procLatch));
+
+			elog(DEBUG2, "released a proc xid %u", wait_xid);
+		}
+
+		LWLockRelease(FdwXactLock);
+
+		/* Reset resolution state */
+		frstate->waiter = NULL;
+		Assert(frstate->fdwxact == NULL);
+	}
+	else
+	{
+		/*
+		 * Update the fdwxact entry we're processing so that the failed
+		 * fdwxact entries will be processed again.
+		 */
+		frstate->fdwxact = fdwxacts_failed_to_resolve;
+	}
+
+	return all_resolved;
+}
+
+/*
+ * Resolve all dangling foreign transactions on the given database. Get
+ * all dangling foreign transactions from shmem global array and resolve
+ * them one by one.
+ *
+ * Unlike FdwXactResolveDistributedTransaction, dangling transaction
+ * resolution does not care about the order of resolution because these
+ * entries are already out of order. So if we fail to resolve a foreign
+ * transaction, we can move on to the next foreign transaction, which might
+ * belong to another distributed transaction.
+ */
+void
+FdwXactResolveAllDanglingTransactions(Oid dbid)
+{
+	List		*dangling_fdwxacts = NIL;
+	ListCell	*cell;
+	int			n_resolved = 0;
+	int			i;
+
+	Assert(OidIsValid(dbid));
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/*
+	 * Walk over the global array to build the list of dangling transactions
+	 * whose corresponding local transaction is on the given database.
+	 */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fxact = FdwXactCtl->fdw_xacts[i];
+
+		/*
+		 * Append the fdwxact entry on the given database to the list if
+		 * it's handled by nobody and the corresponding local transaction
+		 * is not part of the prepared transaction.
+		 */
+		if (fxact->dbid == dbid &&
+			fxact->registered_backend == InvalidBackendId &&
+			!TwoPhaseExists(fxact->local_xid))
+			dangling_fdwxacts = lappend(dangling_fdwxacts, fxact);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/* Return if there is no foreign transaction we need to resolve */
+	if (dangling_fdwxacts == NIL)
+		return;
+
+	foreach(cell, dangling_fdwxacts)
+	{
+		FdwXact fdwxact = (FdwXact) lfirst(cell);
+
+		if (!FdwXactResolveForeignTransaction(fdwxact))
+		{
+			ForeignServer *fserver = GetForeignServer(fdwxact->serverid);
+
+			/*
+			 * If we failed to resolve this foreign transaction, skip it in
+			 * this resolution cycle and try to resolve it again in the next cycle.
+			 */
+			ereport(LOG,
+					(errmsg("could not resolve a dangling foreign transaction on server \"%s\"",
+							fserver->servername),
+					 errdetail("local transaction id is %u, connected by user id %u",
+							   fdwxact->local_xid, fdwxact->userid)));
+			continue;
+		}
+
+		n_resolved++;
+	}
+
+	list_free(dangling_fdwxacts);
+
+	elog(DEBUG2, "resolved %d dangling foreign xacts", n_resolved);
+}
+
+/*
+ * AtEOXact_FdwXacts
+ *
+ * In the commit case, we have already prepared transactions on the foreign
+ * servers during pre-commit, and those prepared transactions will be
+ * resolved by the resolver process. So we don't do anything about the
+ * foreign transactions.
+ *
+ * In the abort case, either the user requested rollback or we switched over
+ * to rollback due to an error during commit. To close the current foreign
+ * transactions anyway, we call the rollback API for every foreign transaction.
+ * If we raised an error while preparing and ended up here, some entries of
+ * FdwXactParticipantsForAC may already have registered their FdwXact entries.
+ * If so, we leave them as dangling transactions and ask the resolver process
+ * to process them.
+ */
+void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		int left_fdwxacts = 0;
+
+		foreach (lcell, FdwXactParticipantsForAC)
+		{
+			FdwXactParticipant	*fdw_part = lfirst(lcell);
+
+			/*
+			 * Count FdwXact entries that we registered to shared memory array
+			 * in this transaction.
+			 */
+			if (fdw_part->fdw_xact)
+			{
+				/*
+				 * The status of the foreign transaction must be either preparing
+				 * or prepared. In either case, since we have registered an FdwXact
+				 * entry, we leave it to the resolver process. In the preparing
+				 * state, the foreign transaction might not be closed yet, so we
+				 * fall through and call the rollback API. In the prepared state,
+				 * the foreign transaction has already been closed, so we don't
+				 * need to do anything.
+				 */
+				Assert(fdw_part->fdw_xact->status == FDW_XACT_PREPARING ||
+					   fdw_part->fdw_xact->status == FDW_XACT_PREPARED);
+
+				left_fdwxacts++;
+				if (fdw_part->fdw_xact->status == FDW_XACT_PREPARED)
+					continue;
+			}
+
+			/*
+			 * Roll back the current foreign transaction. Since we're already
+			 * rolling back the transaction, it's too late to raise an error
+			 * here, so we log it as a warning.
+			 */
+			if (!fdw_part->rollback_foreign_xact(&fdw_part->foreign_xact))
+				ereport(WARNING,
+						(errmsg("could not abort transaction on server \"%s\"",
+								fdw_part->foreign_xact.server->servername)));
+		}
+
+		/* If we left some FdwXact entries, ask the resolver process */
+		if (left_fdwxacts > 0)
+		{
+			ereport(WARNING,
+					(errmsg("left %d foreign transactions in in-doubt status",
+							left_fdwxacts)));
+			fdwxact_maybe_launch_resolver(true);
+		}
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * AtPrepare_FdwXacts
+ *
+ * If there are foreign servers involved in the transaction, this function
+ * prepares transactions on those servers.
+ *
+ * Note that the transaction can abort after we have prepared only some of the
+ * participants. Since we may still switch over to abort, we cannot forget
+ * FdwXactParticipantsForAC here. These are processed by the resolver process
+ * during abort, or at AtEOXact_FdwXacts.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipantsForAC == NIL)
+		return;
+
+	/*
+	 * We cannot prepare a distributed transaction if any participating
+	 * foreign server isn't capable of two-phase commit.
+	 */
+	if ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_T_R_INTEGRITY_CONSTRAINT_VIOLATION),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in the transaction cannot prepare it")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * FdwXactResolveForeignTransaction
+ *
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * handler routine. The foreign transaction can be a dangling transaction
+ * that nobody is interested in any more. If the fate of the foreign
+ * transaction is not determined yet, it is determined according to the status
+ * of the corresponding local transaction.
+ *
+ * If the resolution is successful, remove the foreign transaction entry from
+ * the shared memory and also remove the corresponding on-disk file.
+ */
+static bool
+FdwXactResolveForeignTransaction(FdwXact fdwxact)
+{
+	bool		resolved;
+	bool		is_commit;
+	ForeignServer		*fserver;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	FdwRoutine			*fdw_routine;
+	ForeignTransaction	foreign_xact;
+
+	Assert(fdwxact);
+
+	/*
+	 * Determine whether we commit or abort this foreign transaction.
+	 */
+	if (fdwxact->status == FDW_XACT_COMMITTING_PREPARED)
+		is_commit = true;
+	else if (fdwxact->status == FDW_XACT_ABORTING_PREPARED)
+		is_commit = false;
+
+	/*
+	 * If the local transaction is already committed, commit prepared
+	 * foreign transaction.
+	 */
+	else if (TransactionIdDidCommit(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_COMMITTING_PREPARED;
+		is_commit = true;
+	}
+
+	/*
+	 * If the local transaction is already aborted, abort prepared
+	 * foreign transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_ABORTING_PREPARED;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is not in progress but the foreign
+	 * transaction is not prepared on the foreign server. This
+	 * can happen when the transaction failed after registering this
+	 * entry but before actually preparing on the foreign server.
+	 * So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+		is_commit = false;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction
+	 * state is neither committing nor aborting. This should not
+	 * happen because we cannot decide whether to commit or abort a
+	 * foreign transaction associated with an in-progress local
+	 * transaction.
+	 */
+	else
+		ereport(ERROR,
+				(errmsg("cannot resolve foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid)));
+
+	/* Construct foreign server connection information for passing to API */
+	fserver = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(fserver->fdwid);
+	user_mapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+	foreign_xact.server = fserver;
+	foreign_xact.usermapping = user_mapping;
+	foreign_xact.fx_id = fdwxact->fdw_xact_id;
+
+	/* Resolve the foreign transaction */
+	Assert(fdw_routine->ResolveForeignTransaction);
+	resolved = fdw_routine->ResolveForeignTransaction(&foreign_xact,
+													  is_commit);
+
+	if (!resolved)
+	{
+		/* fserver still points to the server looked up above */
+		ereport(ERROR,
+				(errmsg("could not %s a prepared foreign transaction on server \"%s\"",
+						is_commit ? "commit" : "roll back", fserver->servername),
+				 errdetail("Local transaction id is %u, connected by user id %u.",
+						   fdwxact->local_xid, fdwxact->userid)));
+	}
+	else
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdw_xact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+
+	return resolved;
+}
+
+/*
+ * Return the one FdwXact entry that matches the given arguments, or NULL
+ * if there is none. Since this function searches for the entry by its
+ * unique key, all arguments must be valid.
+ */
+static FdwXact
+get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				 bool need_lock)
+{
+	List	*fdw_xact_list;
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, need_lock);
+
+	/* Could not find entry */
+	if (fdw_xact_list == NIL)
+		return NULL;
+
+	/* Must be one entry since we search it by the unique key */
+	Assert(list_length(fdw_xact_list) == 1);
+
+	return (FdwXact) linitial(fdw_xact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+fdw_xact_exists(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	*fdw_xact_list;
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, true);
+
+	return fdw_xact_list != NIL;
+}
+
+/*
+ * Returns an array of all foreign prepared transactions for the user-level
+ * function pg_prepared_fdw_xacts.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if it doesn't want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdw_xacts(int *length)
+{
+	List		*all_fdw_xacts;
+	ListCell	*lc;
+	FdwXact		fdw_xacts;
+	int			num_fdw_xacts = 0;
+
+	Assert(length != NULL);
+
+	/* Get all entries */
+	all_fdw_xacts = get_fdw_xacts(InvalidOid, InvalidTransactionId,
+								  InvalidOid, InvalidOid, true);
+
+	if (all_fdw_xacts == NIL)
+	{
+		*length = 0;
+		return NULL;
+	}
+
+	fdw_xacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdw_xacts));
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdw_xacts)
+	{
+		FdwXact fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdw_xacts + num_fdw_xacts, fx,
+			   sizeof(FdwXactData));
+		num_fdw_xacts++;
+	}
+
+	*length = num_fdw_xacts;
+	list_free(all_fdw_xacts);
+
+	return fdw_xacts;
+}
+
+/*
+ * Return a list of FdwXact entries that match the given arguments, or NIL
+ * if there are none. An invalid OID or transaction id matches any entry.
+ */
+static List*
+get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			  bool need_lock)
+{
+	int i;
+	List	*fdw_xact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact	fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool	matches = true;
+
+		/* xid */
+		if (xid != InvalidTransactionId && xid != fdw_xact->local_xid)
+			matches = false;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdw_xact->dbid != dbid)
+			matches = false;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdw_xact->serverid)
+			matches = false;
+
+		/* userid */
+		if (OidIsValid(userid) && fdw_xact->userid != userid)
+			matches = false;
+
+		/* Append it if matched */
+		if (matches)
+			fdw_xact_list = lappend(fdw_xact_list, fdw_xact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdw_xact_list;
+}
+
+/*
+ * fdw_xact_redo
+ * Apply the redo log for a foreign transaction.
+ */
+void
+fdw_xact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record
+		 * in FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDW_XACT_REMOVE)
+	{
+		xl_fdw_xact_remove *xlrec = (xl_fdw_xact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. Returned string
+ * value is used to identify foreign transaction. The identifier should not
+ * be same as any other concurrent prepared transaction identifier.
+ *
+ * To make the foreign transaction id, we should ideally use something like
+ * UUID, which gives unique ids with high probability, but that may be expensive
+ * here and the UUID extension, which provides the function to generate UUIDs,
+ * is not part of the core code.
+ */
+static char *
+generate_fdw_xact_identifier(Oid serverid, Oid userid)
+{
+	char	   *fdw_xact_id;
+
+	fdw_xact_id = (char *) palloc(FDW_XACT_ID_MAX_LEN * sizeof(char));
+
+	/* Oids are unsigned, so print them with %u; snprintf null-terminates */
+	snprintf(fdw_xact_id, FDW_XACT_ID_MAX_LEN, "%s_%ld_%u_%u",
+			 "fx", Abs(random()), serverid, userid);
+
+	return fdw_xact_id;
+}
+
+/*
+ * CheckPointFdwXacts
+ *
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an inserted LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * For simplicity, the function currently performs all of this I/O while
+ * holding FdwXactLock. Moving the I/O out of the lock is possible, but it
+ * would create a race condition: after releasing the lock, and before syncing
+ * a file, the corresponding foreign transaction entry and hence the file
+ * might get removed, so every error would need to be re-checked against
+ * that possibility.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdw_xacts = 0;
+
+	/* Quick get-away, before taking lock */
+	if (max_prepared_foreign_xacts <= 0)
+		return;
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/* Another quick exit, if there are no entries to check */
+	if (FdwXactCtl->numFdwXacts <= 0)
+	{
+		LWLockRelease(FdwXactLock);
+		return;
+	}
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * spots that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		FdwXact		fxact = FdwXactCtl->fdw_xacts[cnt];
+
+		if ((fxact->valid || fxact->inredo) &&
+			!fxact->ondisk &&
+			fxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fxact->dbid, fxact->local_xid,
+								fxact->serverid, fxact->userid,
+								buf, len);
+			fxact->ondisk = true;
+			fxact->insert_start_lsn = InvalidXLogRecPtr;
+			fxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdw_xacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDW_XACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdw_xacts > 0)
+		ereport(LOG,
+			  (errmsg_plural("%u foreign transaction state file was written "
+							 "for long-running prepared transactions",
+							 "%u foreign transaction state files were written "
+							 "for long-running prepared transactions",
+							 serialized_fdw_xacts,
+							 serialized_fdw_xacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, &read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+		   errdetail("Failed while allocating an XLog reading processor.")));
+
+	record = XLogReadRecord(xlogreader, lsn, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read foreign transaction state from xlog at %X/%X",
+			   (uint32) (lsn >> 32),
+			   (uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDW_XACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDW_XACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not recreate foreign transaction state file \"%s\": %m",
+			   path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not fsync foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * ProcessFdwXactBuffer
+ *
+ * Given a database id, transaction id, serverid and userid, read the foreign
+ * transaction data either from disk or directly from WAL using the provided
+ * "insert_start_lsn". Return NULL if the entry is from the future or corrupt.
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId	origNextXid = ShmemVariableCache->nextXid;
+	char	*buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(insert_start_lsn != InvalidXLogRecPtr);
+
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid, true);
+		if (buf == NULL)
+		{
+			ereport(WARNING,
+					(errmsg("removing corrupt fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+			return NULL;
+		}
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid CRC and the expected identifying fields), return
+ * the contents in a palloc'd buffer. Otherwise return NULL. The buffer can
+ * be freed later by the caller.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				bool give_warnings)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			   errmsg("could not open FDW transaction state file \"%s\": %m",
+					  path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+	{
+		CloseTransientFile(fd);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not stat FDW transaction state file \"%s\": %m",
+							path)));
+		return NULL;
+	}
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdw_xact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+	{
+		CloseTransientFile(fd);
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+		return NULL;
+	}
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+	{
+		CloseTransientFile(fd);
+		return NULL;
+	}
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_READ);
+	if (read(fd, buf, stat.st_size) != stat.st_size)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not read FDW transaction state file \"%s\": %m",
+					  path)));
+		return NULL;
+	}
+
+	pgstat_report_wait_end();
+	CloseTransientFile(fd);
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+	{
+		pfree(buf);
+		return NULL;
+	}
+
+	/* Check that the contents are the data we expect */
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fxact_file_data->dbid != dbid ||
+		fxact_file_data->serverid != serverid ||
+		fxact_file_data->userid != userid ||
+		fxact_file_data->local_xid != xid)
+	{
+		/* The file was already closed above, so only the buffer is left */
+		ereport(WARNING,
+			(errmsg("invalid foreign transaction state file \"%s\"",
+					path)));
+		pfree(buf);
+		return NULL;
+	}
+
+	return buf;
+}
+
+/*
+ * PrescanFdwXacts
+ *
+ * Scan the foreign transactions directory for the oldest active transaction.
+ * This is run during database startup, after we completed reading WAL.
+ * ShmemVariableCache->nextXid has been set to one more than the highest XID
+ * for which evidence exists in WAL.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	TransactionId nextXid = ShmemVariableCache->nextXid;
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+		 strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			TransactionId local_xid;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/*
+			 * Remove a foreign prepared transaction file corresponding to an
+			 * XID, which is too new.
+			 */
+			if (TransactionIdFollowsOrEquals(local_xid, nextXid))
+			{
+				ereport(WARNING,
+						(errmsg("removing future foreign prepared transaction file \"%s\"",
+								clde->d_name)));
+				RemoveFdwXactFile(dbid, local_xid, serverid, userid, true);
+				continue;
+			}
+
+			if (TransactionIdPrecedesOrEquals(local_xid, oldestActiveXid))
+				oldestActiveXid = local_xid;
+		}
+	}
+
+	FreeDir(cldir);
+	return oldestActiveXid;
+}
+
+/*
+ * restoreFdwXactData
+ *
+ * Scan pg_fdw_xact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char		*buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid, bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * FdwXactRedoAdd
+ *
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The
+	 * status of the transaction is set as preparing, since we do not
+	 * know the exact status right now. Resolver will set it later
+	 * based on the status of local transaction which prepared this
+	 * foreign transaction.
+	 */
+	fxact = insert_fdw_xact(fxact_data->dbid, fxact_data->local_xid,
+							fxact_data->serverid, fxact_data->userid,
+							fxact_data->fdw_xact_id);
+
+	/*
+	 * Set status as preparing, since we do not know the xact status
+	 * right now. Resolver will set it later based on the status of
+	 * local transaction that prepared this fdwxact entry.
+	 */
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->insert_start_lsn = start_lsn;
+	fxact->insert_end_lsn = end_lsn;
+	fxact->inredo = true;	/* added in redo */
+	fxact->valid = false;
+	fxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * FdwXactRedoRemove
+ *
+ * Remove the corresponding fdw_xact entry from FdwXactCtl.
+ * Also remove fdw_xact file if a foreign transaction was saved
+ * via an earlier checkpoint.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact	fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdw_xact(dbid, xid, serverid, userid,
+							   false);
+
+	if (fdwxact == NULL)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdw_xact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+		char	*buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(bool *newval, void **extra, GucSource source)
+{
+	/* Parameter check */
+	if (*newval &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_xacts\" or \"max_foreign_xact_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdw_xacts;
+	int			num_xacts;
+	int			cur_xact;
+}	WorkingStatus;
+
+Datum
+pg_prepared_fdw_xacts(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdw_xacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdw_xacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(6, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdw_xacts = get_all_fdw_xacts(&num_fdw_xacts);
+		status->num_xacts = num_fdw_xacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdw_xact = &status->fdw_xacts[status->cur_xact++];
+		Datum		values[6];
+		bool		nulls[6];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdw_xact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdw_xact->dbid);
+		values[1] = TransactionIdGetDatum(fdw_xact->local_xid);
+		values[2] = ObjectIdGetDatum(fdw_xact->serverid);
+		values[3] = ObjectIdGetDatum(fdw_xact->userid);
+		switch (fdw_xact->status)
+		{
+			case FDW_XACT_PREPARING:
+				xact_status = "prepared";
+				break;
+			case FDW_XACT_COMMITTING_PREPARED:
+				xact_status = "committing";
+				break;
+			case FDW_XACT_ABORTING_PREPARED:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		/* should this really be interpreted by the FDW? */
+		values[5] = PointerGetDatum(cstring_to_text_with_len(fdw_xact->fdw_xact_id,
+															 strlen(fdw_xact->fdw_xact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+	bool			ret;
+
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, true);
+	if (fdwxact == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("could not find foreign transaction entry"))));
+
+	ret = FdwXactResolveForeignTransaction(fdwxact);
+
+	PG_RETURN_BOOL(ret);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it. This gives a way to forget about such a prepared transaction
+ * when, for example, the foreign server where it was prepared is no longer
+ * available, or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, false);
+	if (fdwxact == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("could not find foreign transaction entry"))));
+
+	remove_fdw_xact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/foreign/fdwxact_launcher.c b/src/backend/foreign/fdwxact_launcher.c
new file mode 100644
index 0000000..6782c33
--- /dev/null
+++ b/src/backend/foreign/fdwxact_launcher.c
@@ -0,0 +1,587 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Information about resolver processes is stored in a shared memory area.
+ * A backend process requests the start of a new resolver process via that
+ * shared memory area. Note that the launcher assumes that there is no more
+ * than one start request per database.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/foreign/fdwxact_launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "foreign/fdwxact.h"
+#include "foreign/fdwxact_launcher.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/resolver_internal.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid, int slot);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+Datum pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS);
+
+/*
+ * Wake up the launcher process.
+ */
+void
+FdwXactLauncherWakeup(void)
+{
+	if (FdwXactRslvCtl->launcher_pid > 0)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR1);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = offsetof(FdwXactRslvCtlData, resolvers);
+
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int	slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->FdwXactQueue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz	last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid <= 0);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz	now;
+		long	wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int		rc;
+
+		CHECK_FOR_INTERRUPTS();
+
+		now = GetCurrentTimestamp();
+
+		if (TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			bool launched;
+
+			/*
+			 * Launch foreign transaction resolvers that are requested
+			 * but not running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+				last_start_time = now;
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started. This usually means that the resolver crashed, so
+			 * we retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request the launcher to launch a new foreign transaction resolver worker
+ * if one is not running yet. A foreign transaction resolver worker is
+ * responsible for resolving the foreign transactions that are registered on
+ * a database, so if a resolver worker has already been launched for this
+ * database we don't need to launch a new one.
+ */
+void
+fdwxact_maybe_launch_resolver(bool ignore_error)
+{
+	FdwXactResolver *resolver;
+	bool	found = false;
+	int		i;
+
+	/*
+	 * Look for a resolver process that is running and working on the
+	 * same database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->pid != InvalidPid &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * If we found the resolver for my database, we don't need to launch a new
+	 * one; just wake up the running worker.
+	 */
+	if (found)
+	{
+		SetLatch(resolver->latch);
+
+		elog(DEBUG1, "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		return;
+	}
+
+	/* Look for an unused worker slot */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	/*
+	 * However if there are no more free worker slots, inform user about it before
+	 * exiting.
+	 */
+	if (!found)
+	{
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+		return;
+	}
+
+	Assert(resolver->pid == InvalidPid);
+
+	/* Found a free slot; reserve it for a new resolver process */
+	resolver->dbid = MyDatabaseId;
+	resolver->in_use = true;
+
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Wake up launcher */
+	FdwXactLauncherWakeup();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to the
+ * given 'dbid' using 'slot'. If 'slot' is negative, we find an unused slot.
+ * Note that caller must hold FdwXactResolverLock in exclusive mode.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid, int slot)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int launch_slot = slot;
+
+	/* If slot number is invalid, we find an unused slot */
+	if (launch_slot < 0)
+	{
+		int i;
+
+		for (i = 0; i < max_foreign_xact_resolvers; i++)
+		{
+			resolver = &FdwXactRslvCtl->resolvers[i];
+
+			if (resolver->in_use && resolver->dbid == dbid)
+				return;
+
+			if (!resolver->in_use)
+			{
+				launch_slot = i;
+				break;
+			}
+		}
+	}
+
+	/* No unused slot found */
+	if (launch_slot < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[launch_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_main_arg = Int32GetDatum(launch_slot);
+	bgw.bgw_notify_pid = 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to wait
+	 * until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch all foreign transaction resolvers that are required by backend process
+ * but not running.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	int i, j;
+	int num_launches = 0;
+	int num_unused_slots = 0;
+	int num_dbs = 0;
+	bool launched = false;
+	Oid	*dbs_to_launch;
+	Oid *dbs_having_worker = palloc0(sizeof(Oid) * max_foreign_xact_resolvers);
+
+	/*
+	 * Launch resolver workers on the databases that are requested
+	 * by backend processes.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* Remember unused worker slots */
+		if (!resolver->in_use)
+			num_unused_slots++;
+
+		/* Remember databases that already have a resolver worker */
+		if (OidIsValid(resolver->dbid))
+			dbs_having_worker[num_dbs++] = resolver->dbid;
+
+		/* Launch new foreign transaction resolver worker on the database */
+		if (resolver->in_use &&
+			OidIsValid(resolver->dbid) &&
+			resolver->pid == InvalidPid)
+		{
+			fdwxact_launch_resolver(resolver->dbid, i);
+			launched = true;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* There is no unused slot, exit */
+	if (num_unused_slots == 0)
+		return launched;
+
+	dbs_to_launch = (Oid *) palloc(sizeof(Oid) * num_unused_slots);
+
+	/*
+	 * If there are unused slots, we can launch foreign transaction resolvers
+	 * on databases that have unresolved foreign transactions but don't have
+	 * any resolver. This usually happens when resolvers crash for whatever
+	 * reason. Scanning all FdwXact entries could take time, but since this
+	 * is a relaunch case that is acceptable.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool found = false;
+
+		if (num_launches >= num_unused_slots)
+			break;
+
+		for (j = 0; j < num_dbs; j++)
+		{
+			if (dbs_having_worker[j] == fdw_xact->dbid)
+			{
+				found = true;
+				break;
+			}
+		}
+
+		if (found)
+			continue;
+
+		dbs_to_launch[num_launches++] = fdw_xact->dbid;
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* Launch resolver process for a database at any worker slot */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < num_launches; i++)
+	{
+		fdwxact_launch_resolver(dbs_to_launch[i], -1);
+		launched = true;
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	return launched;
+}
+
+/*
+ * FdwXactLauncherRegister
+ *		Register a background worker running the foreign transaction
+ *      launcher.
+ */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+
+/*
+ * Returns activity of foreign transaction resolvers: the pid, the database
+ * OID and the last resolution time of each running resolver.
+ */
+Datum
+pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver	*resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t	pid;
+		Oid		dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
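The slot choice that fdwxact_launch_resolver() makes when it is called with a
negative slot can be read in isolation. The following is only a minimal,
standalone sketch of that scan under simplifying assumptions: DemoSlot,
demo_choose_slot() and the two DEMO_* return codes are hypothetical names that
do not appear in the patch, and locking and the background worker registration
are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for FdwXactResolver; only the fields the scan reads. */
typedef struct DemoSlot
{
	bool		in_use;
	unsigned int dbid;
} DemoSlot;

/* Hypothetical return codes, not part of the patch. */
#define DEMO_NO_FREE_SLOT	(-1)
#define DEMO_ALREADY_SERVED	(-2)

/*
 * Scan the slots in order, the way the loop in fdwxact_launch_resolver()
 * does: stop without launching if an in-use slot already serves 'dbid',
 * otherwise pick the first unused slot.
 */
int
demo_choose_slot(const DemoSlot *slots, int nslots, unsigned int dbid)
{
	int			i;

	assert(slots != NULL && nslots >= 0);

	for (i = 0; i < nslots; i++)
	{
		if (slots[i].in_use && slots[i].dbid == dbid)
			return DEMO_ALREADY_SERVED;

		if (!slots[i].in_use)
			return i;
	}

	return DEMO_NO_FREE_SLOT;
}
```

Because the scan returns at the first unused slot, an in-use slot for the same
database that sits later in the array is never examined. With slots
{in use for db 5, unused, in use for db 7}, a request for db 7 picks slot 1.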
diff --git a/src/backend/foreign/fdwxact_resolver.c b/src/backend/foreign/fdwxact_resolver.c
new file mode 100644
index 0000000..7f7ff8f
--- /dev/null
+++ b/src/backend/foreign/fdwxact_resolver.c
@@ -0,0 +1,310 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process continues to resolve foreign transactions on a database.
+ * It resolves two types of foreign transactions: on-line foreign transactions
+ * and dangling foreign transactions. An on-line foreign transaction is a
+ * foreign transaction whose resolution a concurrent backend process is
+ * waiting for. A dangling transaction is a foreign transaction whose
+ * distributed transaction ended up in an in-doubt state. A resolver process
+ * doesn't exit as long as there is at least one unresolved foreign transaction
+ * on the database, even if the timeout has been reached.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/foreign/fdwxact_resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "foreign/fdwxact.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
+#include "foreign/resolver_internal.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* GUC parameters */
+int foreign_xact_resolution_retry_interval;
+int foreign_xact_resolver_timeout = 60 * 1000;
+
+//static MemoryContext ResolveContext = NULL;
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FdwXactRslvLoop(void);
+static long FdwXactRslvComputeSleepTime(TimestampTz now);
+static void FdwXactRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+	FdwXactLauncherWakeup();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	TIMESTAMP_NOBEGIN(MyFdwXactResolver->last_resolved_time);
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sanish value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FdwXactRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FdwXactRslvLoop(void)
+{
+	FdwXactResolveState *fstate;
+
+	/* Create an FdwXactResolveState */
+	fstate = CreateFdwXactResolveState();
+
+	/* Enter main loop */
+	for (;;)
+	{
+		int			rc;
+		TimestampTz	now;
+		long		sleep_time;
+		bool		resolved;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Resolve a distributed transaction */
+		StartTransactionCommand();
+		resolved = FdwXactResolveDistributedTransaction(fstate);
+		CommitTransactionCommand();
+
+		now = GetCurrentTimestamp();
+
+		/* Update my state */
+		if (resolved)
+			MyFdwXactResolver->last_resolved_time = now;
+
+		/* Check for fdwxact resolver timeout */
+		FdwXactRslvCheckTimeout(now);
+
+		/*
+		 * If we have resolved a distributed transaction, go to the next
+		 * cycle without resolving dangling transactions or sleeping, because
+		 * there might be other on-line transactions waiting to be resolved.
+		 */
+		if (!resolved)
+		{
+			/* Resolve dangling transactions as much as possible */
+			StartTransactionCommand();
+			FdwXactResolveAllDanglingTransactions(MyDatabaseId);
+			CommitTransactionCommand();
+
+			sleep_time = FdwXactRslvComputeSleepTime(now);
+
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   sleep_time,
+						   WAIT_EVENT_FDW_XACT_RESOLVER_MAIN);
+
+			if (rc & WL_POSTMASTER_DEATH)
+				proc_exit(1);
+		}
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FdwXactRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/*
+	 * Reached the timeout. We exit only if there are neither pending on-line
+	 * transactions nor dangling transactions left.
+	 */
+	if (!fdw_xact_exists(InvalidTransactionId, MyDatabaseId, InvalidOid,
+						 InvalidOid))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyFdwXactResolver->dbid))));
+		CommitTransactionCommand();
+
+		fdwxact_resolver_detach();
+		proc_exit(0);
+	}
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. Returns the sleep
+ * time in milliseconds.
+ */
+static long
+FdwXactRslvComputeSleepTime(TimestampTz now)
+{
+	static TimestampTz	wakeuptime = 0;
+	long	sleeptime;
+	long	sec_to_timeout;
+	int		microsec_to_timeout;
+
+	if (now >= wakeuptime)
+		wakeuptime = TimestampTzPlusMilliseconds(now,
+												 foreign_xact_resolution_retry_interval);
+
+	/* Compute relative time until wakeup. */
+	TimestampDifference(now, wakeuptime,
+						&sec_to_timeout, &microsec_to_timeout);
+
+	sleeptime = sec_to_timeout * 1000 + microsec_to_timeout / 1000;
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
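The arithmetic in FdwXactRslvComputeSleepTime() can be followed with plain
integers. This is a minimal sketch, assuming timestamps are 64-bit microsecond
counts as TimestampTz is; DemoTimestamp and demo_sleep_ms() are hypothetical
names, and the function-local static wakeuptime of the patch is passed in
through a pointer here so that the example has no hidden state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TimestampTz: microseconds since some epoch. */
typedef int64_t DemoTimestamp;

/*
 * Return the time to sleep in milliseconds. If the previous wakeup time has
 * passed, schedule a new one retry_interval_ms after 'now'; otherwise keep
 * the existing wakeup time and sleep only for the remainder.
 */
long
demo_sleep_ms(DemoTimestamp now, DemoTimestamp *wakeup, int retry_interval_ms)
{
	int64_t		diff_us;
	long		secs;
	int			usecs;

	assert(wakeup != NULL && retry_interval_ms >= 0);

	if (now >= *wakeup)
		*wakeup = now + (int64_t) retry_interval_ms * 1000;

	/* Split the difference into seconds and microseconds, then recombine. */
	diff_us = *wakeup - now;
	secs = (long) (diff_us / 1000000);
	usecs = (int) (diff_us % 1000000);

	return secs * 1000 + usecs / 1000;
}
```

With a retry interval of 10000 ms, a first call at now = 1 s sets the wakeup
time to 11 s and returns 10000. A call at 4.5 s keeps that wakeup time and
returns 6500.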
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index eac78a5..1873a24 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -155,6 +155,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb4..cfd73f5 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -16,6 +16,8 @@
 
 #include "libpq/pqsignal.h"
 #include "access/parallel.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -129,6 +131,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index bbe7361..757d060 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3492,6 +3492,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_LAUNCHER_MAIN:
 			event_name = "LogicalLauncherMain";
 			break;
@@ -3683,6 +3689,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -3898,6 +3907,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDW_XACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a4b53b3..1c9ca53 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -99,6 +99,8 @@
 #include "catalog/pg_control.h"
 #include "common/file_perm.h"
 #include "common/ip.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
 #include "lib/ilist.h"
 #include "libpq/auth.h"
 #include "libpq/libpq.h"
@@ -905,6 +907,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_xact_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -980,12 +986,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 59c003d..ce09a2a 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -154,6 +154,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDW_XACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a58..5f321fe 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "foreign/fdwxact_launcher.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -150,6 +151,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, BackendRandomShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +273,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	BackendRandomShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bd20497..d05d89f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -90,6 +90,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -245,6 +247,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1318,6 +1321,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	volatile TransactionId replication_slot_xmin = InvalidTransactionId;
 	volatile TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	volatile TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1379,6 +1383,7 @@ GetOldestXmin(Relation rel, int flags)
 	/* fetch into volatile var while ProcArrayLock is held */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1429,6 +1434,15 @@ GetOldestXmin(Relation rel, int flags)
 		result = replication_slot_xmin;
 
 	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDW_XACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
+	/*
 	 * After locks have been released and defer_cleanup_age has been applied,
 	 * check whether we need to back up further to make logical decoding
 	 * possible. We need to do so if we're computing the global limit (rel =
@@ -3005,6 +3019,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits to future computations of the xmin horizon to prevent
+ * vacuum from truncating clog for transactions still needed to resolve
+ * distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6025ec..a42d06e 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,5 @@ OldSnapshotTimeMapLock				42
 BackendRandomLock					43
 LogicalRepWorkerLock				44
 CLogTruncationLock					45
+FdwXactLock					46
+FdwXactResolverLock			47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6f30e08..577d2ff 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -38,6 +38,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "foreign/fdwxact.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -397,6 +398,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* initialize fields for fdw xact */
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -797,6 +802,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f413395..2b3dee5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -43,6 +43,8 @@
 #include "commands/async.h"
 #include "commands/prepare.h"
 #include "executor/spi.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
 #include "jit/jit.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -2904,6 +2906,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c5ba149..d60d027 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -41,6 +41,7 @@
 #include "commands/vacuum.h"
 #include "commands/variable.h"
 #include "commands/trigger.h"
+#include "foreign/fdwxact.h"
 #include "funcapi.h"
 #include "jit/jit.h"
 #include "libpq/auth.h"
@@ -659,6 +660,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -1831,6 +1836,16 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+			gettext_noop("Sets whether the two-phase commit protocol is used for distributed transactions."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		false,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -2235,6 +2250,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c0d3fb8..f398451 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -121,6 +121,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -287,6 +289,20 @@
 
 
 #------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#foreign_twophase_commit = off
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
+#------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
 
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index ad06e8e..ca3eb62 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3f203c6..dfecda1 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -208,6 +208,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdw_xact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 895a51f..5f0683d 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -306,6 +306,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_worker_processes);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_xacts setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index d63a3a2..e12cb00 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -730,6 +730,7 @@ GuessControlValues(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -957,6 +958,7 @@ RewriteControlFile(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* Contents are protected with a CRC */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca..15bfeb4 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -26,6 +26,7 @@
 #include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
+#include "foreign/fdwxact_xlog.h"
 #include "replication/message.h"
 #include "replication/origin.h"
 #include "rmgrdesc.h"
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 0bbe9879..c15dff7 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDW_XACT_ID, "Foreign Transactions", fdw_xact_redo, fdw_xact_desc, fdw_xact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 0e932da..b199c88 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 				TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index c7b4144..7180bd1 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -105,6 +105,13 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
+ * XACT_FLAGS_FDWNOPREPARE - set when the transaction wrote data to a
+ * foreign table whose server is not capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE				(1U << 3)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 30610b3..795e85a 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -227,6 +227,7 @@ typedef struct xl_parameter_change
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6..3d5333a 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -178,6 +178,7 @@ typedef struct ControlFileData
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a146510..895337b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5199,6 +5199,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '6053', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_fdwxact_resolver', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,oid,timestamptz}',
+  proargmodes => '{o,o,o,o}',
+  proargnames => '{pid,dbid,n_entries,last_resolved_time}',
+  prosrc => 'pg_stat_get_fdwxact_resolver' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5910,6 +5917,22 @@
   proargnames => '{type,name,args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '6050', descr => 'view foreign transactions',
+  proname => 'pg_prepared_fdw_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{dbid,transaction,serverid,userid,status,identifier}',
+  prosrc => 'pg_prepared_fdw_xacts' },
+{ oid => '6051', descr => 'remove foreign transaction',
+  proname => 'pg_remove_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_remove_fdw_xact' },
+{ oid => '6052', descr => 'resolve foreign transaction',
+  proname => 'pg_resolve_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_resolve_fdw_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c14eb54..f76e83d 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "foreign/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/relation.h"
 
@@ -168,6 +169,12 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef bool (*PrepareForeignTransaction_function) (ForeignTransaction *foreign_xact);
+typedef bool (*CommitForeignTransaction_function) (ForeignTransaction *foreign_xact);
+typedef bool (*RollbackForeignTransaction_function) (ForeignTransaction *foreign_xact);
+typedef bool (*ResolveForeignTransaction_function) (ForeignTransaction *foreign_xact,
+													bool is_commit);
+typedef bool (*IsTwoPhaseCommitEnabled_function) (Oid serverid);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -235,6 +242,13 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for distributed transactions */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	ResolveForeignTransaction_function ResolveForeignTransaction;
+	IsTwoPhaseCommitEnabled_function IsTwoPhaseCommitEnabled;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
@@ -247,7 +261,6 @@ typedef struct FdwRoutine
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
 } FdwRoutine;
 
-
 /* Functions in foreign/foreign.c */
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern Oid	GetForeignServerIdByRelId(Oid relid);
@@ -258,4 +271,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 						 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in foreign/fdwxact.c */
+extern void FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, char *fx_id);
+
 #endif							/* FDWAPI_H */
diff --git a/src/include/foreign/fdwxact.h b/src/include/foreign/fdwxact.h
new file mode 100644
index 0000000..5138a2c
--- /dev/null
+++ b/src/include/foreign/fdwxact.h
@@ -0,0 +1,147 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL distributed transaction manager
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/fdwxact.h
+ */
+#ifndef FDW_XACT_H
+#define FDW_XACT_H
+
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "foreign/fdwxact_xlog.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+#define	FDW_XACT_NOT_WAITING		0
+#define	FDW_XACT_WAITING			1
+#define	FDW_XACT_WAIT_COMPLETE		2
+
+#define FdwXactEnabled() (max_prepared_foreign_xacts > 0)
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDW_XACT_ID_MAX_LEN 200
+
+/* Enum to track the status of prepared foreign transaction */
+typedef enum
+{
+	FDW_XACT_INITIAL,
+	FDW_XACT_PREPARING,					/* foreign transaction is being prepared */
+	FDW_XACT_PREPARED,					/* foreign transaction is prepared */
+	FDW_XACT_COMMITTING_PREPARED,		/* foreign prepared transaction is to
+										 * be committed */
+	FDW_XACT_ABORTING_PREPARED, /* foreign prepared transaction is to be
+								 * aborted */
+} FdwXactStatus;
+
+/* Shared memory entry for a prepared or being prepared foreign transaction */
+typedef struct FdwXactData *FdwXact;
+
+typedef struct FdwXactData
+{
+	FdwXact		fxact_free_next;	/* Next free FdwXact entry */
+	FdwXact		fxact_next;		/* Pointer to the next FdwXact entry associated
+								 * with the same transaction */
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	TransactionId local_xid;	/* XID of local transaction */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	FdwXactStatus status;		/* The state of the foreign
+								 * transaction. This doubles as the
+								 * action to be taken on this entry. */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;		/* XLOG offset where insertion of this entry starts */
+	XLogRecPtr	insert_end_lsn;		/* XLOG offset where insertion of this entry ends */
+
+	bool		valid; /* Has the entry been completed and written to file? */
+	BackendId	registered_backend;	/* Backend who registered this entry */
+	bool		ondisk;			/* TRUE if prepare state file is on disk */
+	bool		inredo;			/* TRUE if entry was added via xlog_redo */
+	char		fdw_xact_id[FDW_XACT_MAX_ID_LEN];		/* prepared transaction identifier */
+} FdwXactData;
+
+/* Shared memory layout for maintaining foreign prepared transaction entries. */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		freeFdwXacts;
+
+	/* Number of valid foreign transaction entries */
+	int			numFdwXacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdw_xacts[FLEXIBLE_ARRAY_MEMBER];		/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* Struct for foreign transaction resolution */
+typedef struct FdwXactResolveState
+{
+	Oid				dbid;		/* database oid */
+	TransactionId	wait_xid;	/* local transaction id waiting to be resolved */
+	PGPROC			*waiter;	/* backend process waiter */
+	FdwXact			fdwxact;	/* foreign transaction entries to resolve */
+} FdwXactResolveState;
+
+/* Struct for foreign transaction passed to API */
+typedef struct ForeignTransaction
+{
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+	char			*fx_id;
+} ForeignTransaction;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern bool foreign_twophase_commit;
+
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern bool fdw_xact_exists(TransactionId xid, Oid dboid, Oid serverid,
+				Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwTwoPhaseNeeded(void);
+extern void PreCommit_FdwXacts(void);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon);
+extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit);
+extern bool FdwXactResolveDistributedTransaction(FdwXactResolveState *fstate);
+extern void FdwXactResolveAllDanglingTransactions(Oid dbid);
+extern bool ForeignTwophaseCommitRequired(void);
+extern FdwXactResolveState *CreateFdwXactResolveState(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void FdwXactMarkForeignTransactionModified(ResultRelInfo *resultRelInfo,
+												  int flags);
+extern bool check_foreign_twophase_commit(bool *newval, void **extra,
+										  GucSource source);
+
+#endif   /* FDW_XACT_H */
diff --git a/src/include/foreign/fdwxact_launcher.h b/src/include/foreign/fdwxact_launcher.h
new file mode 100644
index 0000000..6ed003b
--- /dev/null
+++ b/src/include/foreign/fdwxact_launcher.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _FDWXACT_LAUNCHER_H
+#define _FDWXACT_LAUNCHER_H
+
+#include "foreign/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherWakeup(void);
+
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+
+extern bool IsFdwXactLauncher(void);
+
+extern void fdwxact_maybe_launch_resolver(bool ignore_error);
+
+
+#endif	/* _FDWXACT_LAUNCHER_H */
diff --git a/src/include/foreign/fdwxact_resolver.h b/src/include/foreign/fdwxact_resolver.h
new file mode 100644
index 0000000..5afd98c
--- /dev/null
+++ b/src/include/foreign/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "foreign/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int foreign_xact_resolver_timeout;
+
+#endif		/* FDWXACT_RESOLVER_H */
diff --git a/src/include/foreign/fdwxact_xlog.h b/src/include/foreign/fdwxact_xlog.h
new file mode 100644
index 0000000..f42725e
--- /dev/null
+++ b/src/include/foreign/fdwxact_xlog.h
@@ -0,0 +1,51 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDW_XACT_INSERT	0x00
+#define XLOG_FDW_XACT_REMOVE	0x10
+
+/* Same as GIDSIZE */
+#define FDW_XACT_MAX_ID_LEN 200
+/*
+ * On-disk file structure, also used for WAL
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	char		fdw_xact_id[FDW_XACT_MAX_ID_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdw_xact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+} xl_fdw_xact_remove;
+
+extern void fdw_xact_redo(XLogReaderState *record);
+extern void fdw_xact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdw_xact_identify(uint8 info);
+
+#endif	/* FDWXACT_XLOG_H */
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 3ca12e6..d030368 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -68,10 +68,10 @@ typedef struct ForeignTable
 	List	   *options;		/* ftoptions as DefElem list */
 } ForeignTable;
 
-
 extern ForeignServer *GetForeignServer(Oid serverid);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperByName(const char *name,
 							bool missing_ok);
diff --git a/src/include/foreign/resolver_internal.h b/src/include/foreign/resolver_internal.h
new file mode 100644
index 0000000..9f8676b
--- /dev/null
+++ b/src/include/foreign/resolver_internal.h
@@ -0,0 +1,65 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _RESOLVER_INTERNAL_H
+#define _RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t	pid;	/* this resolver's PID, or 0 if not active */
+	Oid		dbid;	/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool	in_use;
+
+	/* Stats */
+	TimestampTz	last_resolved_time;
+
+	/* Protect shared variables shown above */
+	slock_t	mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	*latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/*
+	 * Foreign transaction resolution queue. Protected by FdwXactLock.
+	 */
+	SHM_QUEUE	FdwXactQueue;
+
+	/* Supervisor process */
+	pid_t		launcher_pid;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif	/* _RESOLVER_INTERNAL_H */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d59c24a..f74d1be 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -759,6 +759,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDW_XACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -832,7 +834,8 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDW_XACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -912,6 +915,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_READ,
+	WAIT_EVENT_FDW_XACT_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 5c19a61..93953dc 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -150,6 +150,16 @@ struct PGPROC
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
 	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction
+								 * resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+
+	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
 	 * their lock.
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 75bab29..25d6a2f 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDW_XACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -124,4 +126,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 								TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 668d9ef..81560bd 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -94,6 +94,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 078129f..31502a0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1413,6 +1413,13 @@ pg_policies| SELECT n.nspname AS schemaname,
    FROM ((pg_policy pol
      JOIN pg_class c ON ((c.oid = pol.polrelid)))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
+pg_prepared_fdw_xacts| SELECT f.dbid,
+    f.transaction,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.identifier
+   FROM pg_prepared_fdw_xacts() f(dbid, transaction, serverid, userid, status, identifier);
 pg_prepared_statements| SELECT p.name,
     p.statement,
     p.prepare_time,
@@ -1821,6 +1828,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_fdwxact_resolvers| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_fdwxact_resolver() r(pid, dbid, n_entries, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
-- 
2.10.5

v17-0003-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From 7b99a4307c09c56a86556b67089b1f4fdd226b04 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:46:01 +0900
Subject: [PATCH v17 3/4] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/connection.c              | 534 +++++++++++++++++++------
 contrib/postgres_fdw/expected/postgres_fdw.out | 387 +++++++++++++++++-
 contrib/postgres_fdw/option.c                  |   5 +-
 contrib/postgres_fdw/postgres_fdw.c            |  60 ++-
 contrib/postgres_fdw/postgres_fdw.h            |  10 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      | 151 ++++++-
 doc/src/sgml/postgres-fdw.sgml                 |  37 ++
 7 files changed, 1040 insertions(+), 144 deletions(-)
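
Reader's note on postgresResolveForeignTransaction() in the diff below: the
function treats a failed COMMIT PREPARED / ROLLBACK PREPARED as "resolved"
only when the remote SQLSTATE is undefined_object (42704), and falls back to
connection_failure (08006) when libpq reports no SQLSTATE at all. A minimal
standalone sketch of that decision follows. The two macros are local copies of
the SQLSTATE packing macros in PostgreSQL's utils/elog.h, and
resolution_is_done() and resolution_examples() are hypothetical names used
only for illustration; they are not part of the patch. The length check is an
extra safeguard that the patch itself does not perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the SQLSTATE packing macros from utils/elog.h */
#define PGSIXBIT(ch)	(((ch) - '0') & 0x3F)
#define MAKE_SQLSTATE(ch1, ch2, ch3, ch4, ch5) \
	(PGSIXBIT(ch1) + (PGSIXBIT(ch2) << 6) + (PGSIXBIT(ch3) << 12) + \
	 (PGSIXBIT(ch4) << 18) + (PGSIXBIT(ch5) << 24))

#define ERRCODE_UNDEFINED_OBJECT	MAKE_SQLSTATE('4', '2', '7', '0', '4')
#define ERRCODE_CONNECTION_FAILURE	MAKE_SQLSTATE('0', '8', '0', '0', '6')

/*
 * Hypothetical helper: decide whether a prepared foreign transaction can be
 * considered resolved.
 *
 * command_ok     - the remote command returned PGRES_COMMAND_OK
 * diag_sqlstate  - the five-character SQLSTATE reported by the remote
 *                  server, or NULL if none was available
 */
bool
resolution_is_done(bool command_ok, const char *diag_sqlstate)
{
	int			sqlstate;

	if (command_ok)
		return true;

	/* Assumption: anything that is not a five-character code is ignored */
	if (diag_sqlstate != NULL && strlen(diag_sqlstate) == 5)
		sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
								 diag_sqlstate[1],
								 diag_sqlstate[2],
								 diag_sqlstate[3],
								 diag_sqlstate[4]);
	else
		sqlstate = ERRCODE_CONNECTION_FAILURE;

	/*
	 * A missing prepared transaction was probably resolved by some other
	 * means, so it counts as done. Every other failure must be retried.
	 */
	return sqlstate == ERRCODE_UNDEFINED_OBJECT;
}

/* Small self-check covering the cases described above */
void
resolution_examples(void)
{
	assert(resolution_is_done(true, NULL));		/* command succeeded */
	assert(resolution_is_done(false, "42704"));	/* already resolved */
	assert(!resolution_is_done(false, "08006"));	/* connection failure */
	assert(!resolution_is_done(false, NULL));	/* no SQLSTATE reported */
}
```

With this shape, only the "already gone" case is folded into success; a
serialization failure or a lost connection leaves the entry for a later
resolution attempt.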

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index fe4893a..9c0fa9a 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -15,8 +15,11 @@
 #include "postgres_fdw.h"
 
 #include "access/htup_details.h"
-#include "catalog/pg_user_mapping.h"
 #include "access/xact.h"
+#include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -56,6 +59,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		am_participant_of_ac;	/* true if fdwxact code controls the transaction */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -78,7 +82,7 @@ static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_xact_callback(XactEvent event, void *arg);
 static void pgfdw_subxact_callback(SubXactEvent event,
 					   SubTransactionId mySubid,
@@ -91,20 +95,14 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 						 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 						 PGresult **result);
-
-
-/*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
- */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static bool pgfdw_commit_transaction(ConnCacheEntry *entry);
+static bool pgfdw_rollback_transaction(ConnCacheEntry *entry);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
 {
 	bool		found;
 	ConnCacheEntry *entry;
@@ -136,11 +134,8 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
 	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
+	key = umid;
 
 	/*
 	 * Find or create cached entry for requested connection.
@@ -182,6 +177,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping		*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +186,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->am_participant_of_ac = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +197,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,16 +213,46 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
 /*
  * Connect to remote server using specified server and user mapping properties.
+ * If the connection attempt fails, an error is raised with libpq's message
+ * (minus its trailing newline) as the detail.
  */
 static PGconn *
 connect_pg_server(ForeignServer *server, UserMapping *user)
@@ -265,11 +301,22 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 
 		conn = PQconnectdbParams(keywords, values, false);
 		if (!conn || PQstatus(conn) != CONNECTION_OK)
+		{
+			char	   *connmessage;
+			int			msglen;
+
+			/* libpq typically appends a newline, strip that */
+			connmessage = pstrdup(PQerrorMessage(conn));
+			msglen = strlen(connmessage);
+			if (msglen > 0 && connmessage[msglen - 1] == '\n')
+				connmessage[msglen - 1] = '\0';
+
 			ereport(ERROR,
 					(errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
 					 errmsg("could not connect to server \"%s\"",
 							server->servername),
 					 errdetail_internal("%s", pchomp(PQerrorMessage(conn)))));
+		}
 
 		/*
 		 * Check that non-superuser has used password to establish connection;
@@ -414,15 +461,24 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
+	ForeignServer	*server = GetForeignServer(serverid);
 
 	/* Start main transaction if we haven't yet */
 	if (entry->xact_depth <= 0)
 	{
 		const char *sql;
 
+		/* Register the new foreign server if enabled */
+		if (server_uses_twophase_commit(server))
+		{
+			/* Register foreign server with auto-generated identifier */
+			FdwXactRegisterForeignTransaction(serverid, userid, NULL);
+			entry->am_participant_of_ac = true;
+		}
+
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
@@ -650,12 +706,11 @@ static void
 pgfdw_xact_callback(XactEvent event, void *arg)
 {
 	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
+	ConnCacheEntry	*entry;
 
-	/* Quick exit if no connections were touched in this transaction. */
+	/* Quick exit if no connections were touched in this transaction */
 	if (!xact_got_connection)
 		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote transactions, and
 	 * close them.
@@ -663,17 +718,20 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 	hash_seq_init(&scan, ConnectionHash);
 	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
 	{
-		PGresult   *res;
-
 		/* Ignore cache entry if no open connection right now */
 		if (entry->conn == NULL)
 			continue;
 
+		/*
+		 * Foreign transactions participating in atomic commit are ended
+		 * by the two-phase commit APIs. Ignore them.
+		 */
+		if (entry->am_participant_of_ac)
+			continue;
+
 		/* If it has an open remote transaction, try to close it */
 		if (entry->xact_depth > 0)
 		{
-			bool		abort_cleanup_failure = false;
-
 			elog(DEBUG3, "closing remote transaction on connection %p",
 				 entry->conn);
 
@@ -681,40 +739,7 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 			{
 				case XACT_EVENT_PARALLEL_PRE_COMMIT:
 				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
+					pgfdw_commit_transaction(entry);
 					break;
 				case XACT_EVENT_PRE_PREPARE:
 
@@ -739,66 +764,7 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 					break;
 				case XACT_EVENT_PARALLEL_ABORT:
 				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
+					pgfdw_rollback_transaction(entry);
 					break;
 			}
 		}
@@ -1193,3 +1159,325 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare the transaction on the foreign server. This function is called
+ * only at the pre-commit phase of the local transaction. Since the
+ * connection to the server must already be in the connection cache, we look
+ * it up by user mapping OID (the cache key) rather than deriving the
+ * mapping from serverid and userid.
+ */
+bool
+postgresPrepareForeignTransaction(ForeignTransaction *foreign_xact)
+{
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+	PGresult	*res;
+	StringInfo	command;
+
+	entry = hash_search(ConnectionHash, &(foreign_xact->usermapping->umid),
+						HASH_FIND, NULL);
+
+	if (!entry || !entry->conn)
+		return false;
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", foreign_xact->fx_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		result = true;
+
+	if (result)
+		elog(DEBUG1, "prepared foreign transaction on server %u with ID %s",
+			 foreign_xact->server->serverid, foreign_xact->fx_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+
+/*
+ * Commit the transaction on the foreign server. This
+ * function is called both at the pre-commit phase of the local transaction
+ * when committing and at the end of the local transaction when aborting.
+ * Since we should already have the connections to the servers involved in the
+ * local transaction, we don't use serverid and userid that are necessary to get
+ * the user mapping that is the key of the connection cache.
+ */
+bool
+postgresCommitForeignTransaction(ForeignTransaction *foreign_xact)
+{
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+
+	entry = hash_search(ConnectionHash, &(foreign_xact->usermapping->umid),
+						HASH_FIND, NULL);
+
+	result = pgfdw_commit_transaction(entry);
+
+	return result;
+}
+
+/*
+ * Roll back the transaction on the foreign server. This
+ * function is called both at the pre-commit phase of the local transaction
+ * when committing and at the end of the local transaction when aborting.
+ * Since we should already have the connections to the servers involved in the
+ * local transaction, we don't use serverid and userid that are necessary to get
+ * the user mapping that is the key of the connection cache.
+ */
+bool
+postgresRollbackForeignTransaction(ForeignTransaction *foreign_xact)
+{
+	ConnCacheEntry *entry = NULL;
+	bool ret;
+
+	entry = hash_search(ConnectionHash, &(foreign_xact->usermapping->umid),
+						HASH_FIND, NULL);
+
+	/* Rollback a remote transaction */
+	ret = pgfdw_rollback_transaction(entry);
+
+	return ret;
+}
+
+bool
+postgresResolveForeignTransaction(ForeignTransaction *foreign_xact, bool is_commit)
+{
+	ConnCacheEntry *entry = NULL;
+	StringInfo	command;
+	bool result;
+	PGresult	*res;
+
+	entry = GetConnectionState(foreign_xact->usermapping->umid,
+							   false, false);
+
+	if (!entry || !entry->conn)
+		return false;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 foreign_xact->fx_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		/*
+		 * The command failed, raise a warning to log the reason of failure.
+		 * We may not be in a transaction here, so raising error doesn't
+		 * help. Even if we are in a transaction, it would be the resolver
+		 * transaction, which will get aborted on raising error, thus
+		 * delaying resolution of other prepared foreign transactions.
+		 */
+		pgfdw_report_error(WARNING, res, entry->conn, false, command->data);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * If we tried to COMMIT/ABORT a prepared transaction and the prepared
+		 * transaction was missing on the foreign server, it was probably
+		 * resolved by some other means. Anyway, it should be considered as resolved.
+		 */
+		result = (sqlstate == ERRCODE_UNDEFINED_OBJECT);
+	}
+	else
+		result = true;
+
+	elog(DEBUG1, "%s prepared foreign transaction on server %u with ID %s",
+		 is_commit ? "commit" : "rollback", foreign_xact->server->serverid,
+		 foreign_xact->fx_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->am_participant_of_ac = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	/*
+	 * Regardless of the event type, we can now mark ourselves as out of the
+	 * transaction.
+	 */
+	xact_got_connection = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
+
+static bool
+pgfdw_rollback_transaction(ConnCacheEntry *entry)
+{
+	bool abort_cleanup_failure = false;
+
+	/*
+	 * When rolling back the local transaction, the absence of a connection
+	 * means that no remote transaction was started, so we can regard the
+	 * rollback as successful.
+	 */
+	if (!entry || !entry->conn)
+		return true;
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is already unsalvageable, don't touch it
+	 * further.
+	 */
+	if (entry->changing_xact_state)
+		return true;
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+	else
+	{
+		entry->have_prep_stmt = false;
+		entry->have_error = false;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return !abort_cleanup_failure;
+}
+
+static bool
+pgfdw_commit_transaction(ConnCacheEntry *entry)
+{
+	PGresult	*res;
+	bool result = false;
+
+	if (!entry || !entry->conn)
+		return false;
+
+	/*
+	 * If abort cleanup previously failed for this connection,
+	 * we can't issue any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		result = true;
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f5498c6..933d88b 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,15 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -185,16 +207,19 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                                      List of foreign tables
- Schema |   Table    |  Server   |                   FDW options                    | Description 
---------+------------+-----------+--------------------------------------------------+-------------
- public | ft1        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft2        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft4        | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
- public | ft5        | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft6        | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft_pg_type | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
-(6 rows)
+                                         List of foreign tables
+ Schema |      Table       |  Server   |                   FDW options                    | Description 
+--------+------------------+-----------+--------------------------------------------------+-------------
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
+(9 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8607,3 +8632,345 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+\det+
+                                                 List of foreign tables
+ Schema |      Table       |  Server   |                            FDW options                            | Description 
+--------+------------------+-----------+-------------------------------------------------------------------+-------------
+ public | fpagg_tab_p1     | loopback  | (table_name 'pagg_tab_p1')                                        | 
+ public | fpagg_tab_p2     | loopback  | (table_name 'pagg_tab_p2')                                        | 
+ public | fpagg_tab_p3     | loopback  | (table_name 'pagg_tab_p3')                                        | 
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')                             | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1', use_remote_estimate 'true') | 
+ public | ft3              | loopback  | (table_name 'loct3', use_remote_estimate 'true')                  | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')                             | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type')                  | 
+ public | ftprt1_p1        | loopback  | (table_name 'fprt1_p1', use_remote_estimate 'true')               | 
+ public | ftprt1_p2        | loopback  | (table_name 'fprt1_p2')                                           | 
+ public | ftprt2_p1        | loopback  | (table_name 'fprt2_p1', use_remote_estimate 'true')               | 
+ public | ftprt2_p2        | loopback  | (table_name 'fprt2_p2', use_remote_estimate 'true')               | 
+ public | rem1             | loopback  | (table_name 'loc1')                                               | 
+ public | rem2             | loopback  | (table_name 'loc2')                                               | 
+(19 rows)
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+  srvname  
+-----------
+ loopback
+ loopback2
+ loopback3
+(3 rows)
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO on;
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO ft8_twophase VALUES(3);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO "S 1"."T 6" VALUES (4);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  4
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(5);
+INSERT INTO "S 1"."T 6" VALUES (5);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  4
+(1 row)
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(8);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Fails, cannot commit the distributed transaction if 2PC-non-capable
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+ERROR:  cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Disables atomic commit, and success the same case as above.
+SET foreign_twophase_commit TO off;
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+(5 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+(5 rows)
+
+-- Enable atomic commit, again.
+SET foreign_twophase_commit TO on;
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(10);
+INSERT INTO ft8_twophase VALUES(10);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+-- Fails, cannot prepare the transaction if non-supporeted
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(11);
+INSERT INTO ft9_not_twophase VALUES(11);
+PREPARE TRANSACTION 'gx1'; -- error
+ERROR:  cannot prepare a transaction that modified remote tables
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 6854f1b..1f45b1c 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -108,7 +108,8 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 		 * Validate option value, when we can do so without any context.
 		 */
 		if (strcmp(def->defname, "use_remote_estimate") == 0 ||
-			strcmp(def->defname, "updatable") == 0)
+			strcmp(def->defname, "updatable") == 0 ||
+			strcmp(def->defname, "two_phase_commit") == 0)
 		{
 			/* these accept only boolean values */
 			(void) defGetBoolean(def);
@@ -177,6 +178,8 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* two phase commit support */
+		{"two_phase_commit", ForeignServerRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 0803c30..24191a2 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include "postgres_fdw.h"
 
+#include "access/xact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "catalog/pg_class.h"
@@ -21,6 +22,7 @@
 #include "commands/explain.h"
 #include "commands/vacuum.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -359,6 +361,7 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 							 RelOptInfo *input_rel,
 							 RelOptInfo *output_rel,
 							 void *extra);
+static bool postgresIsTwoPhaseCommitEnabled(Oid serverid);
 
 /*
  * Helper functions
@@ -452,7 +455,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 				  const PgFdwRelationInfo *fpinfo_o,
 				  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -506,10 +508,29 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->ResolveForeignTransaction = postgresResolveForeignTransaction;
+	routine->IsTwoPhaseCommitEnabled = postgresIsTwoPhaseCommitEnabled;
+
 	PG_RETURN_POINTER(routine);
 }
 
 /*
+ * postgresIsTwoPhaseCommitEnabled
+ */
+static bool
+postgresIsTwoPhaseCommitEnabled(Oid serverid)
+{
+	ForeignServer	*server = GetForeignServer(serverid);
+
+
+	return server_uses_twophase_commit(server);
+}
+
+/*
  * postgresGetForeignRelSize
  *		Estimate # of rows and width of the result of the scan
  *
@@ -1356,7 +1377,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2412,7 +2433,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2709,7 +2730,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								&retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3326,7 +3347,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4113,7 +4134,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4203,7 +4224,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4426,7 +4447,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
@@ -5807,3 +5828,26 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	/* We didn't find any suitable equivalence class expression */
 	return NULL;
 }
+
+/*
+ * server_uses_twophase_commit
+ * Returns true if the foreign server is configured to support 2PC.
+ */
+bool
+server_uses_twophase_commit(ForeignServer *server)
+{
+	ListCell		*lc;
+
+	/* Check the options for two phase compliance */
+	foreach(lc, server->options)
+	{
+		DefElem    *d = (DefElem *) lfirst(lc);
+
+		if (strcmp(d->defname, "two_phase_commit") == 0)
+		{
+			return defGetBoolean(d);
+		}
+	}
+	/* By default a server is not 2PC compliant */
+	return false;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 70b538e..4d1b754 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "foreign/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "nodes/relation.h"
@@ -115,7 +116,8 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
+extern PGconn *GetExistingConnection(Oid umid);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -123,6 +125,11 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 				   bool clear, const char *sql);
+extern bool postgresPrepareForeignTransaction(ForeignTransaction *foreign_xact);
+extern bool postgresCommitForeignTransaction(ForeignTransaction *foreign_xact);
+extern bool postgresRollbackForeignTransaction(ForeignTransaction *foreign_xact);
+extern bool postgresResolveForeignTransaction(ForeignTransaction *foreign_xact,
+											  bool is_commit);
 
 /* in option.c */
 extern int ExtractConnectionOptions(List *defelems,
@@ -181,6 +188,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 						List *remote_conds, List *pathkeys, bool is_subquery,
 						List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index e1b955f..9bbd159 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,19 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -2298,7 +2325,6 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE
 
 RESET enable_partitionwise_join;
 
-
 -- ===================================================================
 -- test partitionwise aggregates
 -- ===================================================================
@@ -2348,3 +2374,126 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+
+\det+
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO on;
+
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO ft8_twophase VALUES(3);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO "S 1"."T 6" VALUES (4);
+COMMIT;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(5);
+INSERT INTO "S 1"."T 6" VALUES (5);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(8);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Fails, cannot commit the distributed transaction if 2PC-non-capable
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Disables atomic commit, and success the same case as above.
+SET foreign_twophase_commit TO off;
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Enable atomic commit, again.
+SET foreign_twophase_commit TO on;
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(10);
+INSERT INTO ft8_twophase VALUES(10);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Fails, cannot prepare the transaction if non-supporeted
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(11);
+INSERT INTO ft9_not_twophase VALUES(11);
+PREPARE TRANSACTION 'gx1'; -- error
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 54b5e98..f4a9ff5 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -436,6 +436,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted independently.
+    Some of these transactions may fail to commit on their remote server while
+    the others commit successfully. This may be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is allowed
+       to use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
-- 
2.10.5
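
The commit-time behaviour exercised by the postgres_fdw regression tests in the patch above can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions, not the patch's implementation: ServerOption, CommitMode, option_enables_twophase() and choose_commit_mode() are hypothetical names, the option parser accepts only 'on'/'true' (the real code uses defGetBoolean(), which accepts more spellings), local writes are ignored, and the handling of a single modified foreign server is a guess, since the test output does not show whether 2PC is used in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one server option (name/value pair). */
typedef struct ServerOption
{
    const char *name;
    const char *value;
} ServerOption;

/* Hypothetical outcome of the commit-time decision. */
typedef enum CommitMode
{
    COMMIT_ONE_PHASE,   /* plain COMMIT on each foreign server */
    COMMIT_TWO_PHASE,   /* PREPARE everywhere, then COMMIT PREPARED */
    COMMIT_REJECTED     /* COMMIT raises an error */
} CommitMode;

/*
 * Same shape as server_uses_twophase_commit() in the patch: scan the
 * options and fall back to false when two_phase_commit is not set.
 * Assumption: only 'on' and 'true' count as enabled in this sketch.
 */
static bool
option_enables_twophase(const ServerOption *opts, size_t nopts)
{
    size_t i;

    assert(opts != NULL || nopts == 0);
    for (i = 0; i < nopts; i++)
    {
        if (strcmp(opts[i].name, "two_phase_commit") == 0)
            return strcmp(opts[i].value, "on") == 0 ||
                   strcmp(opts[i].value, "true") == 0;
    }
    return false;               /* by default a server is not 2PC capable */
}

/*
 * Model of what the regression tests show when several foreign servers
 * were modified: with foreign_twophase_commit on, all of them must be
 * 2PC capable, otherwise COMMIT fails; with it off, COMMIT succeeds.
 * Assumption: fewer than two modified servers commits in one phase.
 */
static CommitMode
choose_commit_mode(bool guc_enabled, const bool *capable, size_t nservers)
{
    size_t i;

    assert(capable != NULL || nservers == 0);
    if (!guc_enabled || nservers < 2)
        return COMMIT_ONE_PHASE;
    for (i = 0; i < nservers; i++)
    {
        if (!capable[i])
            return COMMIT_REJECTED;
    }
    return COMMIT_TWO_PHASE;
}
```

For example, feeding the options of loopback2 ('on') and loopback3 ('off') through option_enables_twophase() and passing the results to choose_commit_mode() with the setting enabled yields COMMIT_REJECTED, which corresponds to the "COMMIT; -- error" case in the expected output.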

v17-0004-Add-regression-tests-for-atomic-commit.patch (application/octet-stream)
From 1ce2f489cb29e2ea86298e4ddae85f8d2ed38b28 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:48:08 +0900
Subject: [PATCH v17 4/4] Add regression tests for atomic commit.

---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index daf79a0..71c8b9d 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000..a23f120
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port', two_phase_commit 'on');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port', two_phase_commit 'on');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_prepared_fdw_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 2ff2acc..bfc8f53 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2286,9 +2286,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2303,7 +2306,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.10.5
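
The recovery test above prepares transactions on two foreign servers and then commits or rolls them back. As a reading aid, here is a minimal sketch of the textbook two-phase commit loop in standalone C. It is not the coordinator code from this patch set: Participant, PartState, prepare_participant() and two_phase_commit() are hypothetical names, and failure is simulated with a flag rather than a real PREPARE TRANSACTION on a remote server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-server transaction state. */
typedef enum PartState
{
    PART_ACTIVE,
    PART_PREPARED,
    PART_COMMITTED,
    PART_ROLLED_BACK
} PartState;

/* Hypothetical participant; can_prepare simulates whether PREPARE works. */
typedef struct Participant
{
    bool      can_prepare;
    PartState state;
} Participant;

/* Phase 1 for one participant: try to prepare, report success. */
static bool
prepare_participant(Participant *p)
{
    if (!p->can_prepare)
        return false;
    p->state = PART_PREPARED;
    return true;
}

/*
 * Prepare every participant; if any prepare fails, roll all of them back
 * and return false.  Otherwise commit every prepared participant and
 * return true.
 */
static bool
two_phase_commit(Participant *parts, size_t n)
{
    size_t i;

    assert(parts != NULL || n == 0);

    /* Phase 1: prepare */
    for (i = 0; i < n; i++)
    {
        if (!prepare_participant(&parts[i]))
        {
            size_t j;

            for (j = 0; j < n; j++)
                parts[j].state = PART_ROLLED_BACK;
            return false;
        }
    }

    /* Phase 2: commit */
    for (i = 0; i < n; i++)
        parts[i].state = PART_COMMITTED;
    return true;
}
```

The sketch deliberately leaves out what the TAP test is really about, namely that the prepared state has to survive a restart, a crash, or a promotion of the standby before phase 2 runs.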

#3Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#2)

On Fri, Aug 03, 2018 at 05:52:24PM +0900, Masahiko Sawada wrote:

> I attached the updated version patch as the previous versions conflict
> with the current HEAD.

Please note that the latest patch set does not apply anymore, so this
patch is moved to next CF, waiting on author.
--
Michael

#4Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#3)
4 attachment(s)

On Tue, Oct 2, 2018 at 3:10 PM Michael Paquier <michael@paquier.xyz> wrote:

> On Fri, Aug 03, 2018 at 05:52:24PM +0900, Masahiko Sawada wrote:
>
>> I attached the updated version patch as the previous versions conflict
>> with the current HEAD.
>
> Please note that the latest patch set does not apply anymore, so this
> patch is moved to next CF, waiting on author.

Thank you! Attached the latest version patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

v18-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From c655515f56192080e0aef9b138952ae33713c091 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 8 Feb 2018 11:26:46 +0900
Subject: [PATCH v18 1/4] Keep track of writing on non-temporary relation.

---
 src/backend/access/heap/heapam.c | 12 ++++++++++++
 src/include/access/xact.h        |  5 +++++
 2 files changed, 17 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5f1a69c..31a44ca 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2623,6 +2623,10 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		heap_freetuple(heaptup);
 	}
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleGetOid(tup);
 }
 
@@ -3444,6 +3448,10 @@ l1:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleMayBeUpdated;
 }
 
@@ -4394,6 +4402,10 @@ l2:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	bms_free(hot_attrs);
 	bms_free(proj_idx_attrs);
 	bms_free(key_attrs);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 083e879..c7b4144 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -98,6 +98,11 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data to a non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
-- 
2.10.5
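
For readers skimming the thread, the flag handling in patch 0001 can be sketched as a standalone C fragment. This is only an illustration under stated assumptions, not backend code: `my_xact_flags`, `note_relation_write()`, `wrote_non_temp_rel()`, and `reset_xact_flags()` are hypothetical names standing in for `MyXactFlags` and the `RelationNeedsWAL()` checks added to heapam.c; only the two bit values are taken from the xact.h hunk above.

```c
#include <stdbool.h>

/* Bit values as they appear in the xact.h hunk of the patch. */
#define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK  (1U << 1)
#define XACT_FLAGS_WROTENONTEMPREL              (1U << 2)

/* Hypothetical stand-in for the backend's per-transaction flag word. */
static unsigned int my_xact_flags = 0;

/*
 * Hypothetical helper: record a write.  The caller passes what
 * RelationNeedsWAL(relation) would return in the backend.
 */
static void
note_relation_write(bool relation_needs_wal)
{
    if (relation_needs_wal)
        my_xact_flags |= XACT_FLAGS_WROTENONTEMPREL;
}

/* Hypothetical helper: has this transaction set the new flag bit? */
static bool
wrote_non_temp_rel(void)
{
    return (my_xact_flags & XACT_FLAGS_WROTENONTEMPREL) != 0;
}

/* Hypothetical helper: clear all flags, as would happen at transaction end. */
static void
reset_xact_flags(void)
{
    my_xact_flags = 0;
}
```

Because the new bit is OR-ed into the existing flag word, it does not disturb `XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK` or any other bit already set in the same transaction.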

v18-0003-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From 0bdaab36e322809a393cbae052dc1da8a5599c1e Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:46:01 +0900
Subject: [PATCH v18 3/4] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/connection.c              | 534 +++++++++++++++++++------
 contrib/postgres_fdw/expected/postgres_fdw.out | 387 +++++++++++++++++-
 contrib/postgres_fdw/option.c                  |   5 +-
 contrib/postgres_fdw/postgres_fdw.c            |  60 ++-
 contrib/postgres_fdw/postgres_fdw.h            |  10 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      | 151 ++++++-
 doc/src/sgml/postgres-fdw.sgml                 |  37 ++
 7 files changed, 1040 insertions(+), 144 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index fe4893a..9c0fa9a 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -15,8 +15,11 @@
 #include "postgres_fdw.h"
 
 #include "access/htup_details.h"
-#include "catalog/pg_user_mapping.h"
 #include "access/xact.h"
+#include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -56,6 +59,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		am_participant_of_ac;	/* true if fdwxact code controls the transaction */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -78,7 +82,7 @@ static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_xact_callback(XactEvent event, void *arg);
 static void pgfdw_subxact_callback(SubXactEvent event,
 					   SubTransactionId mySubid,
@@ -91,20 +95,14 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 						 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 						 PGresult **result);
-
-
-/*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
- */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static bool pgfdw_commit_transaction(ConnCacheEntry *entry);
+static bool pgfdw_rollback_transaction(ConnCacheEntry *entry);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
 {
 	bool		found;
 	ConnCacheEntry *entry;
@@ -136,11 +134,8 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
 	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
+	key = umid;
 
 	/*
 	 * Find or create cached entry for requested connection.
@@ -182,6 +177,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping		*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +186,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->am_participant_of_ac = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +197,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,16 +213,46 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
 /*
  * Connect to remote server using specified server and user mapping properties.
+ * If the attempt to connect fails, an error is raised rather than returning
+ * NULL to the caller.
  */
 static PGconn *
 connect_pg_server(ForeignServer *server, UserMapping *user)
@@ -265,11 +301,22 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 
 		conn = PQconnectdbParams(keywords, values, false);
 		if (!conn || PQstatus(conn) != CONNECTION_OK)
+		{
+			char	   *connmessage;
+			int			msglen;
+
+			/* libpq typically appends a newline, strip that */
+			connmessage = pstrdup(PQerrorMessage(conn));
+			msglen = strlen(connmessage);
+			if (msglen > 0 && connmessage[msglen - 1] == '\n')
+				connmessage[msglen - 1] = '\0';
+
 			ereport(ERROR,
 					(errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
 					 errmsg("could not connect to server \"%s\"",
 							server->servername),
 					 errdetail_internal("%s", pchomp(PQerrorMessage(conn)))));
+		}
 
 		/*
 		 * Check that non-superuser has used password to establish connection;
@@ -414,15 +461,24 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
+	ForeignServer	*server = GetForeignServer(serverid);
 
 	/* Start main transaction if we haven't yet */
 	if (entry->xact_depth <= 0)
 	{
 		const char *sql;
 
+		/* Register the new foreign server if enabled */
+		if (server_uses_twophase_commit(server))
+		{
+			/* Register foreign server with auto-generated identifier */
+			FdwXactRegisterForeignTransaction(serverid, userid, NULL);
+			entry->am_participant_of_ac = true;
+		}
+
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
@@ -650,12 +706,11 @@ static void
 pgfdw_xact_callback(XactEvent event, void *arg)
 {
 	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
+	ConnCacheEntry	*entry;
 
-	/* Quick exit if no connections were touched in this transaction. */
+	/* Quick exit if no connections were touched in this transaction */
 	if (!xact_got_connection)
 		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote transactions, and
 	 * close them.
@@ -663,17 +718,20 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 	hash_seq_init(&scan, ConnectionHash);
 	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
 	{
-		PGresult   *res;
-
 		/* Ignore cache entry if no open connection right now */
 		if (entry->conn == NULL)
 			continue;
 
+		/*
+		 * Foreign transactions participating in atomic commit are ended
+		 * by the two-phase commit APIs. Ignore them.
+		 */
+		if (entry->am_participant_of_ac)
+			continue;
+
 		/* If it has an open remote transaction, try to close it */
 		if (entry->xact_depth > 0)
 		{
-			bool		abort_cleanup_failure = false;
-
 			elog(DEBUG3, "closing remote transaction on connection %p",
 				 entry->conn);
 
@@ -681,40 +739,7 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 			{
 				case XACT_EVENT_PARALLEL_PRE_COMMIT:
 				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
+					pgfdw_commit_transaction(entry);
 					break;
 				case XACT_EVENT_PRE_PREPARE:
 
@@ -739,66 +764,7 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 					break;
 				case XACT_EVENT_PARALLEL_ABORT:
 				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
+					pgfdw_rollback_transaction(entry);
 					break;
 			}
 		}
@@ -1193,3 +1159,325 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare the transaction on the foreign server. This function is called
+ * only at the pre-commit phase of the local transaction. Since we should
+ * already have a connection to the server in question, we don't need the
+ * serverid and userid to look up the user mapping, whose OID is the key of
+ * the connection cache.
+ */
+bool
+postgresPrepareForeignTransaction(ForeignTransaction *foreign_xact)
+{
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+	PGresult	*res;
+	StringInfo	command;
+
+	entry = hash_search(ConnectionHash, &(foreign_xact->usermapping->umid),
+						HASH_FIND, NULL);
+
+	if (!entry || !entry->conn)
+		return false;
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", foreign_xact->fx_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		result = true;
+
+	if (result)
+		elog(DEBUG1, "prepared foreign transaction on server %u with ID %s",
+			 foreign_xact->server->serverid, foreign_xact->fx_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+
+/*
+ * Commit the transaction on the foreign server. This function is called
+ * both at the pre-commit phase of the local transaction when committing
+ * and at the end of the local transaction when aborting. Since we should
+ * already have connections to the servers involved in the local
+ * transaction, we don't need the serverid and userid to look up the user
+ * mapping, whose OID is the key of the connection cache.
+ */
+bool
+postgresCommitForeignTransaction(ForeignTransaction *foreign_xact)
+{
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+
+	entry = hash_search(ConnectionHash, &(foreign_xact->usermapping->umid),
+						HASH_FIND, NULL);
+
+	result = pgfdw_commit_transaction(entry);
+
+	return result;
+}
+
+/*
+ * Roll back the transaction on the foreign server. This function is called
+ * both at the pre-commit phase of the local transaction when committing
+ * and at the end of the local transaction when aborting. Since we should
+ * already have connections to the servers involved in the local
+ * transaction, we don't need the serverid and userid to look up the user
+ * mapping, whose OID is the key of the connection cache.
+ */
+bool
+postgresRollbackForeignTransaction(ForeignTransaction *foreign_xact)
+{
+	ConnCacheEntry *entry = NULL;
+	bool ret;
+
+	entry = hash_search(ConnectionHash, &(foreign_xact->usermapping->umid),
+						HASH_FIND, NULL);
+
+	/* Rollback a remote transaction */
+	ret = pgfdw_rollback_transaction(entry);
+
+	return ret;
+}
+
+bool
+postgresResolveForeignTransaction(ForeignTransaction *foreign_xact, bool is_commit)
+{
+	ConnCacheEntry *entry = NULL;
+	StringInfo	command;
+	bool result;
+	PGresult	*res;
+
+	entry = GetConnectionState(foreign_xact->usermapping->umid,
+							   false, false);
+
+	if (!entry || !entry->conn)
+		return false;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 foreign_xact->fx_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		/*
+		 * The command failed, raise a warning to log the reason of failure.
+		 * We may not be in a transaction here, so raising error doesn't
+		 * help. Even if we are in a transaction, it would be the resolver
+		 * transaction, which will get aborted on raising error, thus
+		 * delaying resolution of other prepared foreign transactions.
+		 */
+		pgfdw_report_error(WARNING, res, entry->conn, false, command->data);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * If we tried to COMMIT/ABORT a prepared transaction and the prepared
+		 * transaction was missing on the foreign server, it was probably
+		 * resolved by some other means. Anyway, it should be considered as resolved.
+		 */
+		result = (sqlstate == ERRCODE_UNDEFINED_OBJECT);
+	}
+	else
+		result = true;
+
+	elog(DEBUG1, "%s prepared foreign transaction on server %u with ID %s",
+		 is_commit ? "commit" : "rollback", foreign_xact->server->serverid,
+		 foreign_xact->fx_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->am_participant_of_ac = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	/*
+	 * Regardless of the event type, we can now mark ourselves as out of the
+	 * transaction.
+	 */
+	xact_got_connection = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
+
+static bool
+pgfdw_rollback_transaction(ConnCacheEntry *entry)
+{
+	bool abort_cleanup_failure = false;
+
+	/*
+	 * When rolling back the local transaction, if we don't have a
+	 * connection it means no remote transaction was started, so we can
+	 * regard it as success.
+	 */
+	if (!entry || !entry->conn)
+		return true;
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is already unsalvageable, don't touch it
+	 * further.
+	 */
+	if (entry->changing_xact_state)
+		return true;
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+	else
+	{
+		entry->have_prep_stmt = false;
+		entry->have_error = false;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return !abort_cleanup_failure;
+}
+
+static bool
+pgfdw_commit_transaction(ConnCacheEntry *entry)
+{
+	PGresult	*res;
+	bool result = false;
+
+	if (!entry || !entry->conn)
+		return false;
+
+	/*
+	 * If abort cleanup previously failed for this connection,
+	 * we can't issue any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		result = true;
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 21a2ef5..15dadf4 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,15 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -185,16 +207,19 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                                      List of foreign tables
- Schema |   Table    |  Server   |                   FDW options                    | Description 
---------+------------+-----------+--------------------------------------------------+-------------
- public | ft1        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft2        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft4        | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
- public | ft5        | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft6        | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft_pg_type | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
-(6 rows)
+                                         List of foreign tables
+ Schema |      Table       |  Server   |                   FDW options                    | Description 
+--------+------------------+-----------+--------------------------------------------------+-------------
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
+(9 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8650,3 +8675,345 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+\det+
+                                                 List of foreign tables
+ Schema |      Table       |  Server   |                            FDW options                            | Description 
+--------+------------------+-----------+-------------------------------------------------------------------+-------------
+ public | fpagg_tab_p1     | loopback  | (table_name 'pagg_tab_p1')                                        | 
+ public | fpagg_tab_p2     | loopback  | (table_name 'pagg_tab_p2')                                        | 
+ public | fpagg_tab_p3     | loopback  | (table_name 'pagg_tab_p3')                                        | 
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')                             | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1', use_remote_estimate 'true') | 
+ public | ft3              | loopback  | (table_name 'loct3', use_remote_estimate 'true')                  | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')                             | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type')                  | 
+ public | ftprt1_p1        | loopback  | (table_name 'fprt1_p1', use_remote_estimate 'true')               | 
+ public | ftprt1_p2        | loopback  | (table_name 'fprt1_p2')                                           | 
+ public | ftprt2_p1        | loopback  | (table_name 'fprt2_p1', use_remote_estimate 'true')               | 
+ public | ftprt2_p2        | loopback  | (table_name 'fprt2_p2', use_remote_estimate 'true')               | 
+ public | rem1             | loopback  | (table_name 'loc1')                                               | 
+ public | rem2             | loopback  | (table_name 'loc2')                                               | 
+(19 rows)
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+  srvname  
+-----------
+ loopback
+ loopback2
+ loopback3
+(3 rows)
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO on;
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO ft8_twophase VALUES(3);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO "S 1"."T 6" VALUES (4);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  4
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(5);
+INSERT INTO "S 1"."T 6" VALUES (5);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  4
+(1 row)
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(8);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Fails, cannot commit the distributed transaction if 2PC-non-capable
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+ERROR:  cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Disables atomic commit, and success the same case as above.
+SET foreign_twophase_commit TO off;
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+(5 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+(5 rows)
+
+-- Enable atomic commit, again.
+SET foreign_twophase_commit TO on;
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(10);
+INSERT INTO ft8_twophase VALUES(10);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+-- Fails, cannot prepare the transaction if non-supporeted
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(11);
+INSERT INTO ft9_not_twophase VALUES(11);
+PREPARE TRANSACTION 'gx1'; -- error
+ERROR:  cannot prepare a transaction that modified remote tables
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 6854f1b..1f45b1c 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -108,7 +108,8 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 		 * Validate option value, when we can do so without any context.
 		 */
 		if (strcmp(def->defname, "use_remote_estimate") == 0 ||
-			strcmp(def->defname, "updatable") == 0)
+			strcmp(def->defname, "updatable") == 0 ||
+			strcmp(def->defname, "two_phase_commit") == 0)
 		{
 			/* these accept only boolean values */
 			(void) defGetBoolean(def);
@@ -177,6 +178,8 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* two phase commit support */
+		{"two_phase_commit", ForeignServerRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 6cbba97..20365a4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include "postgres_fdw.h"
 
+#include "access/xact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "catalog/pg_class.h"
@@ -21,6 +22,7 @@
 #include "commands/explain.h"
 #include "commands/vacuum.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -359,6 +361,7 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 							 RelOptInfo *input_rel,
 							 RelOptInfo *output_rel,
 							 void *extra);
+static bool postgresIsTwoPhaseCommitEnabled(Oid serverid);
 
 /*
  * Helper functions
@@ -452,7 +455,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 				  const PgFdwRelationInfo *fpinfo_o,
 				  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -506,10 +508,29 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->ResolveForeignTransaction = postgresResolveForeignTransaction;
+	routine->IsTwoPhaseCommitEnabled = postgresIsTwoPhaseCommitEnabled;
+
 	PG_RETURN_POINTER(routine);
 }
 
 /*
+ * postgresIsTwoPhaseCommitEnabled
+ */
+static bool
+postgresIsTwoPhaseCommitEnabled(Oid serverid)
+{
+	ForeignServer	*server = GetForeignServer(serverid);
+
+
+	return server_uses_twophase_commit(server);
+}
+
+/*
  * postgresGetForeignRelSize
  *		Estimate # of rows and width of the result of the scan
  *
@@ -1356,7 +1377,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2411,7 +2432,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2708,7 +2729,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								&retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3325,7 +3346,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4112,7 +4133,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4202,7 +4223,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4425,7 +4446,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
@@ -5807,3 +5828,26 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	/* We didn't find any suitable equivalence class expression */
 	return NULL;
 }
+
+/*
+ * server_uses_twophase_commit
+ * Returns true if the foreign server is configured to support 2PC.
+ */
+bool
+server_uses_twophase_commit(ForeignServer *server)
+{
+	ListCell		*lc;
+
+	/* Check the options for two phase compliance */
+	foreach(lc, server->options)
+	{
+		DefElem    *d = (DefElem *) lfirst(lc);
+
+		if (strcmp(d->defname, "two_phase_commit") == 0)
+		{
+			return defGetBoolean(d);
+		}
+	}
+	/* By default a server is not 2PC compliant */
+	return false;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 70b538e..4d1b754 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "foreign/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "nodes/relation.h"
@@ -115,7 +116,8 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
+extern PGconn *GetExistingConnection(Oid umid);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -123,6 +125,11 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 				   bool clear, const char *sql);
+extern bool postgresPrepareForeignTransaction(ForeignTransaction *foreign_xact);
+extern bool postgresCommitForeignTransaction(ForeignTransaction *foreign_xact);
+extern bool postgresRollbackForeignTransaction(ForeignTransaction *foreign_xact);
+extern bool postgresResolveForeignTransaction(ForeignTransaction *foreign_xact,
+											  bool is_commit);
 
 /* in option.c */
 extern int ExtractConnectionOptions(List *defelems,
@@ -181,6 +188,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 						List *remote_conds, List *pathkeys, bool is_subquery,
 						List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 88c4cb4..2554c9c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,19 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -2304,7 +2331,6 @@ SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a
 
 RESET enable_partitionwise_join;
 
-
 -- ===================================================================
 -- test partitionwise aggregates
 -- ===================================================================
@@ -2354,3 +2380,126 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+
+\det+
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO on;
+
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO ft8_twophase VALUES(3);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO "S 1"."T 6" VALUES (4);
+COMMIT;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(5);
+INSERT INTO "S 1"."T 6" VALUES (5);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(8);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Fails, cannot commit the distributed transaction if 2PC-non-capable
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Disables atomic commit, and success the same case as above.
+SET foreign_twophase_commit TO off;
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Enable atomic commit, again.
+SET foreign_twophase_commit TO on;
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(10);
+INSERT INTO ft8_twophase VALUES(10);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Fails, cannot prepare the transaction if non-supporeted
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(11);
+INSERT INTO ft9_not_twophase VALUES(11);
+PREPARE TRANSACTION 'gx1'; -- error
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 54b5e98..f4a9ff5 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -436,6 +436,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted independently.
+    Some transactions may fail to commit on their remote server while others
+    commit successfully. This behavior can be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is allowed
+       to use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
-- 
2.10.5
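
For readers following the option handling in the patch above: the lookup in server_uses_twophase_commit() amounts to "scan the server options, return the parsed boolean for two_phase_commit if present, otherwise false". Below is a minimal standalone sketch of that default-off rule. It does not use PostgreSQL's List/DefElem machinery; ServerOption, parse_bool_option() and uses_twophase_commit() are hypothetical names made up for illustration, and the boolean parser accepts far fewer spellings than defGetBoolean() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one server option (name/value pair). */
typedef struct ServerOption
{
	const char *name;
	const char *value;
} ServerOption;

/*
 * Simplified boolean parser (assumption): only "on" and "true" count as
 * true; every other value is treated as false.
 */
bool
parse_bool_option(const char *value)
{
	return strcmp(value, "on") == 0 || strcmp(value, "true") == 0;
}

/*
 * Return the value of the two_phase_commit option if it is present,
 * otherwise false, i.e. a server is not 2PC-capable unless it says so.
 */
bool
uses_twophase_commit(const ServerOption *opts, size_t nopts)
{
	size_t		i;

	assert(opts != NULL || nopts == 0);

	for (i = 0; i < nopts; i++)
	{
		if (strcmp(opts[i].name, "two_phase_commit") == 0)
			return parse_bool_option(opts[i].value);
	}

	/* Option not given: default to "not 2PC-capable". */
	return false;
}
```

With this rule, a server created without the option behaves like loopback3 in the regression test (two_phase_commit 'off'), which is why the feature stays off unless it is explicitly enabled per server.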

Attachment: v18-0004-Add-regression-tests-for-atomic-commit.patch (application/octet-stream)
From 2b93b86357c8456273262738a1da72cdaa68f138 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:48:08 +0900
Subject: [PATCH v18 4/4] Add regression tests for atomic commit.

---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index daf79a0..71c8b9d 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000..a23f120
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port', two_phase_commit 'on');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port', two_phase_commit 'on');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check if we can commit and roll back the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately before checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_prepared_fdw_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 6890678..d1b181a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2286,9 +2286,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2303,7 +2306,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.10.5

Attachment: v18-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From b5cb1921a90f7b0f08b7c47119f1b7524e2b6edd Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:44:42 +0900
Subject: [PATCH v18 2/4] Support atomic commit among multiple foreign servers.

---
 doc/src/sgml/catalogs.sgml                    |   97 +
 doc/src/sgml/config.sgml                      |  124 ++
 doc/src/sgml/fdwhandler.sgml                  |  200 ++
 doc/src/sgml/func.sgml                        |   51 +
 doc/src/sgml/monitoring.sgml                  |   56 +
 src/backend/access/rmgrdesc/Makefile          |    8 +-
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   65 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/Makefile           |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   32 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/foreigncmds.c            |   23 +
 src/backend/executor/execPartition.c          |    4 +
 src/backend/executor/nodeForeignscan.c        |    8 +
 src/backend/executor/nodeModifyTable.c        |    5 +
 src/backend/foreign/Makefile                  |    2 +-
 src/backend/foreign/fdwxact.c                 | 2762 +++++++++++++++++++++++++
 src/backend/foreign/fdwxact_launcher.c        |  587 ++++++
 src/backend/foreign/fdwxact_resolver.c        |  310 +++
 src/backend/foreign/foreign.c                 |   43 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   18 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    5 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    2 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   61 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   23 +
 src/include/foreign/fdwapi.h                  |   18 +-
 src/include/foreign/fdwxact.h                 |  147 ++
 src/include/foreign/fdwxact_launcher.h        |   31 +
 src/include/foreign/fdwxact_resolver.h        |   23 +
 src/include/foreign/fdwxact_xlog.h            |   51 +
 src/include/foreign/foreign.h                 |    2 +-
 src/include/foreign/resolver_internal.h       |   65 +
 src/include/pgstat.h                          |    8 +-
 src/include/storage/proc.h                    |   10 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    2 +
 src/test/regress/expected/rules.out           |   12 +
 57 files changed, 5052 insertions(+), 27 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100755 src/backend/foreign/fdwxact.c
 create mode 100644 src/backend/foreign/fdwxact_launcher.c
 create mode 100644 src/backend/foreign/fdwxact_resolver.c
 create mode 100644 src/include/foreign/fdwxact.h
 create mode 100644 src/include/foreign/fdwxact_launcher.h
 create mode 100644 src/include/foreign/fdwxact_resolver.h
 create mode 100644 src/include/foreign/fdwxact_xlog.h
 create mode 100644 src/include/foreign/resolver_internal.h

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 0179dee..792f361 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9622,6 +9622,103 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-prepared-fdw-xacts">
+  <title><structname>pg_prepared_fdw_xacts</structname></title>
+
+  <indexterm zone="view-pg-prepared-fdw-xacts">
+   <primary>pg_prepared_fdw_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_prepared_fdw_xacts</structname> displays
+   information about foreign transactions that are currently prepared on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="fdw-transaction-managements"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_prepared_fdw_xacts</structname> contains one row per prepared
+   foreign transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_prepared_fdw_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>transaction</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       ID of the local transaction that this foreign transaction is associated with
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which this foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction: <literal>prepared</literal>, <literal>committing</literal>, <literal>aborting</literal> or <literal>unknown</literal>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_prepared_fdw_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f11b8f7..406fd9c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1546,6 +1546,29 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+      <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the maximum number of foreign transactions that can be prepared
+        simultaneously. A single local transaction can give rise to multiple
+        foreign transactions. If <literal>N</literal> local transactions each
+        span <literal>K</literal> foreign servers, this value needs to be set to
+        <literal>N * K</literal>, not just <literal>N</literal>.
+        This parameter can only be set at server start.
+       </para>
+       <para>
+        When running a standby server, you must set this parameter to the
+        same or higher value than on the master server. Otherwise, queries
+        will not be allowed in the standby server.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-work-mem" xreflabel="work_mem">
       <term><varname>work_mem</varname> (<type>integer</type>)
       <indexterm>
@@ -3611,6 +3634,78 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
      </variablelist>
     </sect2>
 
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+
+     <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+      <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+        resolver is responsible for foreign transaction resolution on one database.
+       </para>
+       <para>
+        Foreign transaction resolution workers are taken from the pool defined by
+        <varname>max_worker_processes</varname>.
+       </para>
+       <para>
+        The default value is 0.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+      <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies how long the foreign transaction resolver waits after a failed
+        resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+       <para>
+        The default value is 10 seconds.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+      <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Terminate foreign transaction resolver processes that have had no foreign
+        transactions to resolve for longer than the specified number of milliseconds.
+        A value of zero disables the timeout mechanism.  You should set this value to
+        zero only if <varname>max_foreign_transaction_resolvers</varname> is at least
+        the number of databases you have. This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+       <para>
+        The default value is 60 seconds.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     </variablelist>
+    </sect2>
+
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -7826,6 +7921,35 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-foreign-transaction">
+    <title>Foreign Transaction Management</title>
+
+    <variablelist>
+
+     <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+      <term><varname>foreign_twophase_commit</varname> (<type>boolean</type>)
+       <indexterm>
+        <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+       </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies whether transaction commit will wait for all involved foreign transactions
+        to be resolved before the command returns a "success" indication to the client.
+        Both <varname>max_prepared_foreign_transactions</varname> and
+        <varname>max_foreign_transaction_resolvers</varname> must be set to non-zero values to
+        allow foreign two-phase commit to be used.
+       </para>
+       <para>
+        This parameter can be changed at any time; the behavior for any one transaction
+        is determined by the setting in effect when it commits.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 4ce88dd..24c635c 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1390,6 +1390,109 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     If an FDW wishes to support <firstterm>atomic commit</firstterm>
+     (as described in <xref linkend="fdw-transaction-managements"/>), it must call the
+     registration function <function>FdwXactRegisterForeignTransaction</function>
+     and provide the following callback functions:
+    </para>
+
+    <para>
+<programlisting>
+bool
+PrepareForeignTransaction(ForeignTransaction *foreign_xact);
+</programlisting>
+    Prepare the foreign transaction identified by <varname>foreign_xact</varname>.
+    This function is called during the pre-commit phase of the local
+    transaction if atomic commit is
+    required. Returning <literal>true</literal> means that the foreign
+    transaction was prepared successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(ForeignTransaction *foreign_xact);
+</programlisting>
+    Commit the unprepared foreign transaction identified by
+    <varname>foreign_xact</varname>.
+    This function is called during the pre-commit phase of the local
+    transaction if atomic commit is not required. Atomic
+    commit is not required either when data was modified on
+    only one server, counting the local server, or when the user has not
+    requested atomic commit via <xref linkend="guc-foreign-twophase-commit"/>.
+    Returning <literal>true</literal> means that the
+    foreign transaction was committed successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(ForeignTransaction *foreign_xact);
+</programlisting>
+    Roll back the unprepared foreign transaction identified by
+    <varname>foreign_xact</varname>.
+    This function is called at the end of the local transaction, after it
+    has been rolled back locally, either when the user requested a rollback or when
+    an error occurred within the transaction. This function could
+    be called recursively if an error occurs while rolling back the
+    foreign transaction for whatever reason. You need to track
+    recursion and prevent this function from being called infinitely.
+    Returning <literal>true</literal> means that the
+    foreign transaction was rolled back successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+ResolvePreparedForeignTransaction(ForeignTransaction *foreign_xact,
+                                  bool is_commit);
+</programlisting>
+    Commit or roll back the prepared foreign transaction identified
+    by <varname>foreign_xact</varname> on a connection to the foreign server.
+    When <varname>is_commit</varname> is true, it indicates that the foreign
+    transaction should be committed; otherwise it should be rolled back.
+    This function is normally called by the foreign transaction resolver
+    process but can also be called by the <function>pg_resolve_fdw_xacts</function>
+    function. In the resolver process, this function is called either
+    when a backend requests the resolver process to resolve a distributed
+    transaction after it has been prepared or when a database has a dangling
+    transaction. Returning <literal>true</literal> means that the foreign
+    transaction was resolved successfully.
+    In the abort case, please note that the prepared foreign transaction
+    having identifier <varname>foreign_xact->fx_id</varname> might not
+    exist on the foreign server. If you fail to resolve the foreign
+    transaction due to an undefined object error
+    (<literal>ERRCODE_UNDEFINED_OBJECT</literal>), you should regard
+    it as success and return <literal>true</literal>.
+    </para>
+    <para>
+<programlisting>
+bool
+IsTwoPhaseCommitEnabled(Oid serverid);
+</programlisting>
+    Return <literal>true</literal> if the foreign server identified by
+    <literal>serverid</literal> is capable of the two-phase commit protocol.
+    This function is called when the transaction begins to modify data on
+    the foreign server. Returning <literal>false</literal> indicates that
+    the current transaction cannot use atomic commit even if atomic commit
+    is requested by the user.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state, so
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>. To get information about FDW-related
+      objects, you can use the given <literal>ForeignTransaction</literal>
+      instead (see <filename>foreign/fdwxact.h</filename> for details).
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1835,4 +1938,101 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+    <title>Transaction Management for Foreign Data Wrappers</title>
+
+    <sect2 id="fdw-transaction-atomic-commit">
+     <title>Atomic commit among multiple foreign servers</title>
+
+     <para>
+      The <productname>PostgreSQL</productname> foreign transaction manager
+      allows FDWs to read and write data on foreign servers within a transaction while
+      maintaining atomicity of the foreign data (aka atomic commit). Atomic
+      commit guarantees that a distributed transaction is either committed
+      or rolled back on all participating foreign
+      servers.  To achieve atomic commit, <productname>PostgreSQL</productname>
+      employs the two-phase commit protocol, which is a type of atomic commitment
+      protocol. Every FDW that wishes to support atomic commit
+      is required to support the transaction management callback routines
+      (see <xref linkend="fdw-callbacks-transaction-managements"/> for details)
+      and register the foreign transaction using
+      <function>FdwXactRegisterForeignTransaction</function> when starting a
+      transaction on the foreign server. The transaction on a registered foreign server
+      is managed by the foreign transaction manager.
+<programlisting>
+void
+FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, char *fx_id)
+</programlisting>
+    This function should be called when a transaction starts on the foreign server.
+    <varname>serverid</varname> and <varname>userid</varname> are <type>OID</type>s
+    that specify on which server and by which user the transaction is started. <varname>fx_id</varname>
+    is a null-terminated string that identifies the foreign transaction; it
+    is passed whenever the transaction management APIs are called. The length of
+    <varname>fx_id</varname> must be less than 200 bytes. This identifier
+    must also be unique enough that it doesn't conflict with other concurrent foreign
+    transactions. <varname>fx_id</varname> can be <literal>NULL</literal>.
+    If it's <literal>NULL</literal>, a transaction identifier is automatically
+    generated in the form
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    Since this identifier is used per foreign transaction and the xid of an unresolved
+    distributed transaction is never reused, an auto-generated identifier should be
+    sufficient to ensure uniqueness. It's recommended to generate the foreign transaction
+    identifier in the FDW if the format of the auto-generated identifier doesn't meet
+    the requirements of the foreign server.
+    </para>
+
+     <para>
+      An example of such a transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    Here ft1 and ft2 are foreign tables on different foreign servers, which may be using different
+    Foreign Data Wrappers.
+     </para>
+
+     <para>
+      When a transaction starts on the foreign server, an FDW that wishes to use atomic
+      commit must register the foreign transaction as a participant by calling
+      <function>FdwXactRegisterForeignTransaction</function>. Also, during the
+      transaction, <function>IsTwoPhaseCommitEnabled</function> is called whenever
+      the transaction begins to modify data on the foreign server. If the FDW wishes to use
+      atomic commit, <function>IsTwoPhaseCommitEnabled</function> must return
+      <literal>true</literal>. All foreign transaction participants must
+      return <literal>true</literal> to achieve atomic commit.
+     </para>
+
+     <para>
+      During the pre-commit phase of the local transaction, the foreign transaction manager
+      persists the foreign transaction information to disk and WAL, and then
+      prepares all foreign transactions by calling <function>PrepareForeignTransaction</function>
+      if the two-phase commit protocol is required. Two-phase commit is required only if
+      the transaction modified data on more than one server, counting the local
+      server, and the user requested atomic commit. <productname>PostgreSQL</productname>
+      can commit locally and go to the next step if and only if all foreign
+      transactions were prepared successfully. If two-phase commit is not required, the foreign
+      transaction manager commits the transaction on each foreign server by calling
+      <function>CommitForeignTransaction</function> and then
+      <productname>PostgreSQL</productname> commits locally. The foreign transaction
+      manager makes no further changes to foreign transactions from this point
+      forward. If any failure happens for whatever reason, for example a network
+      failure or a user request, before <productname>PostgreSQL</productname> commits
+      locally, the foreign transaction manager switches to rollback and calls
+      <function>RollbackForeignTransaction</function> for every foreign server to
+      close the current transaction on the foreign servers.
+     </para>
+
+     <para>
+      When two-phase commit is required, after committing locally, the transaction
+      commit waits for all prepared foreign transactions to be resolved before
+      the commit completes. The foreign transaction resolver is responsible for
+      foreign transaction resolution. <function>ResolvePreparedForeignTransaction</function>
+      is called by the foreign transaction resolver process when it resolves a foreign
+      transaction. <function>ResolvePreparedForeignTransaction</function> is also called
+      when a user executes the <function>pg_resolve_fdw_xact</function> function.
+     </para>
+    </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9a7f683..8ed007c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20755,6 +20755,57 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-fdw-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_fdw_xacts</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_fdw_xacts</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for foreign transactions
+        matching the arguments and resolves them. This function won't resolve
+        a foreign transaction which is in progress, or one that is locked by some
+        other backend.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_fdw_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0484cfa..635a5e7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -332,6 +332,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_fdw_xact_resolver</structname><indexterm><primary>pg_stat_fdw_xact_resolver</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-resolver-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1194,6 +1202,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
          <entry><literal>LogicalLauncherMain</literal></entry>
          <entry>Waiting in main loop of logical launcher process.</entry>
         </row>
@@ -1405,6 +1421,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
+        <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
          <entry>Waiting during base backup when throttling activity.</entry>
@@ -2214,6 +2234,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-foreign-xact-resolver-view" xreflabel="pg_stat_fdw_xact_resolver">
+   <title><structname>pg_stat_fdw_xact_resolver</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_fdw_xact_resolver</structname> view will contain one
+   row per foreign transaction resolver process, showing state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index 5514db1..742e825 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -8,9 +8,9 @@ subdir = src/backend/access/rmgrdesc
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o genericdesc.o \
-	   gindesc.o gistdesc.o hashdesc.o heapdesc.o logicalmsgdesc.o \
-	   mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o seqdesc.o \
-	   smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
+OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o fdwxactdesc.o \
+	genericdesc.o  gindesc.o gistdesc.o hashdesc.o heapdesc.o \
+	logicalmsgdesc.o mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o \
+	seqdesc.o smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000..3705104
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,65 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL distributed transaction manager for foreign server.
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "foreign/fdwxact_xlog.h"
+
+void
+fdw_xact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		FdwXactOnDiskData *fdw_insert_xlog = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_insert_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_insert_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_insert_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_insert_xlog->local_xid);
+		/* TODO: This should be really interpreted by each FDW */
+
+		/*
+		 * TODO: we also need to assess whether we want to add this
+		 * information
+		 */
+		appendStringInfo(buf, " foreign transaction info: %s",
+						 fdw_insert_xlog->fdw_xact_id);
+	}
+	else
+	{
+		xl_fdw_xact_remove *fdw_remove_xlog = (xl_fdw_xact_remove *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_remove_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_remove_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_remove_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_remove_xlog->xid);
+	}
+
+}
+
+const char *
+fdw_xact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDW_XACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDW_XACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7..023a7c5 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -112,14 +112,16 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_prepared_xacts=%d max_locks_per_xact=%d "
 						 "wal_level=%s wal_log_hints=%s "
-						 "track_commit_timestamp=%s",
+						 "track_commit_timestamp=%s "
+						 "max_prepared_foreign_xacts=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_prepared_xacts,
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 16fbe47..f15c83a 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -12,9 +12,9 @@ subdir = src/backend/access/transam
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = clog.o commit_ts.o generic_xlog.o multixact.o parallel.o rmgr.o slru.o \
-	subtrans.o timeline.o transam.o twophase.o twophase_rmgr.o varsup.o \
-	xact.o xlog.o xlogarchive.o xlogfuncs.o \
+OBJS = clog.o commit_ts.o generic_xlog.o multixact.o \
+	parallel.o rmgr.o slru.o subtrans.o timeline.o transam.o twophase.o \
+	twophase_rmgr.o varsup.o xact.o xlog.o xlogarchive.o xlogfuncs.o \
 	xloginsert.o xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 9368b56..b5c3502 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -24,6 +24,7 @@
 #include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
+#include "foreign/fdwxact.h"
 #include "replication/message.h"
 #include "replication/origin.h"
 #include "storage/standby.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 3942734..839e768 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -89,6 +89,7 @@
 #include "access/xlogreader.h"
 #include "catalog/pg_type.h"
 #include "catalog/storage.h"
+#include "foreign/fdwxact.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -844,6 +845,35 @@ TwoPhaseGetGXact(TransactionId xid)
 }
 
 /*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
+/*
  * TwoPhaseGetDummyProc
  *		Get the dummy backend ID for prepared transaction specified by XID
  *
@@ -2316,6 +2346,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2375,6 +2411,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 875be18..c4c879d 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -36,6 +36,7 @@
 #include "commands/tablecmds.h"
 #include "commands/trigger.h"
 #include "executor/spi.h"
+#include "foreign/fdwxact.h"
 #include "libpq/be-fsstubs.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -1108,6 +1109,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_twophase_for_ac;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1116,6 +1118,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_twophase_for_ac = ForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1154,12 +1157,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_twophase_for_ac)
 			goto cleanup;
 	}
 	else
@@ -1317,6 +1321,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transaction to be resolved, if required.
+	 * We only want to wait if we prepared foreign transaction in this
+	 * transaction.
+	 */
+	if (need_twophase_for_ac && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -1955,6 +1967,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2110,6 +2125,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2197,6 +2213,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2385,6 +2403,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2589,6 +2608,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7375a78..2a168cd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "foreign/fdwxact.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -5267,6 +5268,7 @@ BootStrapXLOG(void)
 	ControlFile->MaxConnections = MaxConnections;
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6354,6 +6356,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6878,14 +6883,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdw_xact, and then
+	 * fill in its status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7077,7 +7083,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7583,6 +7592,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7901,6 +7911,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9217,6 +9230,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9650,7 +9664,8 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9682,6 +9697,7 @@ XLogReportParameters(void)
 		ControlFile->MaxConnections = MaxConnections;
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9887,6 +9903,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10085,6 +10102,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->MaxConnections = xlrec.MaxConnections;
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7251552..5fa6065 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -291,6 +291,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_prepared_fdw_xacts AS
+       SELECT * FROM pg_prepared_fdw_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
 	l.objoid, l.classoid, l.objsubid,
@@ -773,6 +776,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_fdwxact_resolvers AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_fdwxact_resolver() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index e5dd995..50c31ef 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -28,6 +28,7 @@
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "parser/parse_func.h"
@@ -1093,6 +1094,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1407,6 +1420,16 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
 	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srv->serverid,
+						useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
+	/*
 	 * Do the deletion
 	 */
 	object.classId = UserMappingRelationId;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ec7a526..ea31749 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -19,6 +19,7 @@
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -744,7 +745,10 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+		FdwXactMarkForeignTransactionModified(partRelInfo, 0);
+	}
 
 	MemoryContextSwitchTo(oldContext);
 
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index a2a28b7..30a0b66 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,9 +22,11 @@
  */
 #include "postgres.h"
 
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -224,7 +226,13 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+
+		/* Mark that this transaction modified data on the foreign server */
+		FdwXactMarkForeignTransactionModified(estate->es_result_relation_info,
+										 eflags);
+	}
 	else
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index bf0d5e8..283bfaf 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -44,6 +44,8 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "storage/bufmgr.h"
@@ -2317,6 +2319,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 fdw_private,
 															 i,
 															 eflags);
+
+			/* Mark that this transaction modified data on the foreign server */
+			FdwXactMarkForeignTransactionModified(resultRelInfo, eflags);
 		}
 
 		resultRelInfo++;
diff --git a/src/backend/foreign/Makefile b/src/backend/foreign/Makefile
index 85aa857..4329d3e 100644
--- a/src/backend/foreign/Makefile
+++ b/src/backend/foreign/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/foreign
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS= foreign.o
+OBJS= foreign.o fdwxact.o fdwxact_launcher.o fdwxact_resolver.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/foreign/fdwxact.c b/src/backend/foreign/fdwxact.c
new file mode 100755
index 0000000..d284861
--- /dev/null
+++ b/src/backend/foreign/fdwxact.c
@@ -0,0 +1,2762 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL distributed transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * When a foreign data wrapper starts a transaction on a foreign server
+ * that is capable of the two-phase commit protocol, it is required to
+ * register the foreign transaction using FdwXactRegisterForeignTransaction()
+ * in order to participate in a group for atomic commit. Participants are
+ * identified by the oids of the foreign server and the user. When the foreign
+ * transaction begins to modify data, it must be marked as modified using
+ * FdwXactMarkForeignTransactionModified().
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every foreign server. After committing or rolling back locally, we
+ * notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so
+ * that we're not trying to locally do work that might fail with an ERROR
+ * after the local transaction has already committed.
+ *
+ * The two-phase commit protocol is required if the transaction modified
+ * two or more servers, including the local server. Otherwise, all foreign
+ * transactions are committed during pre-commit.
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in an in-doubt state (aka dangling
+ * transactions). Dangling transactions are processed by the resolver process.
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ *	* On PREPARE redo we add the foreign transaction to FdwXactCtl->fdw_xacts.
+ *	  We set fdw_xact->inredo to true for such entries.
+ *	* On Checkpoint redo, we iterate through FdwXactCtl->fdw_xacts entries that
+ *	  have set fdw_xact->inredo true and are behind the redo_horizon. We save
+ *	  them to disk and then set fdw_xact->ondisk to true.
+ *	* On COMMIT and ABORT we delete the entry from FdwXactCtl->fdw_xacts.
+ *	  If fdw_xact->ondisk is true, we delete the corresponding file from
+ *	  the disk as well.
+ *	* RecoverFdwXacts loads all foreign transaction entries from disk into
+ *	  memory at server startup.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/foreign/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/fdwxact.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
+#include "foreign/fdwxact_xlog.h"
+#include "foreign/resolver_internal.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Is atomic commit requested by user? */
+#define AtomicCommitRequested() \
+	(foreign_twophase_commit == true && \
+	 max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+/* Structure to bundle the foreign transaction participant */
+typedef struct FdwXactParticipant
+{
+	Oid			serverid;
+	Oid			userid;
+
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if
+	 * this foreign transaction is registered but not inserted
+	 * yet.
+	 */
+	FdwXact		fdw_xact;
+	char		*fdw_xact_id;
+
+	/* true if this transaction modified data on the foreign server */
+	bool		modified;
+
+	/*
+	 * This is initialized at foreign transaction registration and
+	 * passed to API functions.
+	 */
+	ForeignTransaction foreign_xact;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function	prepare_foreign_xact;
+	CommitForeignTransaction_function	commit_foreign_xact;
+	RollbackForeignTransaction_function	rollback_foreign_xact;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit.
+ * This list has only foreign servers that are capable of two-phase
+ * commit protocol.
+ */
+List *FdwXactParticipantsForAC = NIL;
+
+/*
+ * This struct tracks all participants involved with transaction 'xid'.
+ */
+typedef struct FdwXactStateCacheEntry
+{
+	/* Key -- must be first */
+	TransactionId	xid;
+
+	/* List of FdwXacts involved with the xid */
+	FdwXact	participants;
+} FdwXactStateCacheEntry;
+static HTAB	*FdwXactStateCache;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDW_XACTS_DIR "pg_fdw_xact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * oid, xid, foreign server oid and user oid, each printed as 8 hexadecimal
+ * digits and separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure uniqueness.
+ */
+#define FDW_XACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDW_XACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+static FdwXact FdwXactRegisterFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static bool FdwXactResolveForeignTransaction(FdwXact fdw_xact);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactQueueInsert(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid, bool give_warnings);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool give_warnings);
+static List *get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						   bool need_lock);
+static FdwXact get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								bool need_lock);
+static FdwXact get_all_fdw_xacts(int *length);
+static FdwXact insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							   char *fdw_xact_id);
+static char *generate_fdw_xact_identifier(Oid serverid, Oid userid);
+static void remove_fdw_xact(FdwXact fdw_xact);
+
+/* GUC parameters */
+int	max_prepared_foreign_xacts = 0;
+int	max_foreign_xact_resolvers = 0;
+bool foreign_twophase_commit = false;
+
+/* Tracks whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+/*
+ * Register the foreign transaction identified by the given arguments as
+ * a participant of the transaction.
+ *
+ * This function is intended to be called by an FDW when a foreign
+ * transaction starts. The foreign server identified by the given server id
+ * must support the atomic commit APIs. The foreign transaction is identified
+ * by the given identifier 'fx_id', which can be NULL. If it's NULL, we
+ * construct a unique identifier.
+ *
+ * Once registered, the foreign transactions of participants are managed
+ * by the foreign transaction manager until the end of the distributed
+ * transaction.
+ */
+void
+FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, char *fx_id)
+{
+	FdwXactParticipant	*fdw_part;
+	ListCell   			*lc;
+	ForeignServer 		*foreign_server;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	FdwRoutine			*fdw_routine;
+	MemoryContext		old_context;
+
+	/* Check length of foreign transaction identifier */
+	if (fx_id != NULL && strlen(fx_id) >= NAMEDATALEN)
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						fx_id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   NAMEDATALEN)));
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_xact_resolvers to a nonzero value.")));
+
+	/* Duplication check */
+	foreach(lc, FdwXactParticipantsForAC)
+	{
+		fdw_part = lfirst(lc);
+
+		/* Raise an error if this connection is already registered */
+		if (fdw_part->serverid == serverid && fdw_part->userid == userid)
+			ereport(ERROR,
+					(errmsg("attempt to start transaction again on server %u user %u",
+							serverid, userid)));
+	}
+
+	/*
+	 * Participant information is needed at the end of a transaction, when
+	 * the system caches are not available, so save it in TopTransactionContext
+	 * beforehand so that it can live until the end of the transaction.
+	 */
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	foreign_server = GetForeignServer(serverid);
+	fdw = GetForeignDataWrapper(foreign_server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	/* Make sure that the FDW has transaction handlers */
+	if (!fdw_routine->PrepareForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function provided for preparing foreign transaction for FDW %s",
+						fdw->fdwname)));
+	if (!fdw_routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to commit a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+	if (!fdw_routine->RollbackForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to rollback a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+
+	/* Generate foreign transaction identifier if not provided */
+	if (fx_id == NULL)
+		fx_id = generate_fdw_xact_identifier(serverid, userid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->serverid = serverid;
+	fdw_part->userid = userid;
+	fdw_part->fdw_xact_id = fx_id;
+	fdw_part->fdw_xact = NULL;
+	fdw_part->modified = false;	/* by default */
+	fdw_part->foreign_xact.server = foreign_server;
+	fdw_part->foreign_xact.usermapping = user_mapping;
+	fdw_part->foreign_xact.fx_id = fx_id;
+	fdw_part->prepare_foreign_xact = fdw_routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact = fdw_routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact = fdw_routine->RollbackForeignTransaction;
+
+	/* Add this foreign connection to the participants list */
+	FdwXactParticipantsForAC = lappend(FdwXactParticipantsForAC, fdw_part);
+
+	/* Restore the previous memory context */
+	MemoryContextSwitchTo(old_context);
+
+	return;
+}
+
+/*
+ * Remember that the registered foreign transaction modified data. This
+ * function is called when the executor begins to modify data on a foreign
+ * server, regardless of whether the foreign server is capable of the
+ * two-phase commit protocol. The mark is used to determine whether we must
+ * use the two-phase commit protocol at commit. This function also checks
+ * whether the foreign server being modified is capable of two-phase commit;
+ * if it is not, we remember that as well.
+ */
+void
+FdwXactMarkForeignTransactionModified(ResultRelInfo *resultRelInfo, int flags)
+{
+	Relation			rel = resultRelInfo->ri_RelationDesc;
+	FdwXactParticipant	*fdw_part;
+	ForeignTable		*ftable;
+	ListCell   			*lc;
+	Oid					userid;
+	Oid					serverid;
+
+	bool found = false;
+
+	/* Quick return if the user did not request atomic commit */
+	if (!AtomicCommitRequested())
+		return;
+
+	/* Do nothing in EXPLAIN (no ANALYZE) case */
+	if (flags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	ftable = GetForeignTable(RelationGetRelid(rel));
+
+	/*
+	 * If the foreign server being modified doesn't support or cannot enable
+	 * the two-phase commit protocol, mark that we've written to such a server
+	 * and return.
+	 */
+	if (resultRelInfo->ri_FdwRoutine->IsTwoPhaseCommitEnabled == NULL ||
+		!resultRelInfo->ri_FdwRoutine->IsTwoPhaseCommitEnabled(ftable->serverid))
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		return;
+	}
+
+	/*
+	 * The foreign server being modified supports two-phase commit protocol,
+	 * remember that the foreign transaction modified data.
+	 */
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	serverid = ftable->serverid;
+	foreach(lc, FdwXactParticipantsForAC)
+	{
+		fdw_part = lfirst(lc);
+
+		if (fdw_part->serverid == serverid && fdw_part->userid == userid)
+		{
+			fdw_part->modified = true;
+			found = true;
+			break;
+		}
+	}
+
+	if (!found)
+		elog(ERROR, "attempt to mark unregistered foreign server %u, user %u as modified",
+			 serverid, userid);
+}
+
+/*
+ * FdwXactShmemSize
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdw_xacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	/* Size for shared cache entry */
+	size = MAXALIGN(size);
+	size = add_size(size, hash_estimate_size(max_prepared_foreign_xacts,
+											 sizeof(FdwXactStateCacheEntry)));
+
+	return size;
+}
+
+/*
+ * FdwXactShmemInit
+ * Initialize shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined by the FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdw_xacts;
+		HASHCTL		info;
+		long		max_hash_size;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->freeFdwXacts = NULL;
+		FdwXactCtl->numFdwXacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdw_xacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdw_xacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdw_xacts[cnt].status = FDW_XACT_INITIAL;
+			fdw_xacts[cnt].fxact_free_next = FdwXactCtl->freeFdwXacts;
+			FdwXactCtl->freeFdwXacts = &fdw_xacts[cnt];
+		}
+
+		/* Initialize shared state cache hash table */
+		MemSet(&info, 0, sizeof(info));
+		info.keysize = sizeof(TransactionId);
+		info.entrysize = sizeof(FdwXactStateCacheEntry);
+		max_hash_size = max_prepared_foreign_xacts;
+
+		FdwXactStateCache = ShmemInitHash("FdwXact hash",
+										  max_hash_size,
+										  max_hash_size,
+										  &info,
+										  HASH_ELEM | HASH_BLOBS);
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * PreCommit_FdwXacts
+ *
+ * This function prepares all foreign transaction participants if atomic commit
+ * is required. Otherwise it commits them without preparing.
+ *
+ * If atomic commit is requested by the user (that is, foreign_twophase_commit
+ * is on), every participant must enable two-phase commit. If we managed all
+ * foreign transactions involved in a transaction, we could commit here those on
+ * foreign servers that don't use two-phase commit and commit the others in the
+ * post-commit phase, but we don't do that, because (1) it doesn't satisfy
+ * the atomic commit semantics at all and (2) it would require all FDWs to
+ * register foreign servers anyway, which breaks backward compatibility.
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipantsForAC == NIL)
+		return;
+
+	/*
+	 * If the user requires atomic commit semantics, we don't allow COMMIT if
+	 * we've modified data both on foreign servers that can execute the
+	 * two-phase commit protocol and on ones that cannot.
+	 */
+	if (foreign_twophase_commit == true && (MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));
+
+	if (ForeignTwophaseCommitRequired())
+	{
+		/* Prepare the transactions on all the foreign servers */
+		FdwXactPrepareForeignTransactions();
+	}
+	else
+	{
+		ListCell   *lc;
+
+		Assert(list_length(FdwXactParticipantsForAC) == 1);
+
+		/* Two-phase commit is not required, commit them one by one */
+		foreach(lc, FdwXactParticipantsForAC)
+		{
+			FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+			/* Commit foreign transaction */
+			if (!fdw_part->commit_foreign_xact(&fdw_part->foreign_xact))
+				ereport(ERROR,
+						(errmsg("could not commit foreign transaction on server %s",
+								fdw_part->foreign_xact.server->servername)));
+		}
+
+		/* Forget all participants */
+		FdwXactParticipantsForAC = NIL;
+	}
+}
+
+/*
+ * FdwXactPrepareForeignTransactions
+ *
+ * Prepare all foreign transaction participants.  This function builds a chain
+ * of prepared participants as it prepares each foreign transaction. The chain
+ * is used to access all participants of a distributed transaction quickly.
+ * If any one of them fails to prepare or raises an error, we change over
+ * to aborting.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	ListCell   *lcell;
+	FdwXact		prev_fxact = NULL;
+
+	/* Loop over the foreign connections */
+	foreach(lcell, FdwXactParticipantsForAC)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+		FdwXact		fxact;
+
+		/*
+		 * Register the foreign transaction entry. Registration persists this
+		 * information to disk and to WAL (thereby relaying it to standbys).
+		 * Thus in case we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have prepared a transaction
+		 * on the foreign server and try to resolve it when connectivity is
+		 * restored or after crash recovery.
+		 *
+		 * If we prepare the transaction on the foreign server before persisting
+		 * the information to the disk and crash in-between these two steps,
+		 * we will forget that we prepared the transaction on the foreign server
+		 * and will not be able to resolve it after the crash. Hence persist
+		 * first then prepare.
+		 */
+		fxact = FdwXactRegisterFdwXactEntry(GetTopTransactionId(), fdw_part);
+
+		/*
+		 * Between the FdwXactRegisterFdwXactEntry call and the moment this
+		 * backend receives an acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 * During abort processing, we might try to resolve a never-prepared
+		 * transaction and get an error. This is fine as long as the FDW
+		 * provides us unique prepared transaction identifiers.
+		 */
+		if (!fdw_part->prepare_foreign_xact(&fdw_part->foreign_xact))
+		{
+			/* Failed to prepare, change over to aborting */
+			ereport(ERROR,
+					(errmsg("could not prepare transaction on foreign server %s",
+							fdw_part->foreign_xact.server->servername)));
+		}
+
+		/* Preparation succeeded, update its status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdw_part->fdw_xact = fxact;
+		fxact->status = FDW_XACT_PREPARED;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Create a prepared participants chain, which is the linked list of
+		 * FdwXact entries involved in this transaction. The head entry is kept in
+		 * the hash table and each subsequent entry is linked from the previous one.
+		 */
+		if (!prev_fxact)
+		{
+			FdwXactStateCacheEntry	*fxact_entry;
+			bool				found;
+
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fxact_entry = (FdwXactStateCacheEntry *) hash_search(FdwXactStateCache,
+																 (void *) &(fxact->local_xid),
+																 HASH_ENTER, &found);
+			LWLockRelease(FdwXactLock);
+			Assert(!found);
+
+			/* Set the first participant */
+			fxact_entry->participants = fxact;
+		}
+		else
+		{
+			/* Append others to the tail */
+			Assert(fxact->fxact_next == NULL);
+			prev_fxact->fxact_next = fxact;
+		}
+
+		prev_fxact = fxact;
+	}
+}
+
+/*
+ * FdwXactRegisterFdwXactEntry
+ *
+ * This function is used to create a new foreign transaction entry before an FDW
+ * prepares and commits/rolls back. The function adds the entry to WAL, and it
+ * will be persisted to disk under the pg_fdw_xact directory at checkpoint.
+ */
+static FdwXact
+FdwXactRegisterFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact				fxact;
+	FdwXactOnDiskData	*fxact_file_data;
+	MemoryContext		old_context;
+	int					data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact = insert_fdw_xact(MyDatabaseId, xid, fdw_part->serverid,
+							fdw_part->userid, fdw_part->fdw_xact_id);
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->registered_backend = MyBackendId;
+	fdw_part->fdw_xact = fxact;
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdw_xact_id);
+	data_len = data_len + strlen(fdw_part->fdw_xact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fxact_file_data->dbid = MyDatabaseId;
+	fxact_file_data->local_xid = xid;
+	fxact_file_data->serverid = fdw_part->serverid;
+	fxact_file_data->userid = fdw_part->userid;
+	memcpy(fxact_file_data->fdw_xact_id, fdw_part->fdw_xact_id,
+		   strlen(fdw_part->fdw_xact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fxact_file_data, data_len);
+	fxact->insert_end_lsn = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_INSERT);
+	XLogFlush(fxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact->valid = true;
+	LWLockRelease(FdwXactLock);
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fxact_file_data);
+	return fxact;
+}
+
+/*
+ * insert_fdw_xact
+ *
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				char *fdw_xact_id)
+{
+	int i;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		fxact = FdwXactCtl->fdw_xacts[i];
+		if (fxact->dbid == dbid &&
+			fxact->local_xid == xid &&
+			fxact->serverid == serverid &&
+			fxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
+								   xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there
+	 * are none left.
+	 */
+	if (!FdwXactCtl->freeFdwXacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fxact = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fxact->fxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->numFdwXacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts++] = fxact;
+
+	fxact->registered_backend = InvalidBackendId;
+	fxact->dbid = dbid;
+	fxact->local_xid = xid;
+	fxact->serverid = serverid;
+	fxact->userid = userid;
+	fxact->insert_start_lsn = InvalidXLogRecPtr;
+	fxact->insert_end_lsn = InvalidXLogRecPtr;
+	fxact->valid = false;
+	fxact->ondisk = false;
+	fxact->inredo = false;
+	memcpy(fxact->fdw_xact_id, fdw_xact_id, strlen(fdw_xact_id) + 1);
+
+	return fxact;
+}
+
+/*
+ * remove_fdw_xact
+ *
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdw_xact(FdwXact fdw_xact)
+{
+	int			cnt;
+
+	Assert(fdw_xact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		if (FdwXactCtl->fdw_xacts[cnt] == fdw_xact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (cnt >= FdwXactCtl->numFdwXacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdw_xact->local_xid, fdw_xact->serverid, fdw_xact->userid)));
+
+	/* Remove the entry from active array */
+	FdwXactCtl->numFdwXacts--;
+	FdwXactCtl->fdw_xacts[cnt] = FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts];
+
+	/* Put it back into free list */
+	fdw_xact->fxact_free_next = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fdw_xact;
+
+	/* Reset the entry's state */
+	fdw_xact->status = FDW_XACT_INITIAL;
+	fdw_xact->registered_backend = InvalidBackendId;
+	fdw_xact->fxact_next = NULL;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdw_xact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdw_xact->serverid;
+		record.dbid = fdw_xact->dbid;
+		record.xid = fdw_xact->local_xid;
+		record.userid = fdw_xact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdw_xact_remove));
+		recptr = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the commit critical section: a
+		 * checkpoint starting after this will certainly see the gxact as a
+		 * candidate for fsyncing.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true if the current transaction requires foreign two-phase commit
+ * to achieve atomic commit. Foreign two-phase commit is required in either
+ * of two cases: we modified data on two or more foreign servers, or we
+ * modified both a non-temporary local relation and data on at least one
+ * foreign server.
+ */
+bool
+ForeignTwophaseCommitRequired(void)
+{
+	int	nserverswritten = list_length(FdwXactParticipantsForAC);
+	ListCell   *lc;
+	bool		modified = false;
+
+	/* Return if not requested */
+	if (!AtomicCommitRequested())
+		return false;
+
+	/* Check if we modified data on any foreign server */
+	foreach(lc, FdwXactParticipantsForAC)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->modified)
+		{
+			modified = true;
+			break;
+		}
+	}
+
+	/* We didn't modify data on any foreign server */
+	if (!modified)
+		return false;
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	return nserverswritten > 1;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int	i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * ForgetAllFdwXactParticipants
+ *
+ * Reset all the foreign transaction entries that this backend registered.
+ * If a foreign transaction has a corresponding FdwXact entry, resetting its
+ * registered_backend field leaves that entry in the unresolved state.
+ * If we leave any entries, we update the oldest xmin of unresolved transactions
+ * so that the transaction status of dangling transactions is not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell *cell;
+	int		n_left = 0;
+
+	if (FdwXactParticipantsForAC == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	foreach(cell, FdwXactParticipantsForAC)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(cell);
+
+		/* Skip if we haven't registered an FdwXact entry yet */
+		if (fdw_part->fdw_xact == NULL)
+			continue;
+
+		/*
+		 * There is a race condition: an FdwXact entry referenced from
+		 * FdwXactParticipantsForAC could already be in use by another backend
+		 * if the resolver process removed the entry and another backend reused
+		 * it before we forget it. So we need to check that the entry is
+		 * still associated with this backend.
+		 */
+		if (fdw_part->fdw_xact->registered_backend == MyBackendId)
+		{
+			fdw_part->fdw_xact->registered_backend = InvalidBackendId;
+			n_left++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Update the oldest local transaction of unresolved distributed
+	 * transactions if we left any FdwXact entries.
+	 */
+	if (n_left > 0)
+		FdwXactComputeRequiredXmin();
+
+	FdwXactParticipantsForAC = NIL;
+}
+
+/*
+ * AtProcExit_FdwXact
+ *
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Wait for foreign transaction to be resolved.
+ *
+ * Initially backends start in state FDW_XACT_NOT_WAITING and then change
+ * that state to FDW_XACT_WAITING before adding themselves to the wait queue.
+ * During FdwXactResolveDistributedTransaction a fdwxact resolver changes the
+ * state to FDW_XACT_WAIT_COMPLETE once foreign transactions are resolved.
+ * This backend then resets its state to FDW_XACT_NOT_WAITING.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char		*new_status = NULL;
+	const char	*old_status;
+	ListCell	*lc;
+	List		*fdwxact_participants = NIL;
+
+	/* Quick exit if atomic commit is not requested */
+	if (!AtomicCommitRequested())
+		return;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDW_XACT_NOT_WAITING);
+
+	if (FdwXactParticipantsForAC != NIL)
+	{
+		/*
+		 * If we're waiting for foreign transactions to be resolved that
+		 * we've prepared just before, use the participants list.
+		 */
+		Assert(MyPgXact->xid == wait_xid);
+		fdwxact_participants = FdwXactParticipantsForAC;
+	}
+	else
+	{
+		FdwXactStateCacheEntry *fdwxact_entry;
+		bool found;
+
+		/*
+		 * If we're waiting for foreign transactions to be resolved that
+		 * are part of a local prepared transaction that was prepared
+		 * while the server was running, the entries exist in the hash
+		 * table, so we construct the participants list from the entry.
+		 */
+		Assert(FdwXactStateCache);
+		fdwxact_entry = (FdwXactStateCacheEntry *) hash_search(FdwXactStateCache,
+															   (void *) &wait_xid,
+															   HASH_FIND, &found);
+
+		if (found)
+		{
+			FdwXact	fdwxact;
+
+			for (fdwxact = fdwxact_entry->participants;
+				 fdwxact != NULL;
+				 fdwxact = fdwxact->fxact_next)
+				fdwxact_participants = lappend(fdwxact_participants, fdwxact);
+		}
+	}
+
+	/*
+	 * Otherwise, construct the participants list by scanning the global
+	 * array. This can happen when we restarted after PREPARE-ing a
+	 * distributed transaction and are now trying to resolve it.
+	 */
+	if (fdwxact_participants == NIL)
+		fdwxact_participants = get_fdw_xacts(MyDatabaseId, wait_xid,
+											 InvalidOid, InvalidOid, true);
+
+	/* Exit if we found no foreign transaction to resolve */
+	if (fdwxact_participants == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	foreach(lc, fdwxact_participants)
+	{
+		FdwXact fdw_xact = (FdwXact) lfirst(lc);
+
+		/* Don't overwrite status if fate has been determined */
+		if (fdw_xact->status == FDW_XACT_PREPARED)
+			fdw_xact->status = (is_commit ?
+								FDW_XACT_COMMITTING_PREPARED :
+								FDW_XACT_ABORTING_PREPARED);
+	}
+
+	/* Set backend status and enqueue itself */
+	MyProc->fdwXactState = FDW_XACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	FdwXactQueueInsert();
+	LWLockRelease(FdwXactLock);
+
+	/* Launch a resolver process if none is running yet, or wake it up */
+	fdwxact_maybe_launch_resolver(false);
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0';	/* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to FDW_XACT_WAIT_COMPLETE,
+		 * it will never update it again, so we can't be seeing a stale value
+		 * in that case.
+		 */
+		if (MyProc->fdwXactState == FDW_XACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The latter
+		 * would lead the client to believe that the distributed transaction
+		 * aborted, which is not true: it's already committed locally. The
+		 * former is no good either: the client has requested committing a
+		 * distributed transaction, and is entitled to assume that an acknowledged
+		 * commit is also committed on all foreign servers, which might not be
+		 * true. So in this case we issue a WARNING (which some clients may
+		 * be able to interpret) and shut off further output. We do NOT reset
+		 * ProcDiePending, so that the process will die after the commit is
+		 * cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign xact resolver can pick them up and try to resolve them
+		 * later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDW_XACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+
+	/*
+	 * Forget the list of locked entries. This also means that any entries
+	 * that could not be resolved remain as dangling transactions.
+	 */
+	ForgetAllFdwXactParticipants();
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Acquire FdwXactLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Insert MyProc into the tail of FdwXactQueue.
+ */
+static void
+FdwXactQueueInsert(void)
+{
+	SHMQueueInsertBefore(&(FdwXactRslvCtl->FdwXactQueue),
+						 &(MyProc->fdwXactLinks));
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Create and initialize an FdwXactResolveState which is used
+ * for resolution of foreign transactions.
+ */
+FdwXactResolveState *
+CreateFdwXactResolveState(void)
+{
+	FdwXactResolveState *frstate = palloc0(sizeof(FdwXactResolveState));
+
+	frstate->dbid = MyDatabaseId;
+	frstate->fdwxact = NULL;
+	frstate->waiter = NULL;
+
+	return frstate;
+}
+
+/*
+ * Resolve one distributed transaction. The target distributed transaction
+ * is fetched from the shmem queue and its participants are fetched from either
+ * the shmem hash table or the global array. Release the waiter and return true
+ * only if we resolved all of the foreign transaction participants. Return
+ * false if we failed to resolve any of them.
+ *
+ * To preserve the order in which distributed transactions were registered in
+ * the queue, we must not move on to the next distributed transaction until all
+ * participants are resolved. Failed foreign transactions are retried at the next execution.
+ */
+bool
+FdwXactResolveDistributedTransaction(FdwXactResolveState *frstate)
+{
+	FdwXactStateCacheEntry	*fdwxact_entry = NULL;
+	volatile FdwXact	fdwxacts_failed_to_resolve = NULL;
+	bool				all_resolved = false;
+
+	Assert(frstate->dbid == MyDatabaseId);
+
+	/* Get a new waiter if we don't have one yet */
+	if (frstate->waiter == NULL)
+	{
+		PGPROC	*proc;
+
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
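+		/* Fetch the first waiter on our database, starting at the queue head */
+		for (proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->FdwXactQueue),
+											&(FdwXactRslvCtl->FdwXactQueue),
+											offsetof(PGPROC, fdwXactLinks));
+			 proc != NULL && proc->databaseId != frstate->dbid;
+			 proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->FdwXactQueue),
+											&(proc->fdwXactLinks),
+											offsetof(PGPROC, fdwXactLinks)))
+			;					/* skip waiters on other databases */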
+
+		LWLockRelease(FdwXactLock);
+
+		/* If no waiter, there is no job */
+		if (!proc)
+			return false;
+
+		Assert(TransactionIdIsValid(proc->fdwXactWaitXid));
+		frstate->waiter = proc;
+	}
+
+	/* Get foreign transaction participants */
+	if (frstate->fdwxact == NULL)
+	{
+		bool found;
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+		/* Search FdwXact entries from the hash table by the local transaction id */
+		fdwxact_entry =
+			(FdwXactStateCacheEntry *) hash_search(FdwXactStateCache,
+												   (void *) &(frstate->waiter->fdwXactWaitXid),
+												   HASH_FIND, &found);
+
+		if (found)
+			frstate->fdwxact = fdwxact_entry->participants;
+		else
+		{
+			int i;
+			FdwXact entries_to_resolve = NULL;
+			FdwXact prev_fx = NULL;
+
+			/*
+			 * The fdwxact entry doesn't exist in the hash table when
+			 * a prepared transaction is resolved after recovery. In this case,
+			 * we construct a list of fdw xact entries by scanning the
+			 * FdwXactCtl->fdw_xacts array.
+			 */
+			for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+			{
+				FdwXact fdw_xact = FdwXactCtl->fdw_xacts[i];
+
+				if (fdw_xact->dbid == frstate->dbid &&
+					fdw_xact->local_xid == frstate->waiter->fdwXactWaitXid)
+				{
+					if (!entries_to_resolve)
+						entries_to_resolve = fdw_xact;
+
+					/* Link from previous entry to this entry */
+					if (prev_fx)
+						prev_fx->fxact_next = fdw_xact;
+
+					prev_fx = fdw_xact;
+				}
+			}
+
+			frstate->fdwxact = entries_to_resolve;
+		}
+
+		LWLockRelease(FdwXactLock);
+	}
+
+	Assert(frstate->fdwxact != NULL);
+
+	/* Resolve all foreign transactions one by one */
+	while (frstate->fdwxact != NULL)
+	{
+		volatile FdwXact cur_fdwxact = frstate->fdwxact;
+		volatile FdwXact fdwxact_next = NULL;
+
+		/*
+		 * Remember the next FdwXact entry to resolve, as the current entry
+		 * will be removed from the list once it is resolved.
+		 */
+		fdwxact_next = cur_fdwxact->fxact_next;
+
+		/* Resolve a foreign transaction */
+		if (!FdwXactResolveForeignTransaction(cur_fdwxact))
+		{
+			ForeignServer *fserver;
+
+			CHECK_FOR_INTERRUPTS();
+
+			/* Failed to resolve. Remember it for the next execution */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			/*
+			 * Detach the failed entry from the chain before re-linking it.
+			 */
+			cur_fdwxact->fxact_next = NULL;
+			if (fdwxacts_failed_to_resolve == NULL)
+			{
+				/* The first failed entry becomes the head of the list */
+				fdwxacts_failed_to_resolve = cur_fdwxact;
+			}
+			else
+			{
+				FdwXact fx = fdwxacts_failed_to_resolve;
+
+				/* Append the entry at the tail */
+				while (fx->fxact_next != NULL)
+					fx = fx->fxact_next;
+				fx->fxact_next = cur_fdwxact;
+			}
+			LWLockRelease(FdwXactLock);
+
+			fserver = GetForeignServer(cur_fdwxact->serverid);
+			ereport(LOG,
+					(errmsg("could not resolve a foreign transaction on server \"%s\"",
+							fserver->servername),
+					 errdetail("local transaction id is %u, connected by user id %u",
+							   cur_fdwxact->local_xid, cur_fdwxact->userid)));
+		}
+		else
+		{
+			/* Resolved. Update the cache entry if it's valid */
+			if (fdwxact_entry)
+				fdwxact_entry->participants = fdwxact_next;
+
+			elog(DEBUG2, "resolved a foreign transaction xid %u, serverid %u, userid %u",
+				 cur_fdwxact->local_xid, cur_fdwxact->serverid, cur_fdwxact->userid);
+		}
+
+		/* Advance the resolution status to the next */
+		frstate->fdwxact = fdwxact_next;
+	}
+
+	all_resolved = (fdwxacts_failed_to_resolve == NULL);
+
+	if (all_resolved)
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+		/* Remove the state cache entry from shmem hash table */
+		hash_search(FdwXactStateCache, (void *) &(frstate->waiter->fdwXactWaitXid),
+					HASH_REMOVE, NULL);
+
+		/*
+		 * Remove the waiter from the shmem queue, if not detached yet. The
+		 * waiter could already be detached if the user cancelled the wait
+		 * before resolution.
+		 */
+		if (!SHMQueueIsDetached(&(frstate->waiter->fdwXactLinks)))
+		{
+			TransactionId	wait_xid = frstate->waiter->fdwXactWaitXid;
+
+			SHMQueueDelete(&(frstate->waiter->fdwXactLinks));
+
+			pg_write_barrier();
+
+			/* Set state to complete */
+			frstate->waiter->fdwXactState = FDW_XACT_WAIT_COMPLETE;
+
+			/* Wake up the waiter only when we have set state and removed from queue */
+			SetLatch(&(frstate->waiter->procLatch));
+
+			elog(DEBUG2, "released a proc xid %u", wait_xid);
+		}
+
+		LWLockRelease(FdwXactLock);
+
+		/* Reset resolution state */
+		frstate->waiter = NULL;
+		Assert(frstate->fdwxact == NULL);
+	}
+	else
+	{
+		/*
+		 * Update the fdwxact entry we're processing so that the failed
+		 * fdwxact entries will be processed again.
+		 */
+		frstate->fdwxact = fdwxacts_failed_to_resolve;
+	}
+
+	return all_resolved;
+}
+
+/*
+ * Resolve all dangling foreign transactions on the given database. Get
+ * all dangling foreign transactions from shmem global array and resolve
+ * them one by one.
+ *
+ * Unlike FdwXactResolveDistributedTransaction, for dangling transaction
+ * resolution we don't care about the order of resolution because these
+ * entries are already out of order. So if we fail to resolve a foreign
+ * transaction, we can move on to the next one, which might be associated
+ * with another distributed transaction.
+ */
+void
+FdwXactResolveAllDanglingTransactions(Oid dbid)
+{
+	List		*dangling_fdwxacts = NIL;
+	ListCell	*cell;
+	int			n_resolved = 0;
+	int			i;
+
+	Assert(OidIsValid(dbid));
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/*
+	 * Walk over the global array to build the list of dangling transactions
+	 * whose corresponding local transaction is on the given database.
+	 */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fxact = FdwXactCtl->fdw_xacts[i];
+
+		/*
+		 * Append the fdwxact entry on the given database to the list if
+		 * it's handled by nobody and the corresponding local transaction
+		 * is not part of the prepared transaction.
+		 */
+		if (fxact->dbid == dbid &&
+			fxact->registered_backend == InvalidBackendId &&
+			!TwoPhaseExists(fxact->local_xid))
+			dangling_fdwxacts = lappend(dangling_fdwxacts, fxact);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/* Return if there is no foreign transaction we need to resolve */
+	if (dangling_fdwxacts == NIL)
+		return;
+
+	foreach(cell, dangling_fdwxacts)
+	{
+		FdwXact fdwxact = (FdwXact) lfirst(cell);
+
+		if (!FdwXactResolveForeignTransaction(fdwxact))
+		{
+			ForeignServer *fserver = GetForeignServer(fdwxact->serverid);
+
+			/*
+			 * If we failed to resolve this foreign transaction, skip it in
+			 * this resolution cycle and try again in the next cycle.
+			 */
+			ereport(LOG,
+					(errmsg("could not resolve a dangling foreign transaction on server \"%s\"",
+							fserver->servername),
+					 errdetail("local transaction id is %u, connected by user id %u",
+							   fdwxact->local_xid, fdwxact->userid)));
+			continue;
+		}
+
+		n_resolved++;
+	}
+
+	list_free(dangling_fdwxacts);
+
+	elog(DEBUG2, "resolved %d dangling foreign xacts", n_resolved);
+}
+
+/*
+ * AtEOXact_FdwXacts
+ *
+ * In commit case, we have already prepared transactions on the foreign
+ * servers during pre-commit, and those prepared transactions will be
+ * resolved by the resolver process. So we don't do anything about the
+ * foreign transactions here.
+ *
+ * In abort case, either the user requested rollback or we switched to
+ * rollback due to an error during commit. To close the current foreign
+ * transactions anyway, we call the rollback API for every foreign
+ * transaction. If we raised an error while preparing and got here, it's
+ * possible that some entries of FdwXactParticipants have already
+ * registered their FdwXact entries. If there are any, we leave them as
+ * dangling transactions and ask the resolver process to process them.
+ */
+extern void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		int left_fdwxacts = 0;
+
+		foreach (lcell, FdwXactParticipantsForAC)
+		{
+			FdwXactParticipant	*fdw_part = lfirst(lcell);
+
+			/*
+			 * Count FdwXact entries that we registered to shared memory array
+			 * in this transaction.
+			 */
+			if (fdw_part->fdw_xact)
+			{
+				/*
+				 * The status of the foreign transaction must be either
+				 * preparing or prepared. In either case, since we have
+				 * registered the FdwXact entry, we leave it to the resolver
+				 * process. In the preparing state the foreign transaction
+				 * might not be closed yet, so we fall through and call the
+				 * rollback API. In the prepared state the foreign transaction
+				 * has already been closed, so we don't need to do anything.
+				 */
+				Assert(fdw_part->fdw_xact->status == FDW_XACT_PREPARING ||
+					   fdw_part->fdw_xact->status == FDW_XACT_PREPARED);
+
+				left_fdwxacts++;
+				if (fdw_part->fdw_xact->status == FDW_XACT_PREPARED)
+					continue;
+			}
+
+			/*
+			 * Roll back the current foreign transaction. Since we're already
+			 * rolling back the transaction, it's too late to raise an error
+			 * here, so we log a warning instead.
+			 */
+			if (!fdw_part->rollback_foreign_xact(&fdw_part->foreign_xact))
+				ereport(WARNING,
+						(errmsg("could not abort transaction on server \"%s\"",
+								fdw_part->foreign_xact.server->servername)));
+		}
+
+		/* If we left some FdwXact entries, ask the resolver process */
+		if (left_fdwxacts > 0)
+		{
+			ereport(WARNING,
+					(errmsg("left %d foreign transactions in in-doubt status",
+							left_fdwxacts)));
+			fdwxact_maybe_launch_resolver(true);
+		}
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * AtPrepare_FdwXacts
+ *
+ * If there are foreign servers involved in the transaction, this function
+ * prepares transactions on those servers.
+ *
+ * Note that the transaction can abort after we have prepared only some of
+ * the participants. Because we may still switch to abort, we cannot forget
+ * FdwXactParticipantsForAC here. They are processed by the resolver process
+ * during abort, or in AtEOXact_FdwXacts.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipantsForAC == NIL)
+		return;
+
+	/*
+	 * We cannot prepare distributed transaction if any foreign server of
+	 * participants in the transaction isn't capable of two-phase commit.
+	 */
+	if ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_T_R_INTEGRITY_CONSTRAINT_VIOLATION),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in the transaction cannot prepare transactions")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * FdwXactResolveForeignTransaction
+ *
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * handler routine. The foreign transaction can be a dangling transaction
+ * that nobody is interested in. If the fate of the foreign transaction is
+ * not determined yet, it is determined according to the status of the
+ * corresponding local transaction.
+ *
+ * If the resolution is successful, remove the foreign transaction entry from
+ * the shared memory and also remove the corresponding on-disk file.
+ */
+static bool
+FdwXactResolveForeignTransaction(FdwXact fdwxact)
+{
+	bool		resolved;
+	bool		is_commit;
+	ForeignServer		*fserver;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	FdwRoutine			*fdw_routine;
+	ForeignTransaction	foreign_xact;
+
+	Assert(fdwxact);
+
+	/*
+	 * Determine whether we commit or abort this foreign transaction.
+	 */
+	if (fdwxact->status == FDW_XACT_COMMITTING_PREPARED)
+		is_commit = true;
+	else if (fdwxact->status == FDW_XACT_ABORTING_PREPARED)
+		is_commit = false;
+
+	/*
+	 * If the local transaction is already committed, commit prepared
+	 * foreign transaction.
+	 */
+	else if (TransactionIdDidCommit(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_COMMITTING_PREPARED;
+		is_commit = true;
+	}
+
+	/*
+	 * If the local transaction is already aborted, abort prepared
+	 * foreign transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_ABORTING_PREPARED;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is not in progress but the foreign
+	 * transaction is not prepared on the foreign server. This
+	 * can happen when the transaction failed after registering this
+	 * entry but before actually preparing on the foreign server.
+	 * So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+		is_commit = false;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction
+	 * state is neither committing nor aborting. This should not
+	 * happen because we cannot decide whether to commit or abort a
+	 * foreign transaction associated with an in-progress local
+	 * transaction.
+	 */
+	else
+		ereport(ERROR,
+				(errmsg("cannot resolve foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid)));
+
+	/* Construct foreign server connection information for passing to API */
+	fserver = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(fserver->fdwid);
+	user_mapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+	foreign_xact.server = fserver;
+	foreign_xact.usermapping = user_mapping;
+	foreign_xact.fx_id = fdwxact->fdw_xact_id;
+
+	/* Resolve the foreign transaction */
+	Assert(fdw_routine->ResolveForeignTransaction);
+	resolved = fdw_routine->ResolveForeignTransaction(&foreign_xact,
+													  is_commit);
+
+	if (!resolved)
+	{
+		ereport(ERROR,
+				(errmsg("could not %s a prepared foreign transaction on server \"%s\"",
+						is_commit ? "commit" : "rollback", fserver->servername),
+				 errdetail("local transaction id is %u, connected by user id %u",
+						   fdwxact->local_xid, fdwxact->userid)));
+	}
+	else
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdw_xact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+
+	return resolved;
+}
+
+/*
+ * Return the one FdwXact entry that matches the given arguments, or
+ * NULL if there is none. Since this function searches for an FdwXact
+ * entry by its unique key, all arguments must be valid.
+ */
+static FdwXact
+get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				 bool need_lock)
+{
+	List	*fdw_xact_list;
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, need_lock);
+
+	/* Could not find entry */
+	if (fdw_xact_list == NIL)
+		return NULL;
+
+	/* Must be one entry since we search it by the unique key */
+	Assert(list_length(fdw_xact_list) == 1);
+
+	return (FdwXact) linitial(fdw_xact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+fdw_xact_exists(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	*fdw_xact_list;
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, true);
+
+	return fdw_xact_list != NIL;
+}
+
+/*
+ * Returns an array of all foreign prepared transactions for the user-level
+ * function pg_prepared_fdw_xacts.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if it doesn't want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdw_xacts(int *length)
+{
+	List		*all_fdw_xacts;
+	ListCell	*lc;
+	FdwXact		fdw_xacts;
+	int			num_fdw_xacts = 0;
+
+	Assert(length != NULL);
+
+	/* Get all entries */
+	all_fdw_xacts = get_fdw_xacts(InvalidOid, InvalidTransactionId,
+								  InvalidOid, InvalidOid, true);
+
+	if (all_fdw_xacts == NIL)
+	{
+		*length = 0;
+		return NULL;
+	}
+
+	fdw_xacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdw_xacts));
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdw_xacts)
+	{
+		FdwXact fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdw_xacts + num_fdw_xacts, fx,
+			   sizeof(FdwXactData));
+		num_fdw_xacts++;
+	}
+
+	*length = num_fdw_xacts;
+	list_free(all_fdw_xacts);
+
+	return fdw_xacts;
+}
+
+/*
+ * Return a list of FdwXact entries that match the given arguments, or NIL
+ * if none match. An invalid value for an argument matches any entry.
+ */
+static List*
+get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			  bool need_lock)
+{
+	int i;
+	List	*fdw_xact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact	fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool	matches = true;
+
+		/* xid */
+		if (xid != InvalidTransactionId && xid != fdw_xact->local_xid)
+			matches = false;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdw_xact->dbid != dbid)
+			matches = false;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdw_xact->serverid)
+			matches = false;
+
+		/* userid */
+		if (OidIsValid(userid) && fdw_xact->userid != userid)
+			matches = false;
+
+		/* Append it if matched */
+		if (matches)
+			fdw_xact_list = lappend(fdw_xact_list, fdw_xact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdw_xact_list;
+}
+
+/*
+ * fdw_xact_redo
+ * Apply the redo log for a foreign transaction.
+ */
+void
+fdw_xact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record
+		 * in FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDW_XACT_REMOVE)
+	{
+		xl_fdw_xact_remove *xlrec = (xl_fdw_xact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. Returned string
+ * value is used to identify foreign transaction. The identifier should not
+ * be same as any other concurrent prepared transaction identifier.
+ *
+ * To make the foreign transaction id, we should ideally use something like
+ * UUID, which gives unique ids with high probability, but that may be expensive
+ * here and the UUID extension, which provides the function to generate UUIDs,
+ * is not part of the core code.
+ */
+static char *
+generate_fdw_xact_identifier(Oid serverid, Oid userid)
+{
+	char*	fdw_xact_id;
+
+	fdw_xact_id = (char *)palloc(FDW_XACT_ID_MAX_LEN * sizeof(char));
+
+	/* snprintf always null-terminates; Oids are unsigned, so print with %u */
+	snprintf(fdw_xact_id, FDW_XACT_ID_MAX_LEN, "%s_%ld_%u_%u",
+			 "fx", Abs(random()), serverid, userid);
+
+	return fdw_xact_id;
+}
+
+/*
+ * CheckPointFdwXacts
+ *
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an inserted LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * For simplicity, the function currently performs the file I/O while holding
+ * FdwXactLock rather than collecting the files under the lock and syncing
+ * them after releasing it; see the comment in the function body for the
+ * trade-off.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdw_xacts = 0;
+
+	/* Quick get-away, before taking lock */
+	if (max_prepared_foreign_xacts <= 0)
+		return;
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/* Another quick check, before we allocate memory */
+	if (FdwXactCtl->numFdwXacts <= 0)
+	{
+		LWLockRelease(FdwXactLock);
+		return;
+	}
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but then on every error
+	 * we would have to check whether somebody committed our transaction in a
+	 * different backend. Let's leave this optimisation for the future, in
+	 * case somebody finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with a
+	 * insert_end_lsn set prior to the last checkpoint yet is marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		FdwXact		fxact = FdwXactCtl->fdw_xacts[cnt];
+
+		if ((fxact->valid || fxact->inredo) &&
+			!fxact->ondisk &&
+			fxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fxact->dbid, fxact->local_xid,
+								fxact->serverid, fxact->userid,
+								buf, len);
+			fxact->ondisk = true;
+			fxact->insert_start_lsn = InvalidXLogRecPtr;
+			fxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdw_xacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDW_XACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdw_xacts > 0)
+		ereport(LOG,
+			  (errmsg_plural("%u foreign transaction state file was written "
+							 "for long-running prepared transactions",
+							 "%u foreign transaction state files were written "
+							 "for long-running prepared transactions",
+							 serialized_fdw_xacts,
+							 serialized_fdw_xacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, &read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+		   errdetail("Failed while allocating an XLog reading processor.")));
+
+	record = XLogReadRecord(xlogreader, lsn, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read foreign transaction state from xlog at %X/%X",
+			   (uint32) (lsn >> 32),
+			   (uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDW_XACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDW_XACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not recreate foreign transaction state file \"%s\": %m",
+			   path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not fsync foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * ProcessFdwXactBuffer
+ *
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId	origNextXid = ShmemVariableCache->nextXid;
+	char	*buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(insert_start_lsn != InvalidXLogRecPtr);
+
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid, true);
+		if (buf == NULL)
+		{
+			ereport(WARNING,
+					(errmsg("removing corrupt fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+			return NULL;
+		}
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid magic number and CRC), return the contents in
+ * a structure allocated in-memory. Otherwise return NULL. The structure can
+ * be later freed by the caller.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				bool give_warnings)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			   errmsg("could not open FDW transaction state file \"%s\": %m",
+					  path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+	{
+		CloseTransientFile(fd);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not stat FDW transaction state file \"%s\": %m",
+							path)));
+		return NULL;
+	}
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdw_xact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+	{
+		CloseTransientFile(fd);
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+		return NULL;
+	}
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+	{
+		CloseTransientFile(fd);
+		return NULL;
+	}
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_READ);
+	if (read(fd, buf, stat.st_size) != stat.st_size)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		pfree(buf);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not read FDW transaction state file \"%s\": %m",
+					  path)));
+		return NULL;
+	}
+
+	pgstat_report_wait_end();
+	CloseTransientFile(fd);
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+	{
+		pfree(buf);
+		return NULL;
+	}
+
+	/* Check that the contents are the expected data */
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fxact_file_data->dbid  != dbid ||
+		fxact_file_data->serverid != serverid ||
+		fxact_file_data->userid != userid ||
+		fxact_file_data->local_xid != xid)
+	{
+		ereport(WARNING,
+			(errmsg("invalid foreign transaction state file \"%s\"",
+					path)));
+		pfree(buf);
+		return NULL;
+	}
+
+	return buf;
+}
+
+/*
+ * PrescanFdwXacts
+ *
+ * Scan the foreign transactions directory for the oldest active transaction.
+ * This is run during database startup, after we completed reading WAL.
+ * ShmemVariableCache->nextXid has been set to one more than the highest XID
+ * for which evidence exists in WAL.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	TransactionId nextXid = ShmemVariableCache->nextXid;
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+		 strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			TransactionId local_xid;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/*
+			 * Remove a foreign prepared transaction file corresponding to an
+			 * XID, which is too new.
+			 */
+			if (TransactionIdFollowsOrEquals(local_xid, nextXid))
+			{
+				ereport(WARNING,
+						(errmsg("removing future foreign prepared transaction file \"%s\"",
+								clde->d_name)));
+				RemoveFdwXactFile(dbid, local_xid, serverid, userid, true);
+				continue;
+			}
+
+			if (TransactionIdPrecedesOrEquals(local_xid, oldestActiveXid))
+				oldestActiveXid = local_xid;
+		}
+	}
+
+	FreeDir(cldir);
+	return oldestActiveXid;
+}
+
+/*
+ * restoreFdwXactData
+ *
+ * Scan pg_fdw_xact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char		*buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid, bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * FdwXactRedoAdd
+ *
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/* Add this entry into the table of foreign transactions. */
+	fxact = insert_fdw_xact(fxact_data->dbid, fxact_data->local_xid,
+							fxact_data->serverid, fxact_data->userid,
+							fxact_data->fdw_xact_id);
+
+	/*
+	 * Set status as preparing, since we do not know the xact status
+	 * right now. Resolver will set it later based on the status of
+	 * local transaction that prepared this fdwxact entry.
+	 */
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->insert_start_lsn = start_lsn;
+	fxact->insert_end_lsn = end_lsn;
+	fxact->inredo = true;	/* added in redo */
+	fxact->valid = false;
+	fxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * FdwXactRedoRemove
+ *
+ * Remove the corresponding fdw_xact entry from FdwXactCtl.
+ * Also remove fdw_xact file if a foreign transaction was saved
+ * via an earlier checkpoint.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact	fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdw_xact(dbid, xid, serverid, userid,
+							   false);
+
+	if (fdwxact == NULL)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdw_xact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+		char	*buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(bool *newval, void **extra, GucSource source)
+{
+	/* Parameter check */
+	if (*newval &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_xacts\" or \"max_foreign_xact_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdw_xacts;
+	int			num_xacts;
+	int			cur_xact;
+}	WorkingStatus;
+
+Datum
+pg_prepared_fdw_xacts(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdw_xacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdw_xacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(6, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdw_xacts = get_all_fdw_xacts(&num_fdw_xacts);
+		status->num_xacts = num_fdw_xacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdw_xact = &status->fdw_xacts[status->cur_xact++];
+		Datum		values[6];
+		bool		nulls[6];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdw_xact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdw_xact->dbid);
+		values[1] = TransactionIdGetDatum(fdw_xact->local_xid);
+		values[2] = ObjectIdGetDatum(fdw_xact->serverid);
+		values[3] = ObjectIdGetDatum(fdw_xact->userid);
+		switch (fdw_xact->status)
+		{
+			case FDW_XACT_PREPARING:
+				xact_status = "prepared";
+				break;
+			case FDW_XACT_COMMITTING_PREPARED:
+				xact_status = "committing";
+				break;
+			case FDW_XACT_ABORTING_PREPARED:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		/* XXX: should this really be interpreted by the FDW? */
+		values[5] = PointerGetDatum(cstring_to_text_with_len(fdw_xact->fdw_xact_id,
+															 strlen(fdw_xact->fdw_xact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+	bool			ret;
+
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, true);
+	if (fdwxact == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("could not find foreign transaction entry"))));
+
+	ret = FdwXactResolveForeignTransaction(fdwxact);
+
+	PG_RETURN_BOOL(ret);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it. This gives a way to forget about such a prepared transaction
+ * when the foreign server where it was prepared is no longer available, or
+ * when the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, false);
+	if (fdwxact == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("could not find foreign transaction entry"))));
+
+	remove_fdw_xact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_VOID();
+}
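
A minimal standalone sketch of the redo-time lifecycle described in the comments above (FdwXactRedoAdd, RecoverFdwXacts, and the "valid" filter in pg_prepared_fdw_xacts). It is not part of the patch: SketchEntry and the sketch_* functions are hypothetical stand-ins, and the real code additionally holds FdwXactLock and skips entries whose state data cannot be read back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one FdwXact shared-memory entry. */
typedef struct SketchEntry
{
	unsigned	xid;		/* local transaction id */
	bool		inredo;		/* entry was added while replaying WAL */
	bool		valid;		/* entry is visible to SQL-level functions */
} SketchEntry;

/* Like FdwXactRedoAdd: an entry replayed from WAL starts out invisible. */
static void
sketch_redo_add(SketchEntry *entry, unsigned xid)
{
	entry->xid = xid;
	entry->inredo = true;
	entry->valid = false;
}

/*
 * Like RecoverFdwXacts: at the end of recovery every entry is marked valid.
 * (The real function also skips entries whose data cannot be read.)
 */
static void
sketch_recover(SketchEntry *entries, size_t n)
{
	size_t		i;

	for (i = 0; i < n; i++)
	{
		entries[i].inredo = false;
		entries[i].valid = true;
	}
}

/* Like the filter in pg_prepared_fdw_xacts: only valid entries are shown. */
static size_t
sketch_count_visible(const SketchEntry *entries, size_t n)
{
	size_t		i;
	size_t		visible = 0;

	for (i = 0; i < n; i++)
	{
		if (entries[i].valid)
			visible++;
	}
	return visible;
}
```

Under these assumptions, entries added during redo are not reported until the recovery pass has run, which is the behaviour the "valid" flag is meant to provide.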
diff --git a/src/backend/foreign/fdwxact_launcher.c b/src/backend/foreign/fdwxact_launcher.c
new file mode 100644
index 0000000..6782c33
--- /dev/null
+++ b/src/backend/foreign/fdwxact_launcher.c
@@ -0,0 +1,587 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Information about resolver processes is stored in a shared memory area.
+ * A backend process requests the start of a new resolver process via that
+ * shared memory area. Note that the launcher assumes that there is no more
+ * than one start request per database.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/foreign/fdwxact_launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "foreign/fdwxact.h"
+#include "foreign/fdwxact_launcher.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/resolver_internal.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid, int slot);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+Datum pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS);
+
+/*
+ * Wake up the launcher process.
+ */
+void
+FdwXactLauncherWakeup(void)
+{
+	if (FdwXactRslvCtl->launcher_pid > 0)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR1);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int	slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->FdwXactQueue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz	last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid <= 0);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz	now;
+		long	wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int		rc;
+
+		CHECK_FOR_INTERRUPTS();
+
+		now = GetCurrentTimestamp();
+
+		if (TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			bool launched;
+
+			/*
+			 * Launch foreign transaction resolvers that are requested
+			 * but not running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+				last_start_time = now;
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started. This usually means the resolver crashed, so we
+			 * retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver worker
+ * if not running yet. A foreign transaction resolver worker is responsible
+ * for resolving the foreign transactions registered on one database, so if
+ * a resolver worker is already running for this database we don't need to
+ * launch a new one.
+ */
+void
+fdwxact_maybe_launch_resolver(bool ignore_error)
+{
+	FdwXactResolver *resolver;
+	bool	found = false;
+	int		i;
+
+	/*
+	 * Look for a resolver process that is running and working on the
+	 * same database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->pid != InvalidPid &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * If we found a resolver for my database, we don't need to launch a new
+	 * one; just wake up the running worker.
+	 */
+	if (found)
+	{
+		SetLatch(resolver->latch);
+
+		elog(DEBUG1, "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		return;
+	}
+
+	/* Look for an unused worker slot */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	/*
+	 * If there are no free worker slots left, report that to the user
+	 * before exiting.
+	 */
+	if (!found)
+	{
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_xact_resolvers.")));
+		return;
+	}
+
+	Assert(resolver->pid == InvalidPid);
+
+	/* Found an unused slot; reserve it for a new resolver process */
+	resolver->dbid = MyDatabaseId;
+	resolver->in_use = true;
+
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Wake up launcher */
+	FdwXactLauncherWakeup();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid' at 'slot' if given. If 'slot' is negative, we find an unused slot.
+ * Note that caller must hold FdwXactResolverLock in exclusive mode.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid, int slot)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int launch_slot = slot;
+
+	/* If slot number is invalid, we find an unused slot */
+	if (launch_slot < 0)
+	{
+		int i;
+
+		for (i = 0; i < max_foreign_xact_resolvers; i++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+			if (resolver->in_use && resolver->dbid == dbid)
+				return;
+
+			if (!resolver->in_use)
+			{
+				launch_slot = i;
+				break;
+			}
+		}
+	}
+
+	/* No unused slot found */
+	if (launch_slot < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_xact_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[launch_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_main_arg = Int32GetDatum(launch_slot);
+	bgw.bgw_notify_pid = 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to wait
+	 * until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch all foreign transaction resolvers that are required by backend process
+ * but not running.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	int i, j;
+	int num_launches = 0;
+	int num_unused_slots = 0;
+	int num_dbs = 0;
+	bool launched = false;
+	Oid	*dbs_to_launch;
+	Oid *dbs_having_worker = palloc0(sizeof(Oid) * max_foreign_xact_resolvers);
+
+	/*
+	 * Launch resolver workers on the databases that are requested
+	 * by backend processes.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* Remember unused worker slots */
+		if (!resolver->in_use)
+			num_unused_slots++;
+
+		/* Remember databases that already have a resolver worker */
+		if (OidIsValid(resolver->dbid))
+			dbs_having_worker[num_dbs++] = resolver->dbid;
+
+		/* Launch new foreign transaction resolver worker on the database */
+		if (resolver->in_use &&
+			OidIsValid(resolver->dbid) &&
+			resolver->pid == InvalidPid)
+		{
+			fdwxact_launch_resolver(resolver->dbid, i);
+			launched = true;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* There is no unused slot, exit */
+	if (num_unused_slots == 0)
+		return launched;
+
+	dbs_to_launch = (Oid *) palloc(sizeof(Oid) * num_unused_slots);
+
+	/*
+	 * If there is an unused slot, we can launch a foreign transaction resolver
+	 * on each database that has unresolved foreign transactions but doesn't
+	 * have a resolver. This usually happens when resolvers crash for
+	 * whatever reason. Scanning all FdwXact entries could take time, but
+	 * since this is the relaunch case that is acceptable.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool found = false;
+
+		if (num_launches >= num_unused_slots)
+			break;
+
+		for (j = 0; j < num_dbs; j++)
+		{
+			if (dbs_having_worker[j] == fdw_xact->dbid)
+			{
+				found = true;
+				break;
+			}
+		}
+
+		if (found)
+			continue;
+
+		dbs_to_launch[num_launches++] = fdw_xact->dbid;
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* Launch resolver process for a database at any worker slot */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < num_launches; i++)
+	{
+		fdwxact_launch_resolver(dbs_to_launch[i], -1);
+		launched = true;
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	return launched;
+}
+
+/*
+ * FdwXactLauncherRegister
+ *		Register a background worker running the foreign transaction
+ *      launcher.
+ */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+
+/*
+ * Returns activity of foreign transaction resolvers: the pid, the database
+ * OID and the last resolution time of each running resolver.
+ */
+Datum
+pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver	*resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t	pid;
+		Oid		dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
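
A minimal standalone sketch of the bounded collection step in fdwxact_relaunch_resolvers, assuming plain unsigned integers in place of Oid. It is not part of the patch: the sketch_* names are hypothetical, and the de-duplication of the output is an addition for illustration (the patch instead relies on fdwxact_launch_resolver returning early for a database that already has a slot). The point it shows is that the loop must stop once the output array, sized by the number of unused slots, is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if 'dbid' appears among the first 'n' items. */
static bool
sketch_contains(const unsigned *items, size_t n, unsigned dbid)
{
	size_t		i;

	for (i = 0; i < n; i++)
	{
		if (items[i] == dbid)
			return true;
	}
	return false;
}

/*
 * Collect up to 'capacity' distinct database ids that own a foreign
 * transaction ('xact_dbs') but have no resolver ('dbs_having_worker').
 * Returns the number of ids written to 'out'; never writes past capacity.
 */
static size_t
sketch_collect_dbs(const unsigned *xact_dbs, size_t num_xacts,
				   const unsigned *dbs_having_worker, size_t num_dbs,
				   unsigned *out, size_t capacity)
{
	size_t		i;
	size_t		n = 0;

	/* Stop as soon as the output array is full (n == capacity). */
	for (i = 0; i < num_xacts && n < capacity; i++)
	{
		if (sketch_contains(dbs_having_worker, num_dbs, xact_dbs[i]))
			continue;			/* a resolver already serves this database */
		if (sketch_contains(out, n, xact_dbs[i]))
			continue;			/* already queued (illustrative addition) */
		out[n++] = xact_dbs[i];
	}
	return n;
}
```

With two unused slots and foreign transactions on databases 5, 7, 7, 9 and 11, where 5 already has a resolver, this sketch queues 7 and 9 and stops.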
diff --git a/src/backend/foreign/fdwxact_resolver.c b/src/backend/foreign/fdwxact_resolver.c
new file mode 100644
index 0000000..7f7ff8f
--- /dev/null
+++ b/src/backend/foreign/fdwxact_resolver.c
@@ -0,0 +1,310 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process continues to resolve foreign transactions on a database.
+ * It resolves two types of foreign transactions: on-line foreign transactions
+ * and dangling foreign transactions. An on-line foreign transaction is a
+ * foreign transaction whose resolution a concurrent backend process is
+ * waiting for. A dangling transaction is a foreign transaction whose
+ * distributed transaction ended up in an in-doubt state. A resolver process
+ * doesn't exit as long as there is at least one unresolved foreign transaction
+ * on the database, even if the timeout has been reached.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/foreign/fdwxact_resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "foreign/fdwxact.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
+#include "foreign/resolver_internal.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* GUC parameters */
+int foreign_xact_resolution_retry_interval;
+int foreign_xact_resolver_timeout = 60 * 1000;
+
+/* Shared memory control data for foreign transaction resolvers */
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FdwXactRslvLoop(void);
+static long FdwXactRslvComputeSleepTime(TimestampTz now);
+static void FdwXactRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+	FdwXactLauncherWakeup();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	TIMESTAMP_NOBEGIN(MyFdwXactResolver->last_resolved_time);
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sanish value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FdwXactRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FdwXactRslvLoop(void)
+{
+	FdwXactResolveState *fstate;
+
+	/* Create an FdwXactResolveState */
+	fstate = CreateFdwXactResolveState();
+
+	/* Enter main loop */
+	for (;;)
+	{
+		int			rc;
+		TimestampTz	now;
+		long		sleep_time;
+		bool		resolved;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Resolve a distributed transaction */
+		StartTransactionCommand();
+		resolved = FdwXactResolveDistributedTransaction(fstate);
+		CommitTransactionCommand();
+
+		now = GetCurrentTimestamp();
+
+		/* Update my state */
+		if (resolved)
+			MyFdwXactResolver->last_resolved_time = now;
+
+		/* Check for fdwxact resolver timeout */
+		FdwXactRslvCheckTimeout(now);
+
+		/*
+		 * If we have resolved any distributed transaction, go to the next
+		 * cycle without resolving dangling transactions or sleeping, because
+		 * there might be other on-line transactions waiting to be resolved.
+		 */
+		if (!resolved)
+		{
+			/* Resolve dangling transactions as much as possible */
+			StartTransactionCommand();
+			FdwXactResolveAllDanglingTransactions(MyDatabaseId);
+			CommitTransactionCommand();
+
+			sleep_time = FdwXactRslvComputeSleepTime(now);
+
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   sleep_time,
+						   WAIT_EVENT_FDW_XACT_RESOLVER_MAIN);
+
+			if (rc & WL_POSTMASTER_DEATH)
+				proc_exit(1);
+		}
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FdwXactRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/*
+	 * The timeout has been reached. Exit if there are neither pending on-line
+	 * transactions nor dangling transactions.
+	 */
+	if (!fdw_xact_exists(InvalidTransactionId, MyDatabaseId, InvalidOid,
+						 InvalidOid))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyFdwXactResolver->dbid))));
+		CommitTransactionCommand();
+
+		fdwxact_resolver_detach();
+		proc_exit(0);
+	}
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. Returns the sleep
+ * time in milliseconds.
+ */
+static long
+FdwXactRslvComputeSleepTime(TimestampTz now)
+{
+	static TimestampTz	wakeuptime = 0;
+	long	sleeptime;
+	long	sec_to_timeout;
+	int		microsec_to_timeout;
+
+	if (now >= wakeuptime)
+		wakeuptime = TimestampTzPlusMilliseconds(now,
+												 foreign_xact_resolution_retry_interval);
+
+	/* Compute relative time until wakeup. */
+	TimestampDifference(now, wakeuptime,
+						&sec_to_timeout, &microsec_to_timeout);
+
+	sleeptime = sec_to_timeout * 1000 + microsec_to_timeout / 1000;
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
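
A minimal standalone sketch of the sleep-time arithmetic in FdwXactRslvComputeSleepTime, assuming a timestamp is a 64-bit count of microseconds. It is not part of the patch: sketch_ts and sketch_sleep_ms are hypothetical names, and the wakeup time is passed in explicitly instead of being kept in a function-local static variable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for TimestampTz: microseconds since some epoch. */
typedef int64_t sketch_ts;

/*
 * Return how many milliseconds to sleep until '*wakeuptime'. When the
 * previous wakeup time has already passed, schedule the next one
 * 'retry_interval_ms' milliseconds after 'now'.
 */
static long
sketch_sleep_ms(sketch_ts now, sketch_ts *wakeuptime, int retry_interval_ms)
{
	sketch_ts	remaining_us;

	if (now >= *wakeuptime)
		*wakeuptime = now + (sketch_ts) retry_interval_ms * 1000;

	/* Convert the remaining microseconds to whole milliseconds. */
	remaining_us = *wakeuptime - now;
	return (long) (remaining_us / 1000);
}
```

Under these assumptions, a resolver woken early by its latch sleeps only for the remainder of the current interval, and a full interval is scheduled again once the wakeup time has passed.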
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index eac78a5..1873a24 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -155,6 +155,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping OID.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index d2b695e..9243686 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -16,6 +16,8 @@
 
 #include "libpq/pqsignal.h"
 #include "access/parallel.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -129,6 +131,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 8a5b2b3..a67b34d 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3492,6 +3492,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_LAUNCHER_MAIN:
 			event_name = "LogicalLauncherMain";
 			break;
@@ -3683,6 +3689,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -3898,6 +3907,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDW_XACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 41de140..138dae4 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -100,6 +100,8 @@
 #include "common/file_perm.h"
 #include "common/ip.h"
 #include "common/string.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
 #include "lib/ilist.h"
 #include "libpq/auth.h"
 #include "libpq/libpq.h"
@@ -905,6 +907,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -980,12 +986,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules have a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afb4972..960fd6a 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -154,6 +154,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDW_XACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a58..5f321fe 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -22,6 +22,7 @@
 #include "access/subtrans.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "foreign/fdwxact_launcher.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -150,6 +151,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, BackendRandomShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +273,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	BackendRandomShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bf2f4db..461ba5c 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -90,6 +90,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -245,6 +247,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1312,6 +1315,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	volatile TransactionId replication_slot_xmin = InvalidTransactionId;
 	volatile TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	volatile TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1373,6 +1377,7 @@ GetOldestXmin(Relation rel, int flags)
 	/* fetch into volatile var while ProcArrayLock is held */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1423,6 +1428,15 @@ GetOldestXmin(Relation rel, int flags)
 		result = replication_slot_xmin;
 
 	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDW_XACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
+	/*
 	 * After locks have been released and defer_cleanup_age has been applied,
 	 * check whether we need to back up further to make logical decoding
 	 * possible. We need to do so if we're computing the global limit (rel =
@@ -2999,6 +3013,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits on future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions that are still
+ * needed to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6025ec..a42d06e 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,5 @@ OldSnapshotTimeMapLock				42
 BackendRandomLock					43
 LogicalRepWorkerLock				44
 CLogTruncationLock					45
+FdwXactLock					46
+FdwXactResolverLock			47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6f9aaa5..ec09515 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -38,6 +38,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "foreign/fdwxact.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -398,6 +399,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* Initialize fields for fdw xact. */
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -799,6 +804,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e4c6e3d..f09955f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -43,6 +43,8 @@
 #include "commands/async.h"
 #include "commands/prepare.h"
 #include "executor/spi.h"
+#include "foreign/fdwxact_resolver.h"
+#include "foreign/fdwxact_launcher.h"
 #include "jit/jit.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -2971,6 +2973,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0bec391..121d7bf 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -42,6 +42,7 @@
 #include "commands/variable.h"
 #include "commands/trigger.h"
 #include "common/string.h"
+#include "foreign/fdwxact.h"
 #include "funcapi.h"
 #include "jit/jit.h"
 #include "libpq/auth.h"
@@ -660,6 +661,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -1832,6 +1837,16 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+			gettext_noop("Sets whether the two-phase commit protocol is used for distributed transactions."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		false,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -2236,6 +2251,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+		 gettext_noop("Sets the time to wait before retrying to resolve a foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4e61bc6..88cdc85 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -121,6 +121,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -287,6 +289,20 @@
 
 
 #------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#foreign_twophase_commit = off
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
+#------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
 
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index ad06e8e..ca3eb62 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ab5cb7f..609578c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -209,6 +209,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdw_xact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 895a51f..5f0683d 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -306,6 +306,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_worker_processes);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_xacts setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 6fb403a..6d867c8 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -730,6 +730,7 @@ GuessControlValues(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -957,6 +958,7 @@ RewriteControlFile(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* Contents are protected with a CRC */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca..15bfeb4 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -26,6 +26,7 @@
 #include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
+#include "foreign/fdwxact_xlog.h"
 #include "replication/message.h"
 #include "replication/origin.h"
 #include "rmgrdesc.h"
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 0bbe9879..c15dff7 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDW_XACT_ID, "Foreign Transactions", fdw_xact_redo, fdw_xact_desc, fdw_xact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 0e932da..b199c88 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 				TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index c7b4144..7180bd1 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -105,6 +105,13 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
+ * XACT_FLAGS_FDWNOPREPARE - set when the transaction wrote data to a
+ * foreign table whose foreign server is not capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE				(1U << 3)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 30610b3..795e85a 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -227,6 +227,7 @@ typedef struct xl_parameter_change
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6..3d5333a 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -178,6 +178,7 @@ typedef struct ControlFileData
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8e4145f..21e5bcc 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5199,6 +5199,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '6053', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_fdwxact_resolver', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,oid,timestamptz}',
+  proargmodes => '{o,o,o,o}',
+  proargnames => '{pid,dbid,n_entries,last_resolved_time}',
+  prosrc => 'pg_stat_get_fdwxact_resolver' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5910,6 +5917,22 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '6050', descr => 'view foreign transactions',
+  proname => 'pg_prepared_fdw_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{dbid,transaction,serverid,userid,status,identifier}',
+  prosrc => 'pg_prepared_fdw_xacts' },
+{ oid => '6051', descr => 'remove foreign transaction',
+  proname => 'pg_remove_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_remove_fdw_xact' },
+{ oid => '6052', descr => 'resolve foreign transaction',
+  proname => 'pg_resolve_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_resolve_fdw_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c14eb54..f76e83d 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "foreign/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/relation.h"
 
@@ -168,6 +169,12 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef bool (*PrepareForeignTransaction_function) (ForeignTransaction *foreign_xact);
+typedef bool (*CommitForeignTransaction_function) (ForeignTransaction *foreign_xact);
+typedef bool (*RollbackForeignTransaction_function) (ForeignTransaction *foreign_xact);
+typedef bool (*ResolveForeignTransaction_function) (ForeignTransaction *foreign_xact,
+													bool is_commit);
+typedef bool (*IsTwoPhaseCommitEnabled_function) (Oid serverid);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -235,6 +242,13 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for distributed transactions */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	ResolveForeignTransaction_function ResolveForeignTransaction;
+	IsTwoPhaseCommitEnabled_function IsTwoPhaseCommitEnabled;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
@@ -247,7 +261,6 @@ typedef struct FdwRoutine
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
 } FdwRoutine;
 
-
 /* Functions in foreign/foreign.c */
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern Oid	GetForeignServerIdByRelId(Oid relid);
@@ -258,4 +271,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 						 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in foreign/fdwxact.c */
+extern void FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, char *fx_id);
+
 #endif							/* FDWAPI_H */
diff --git a/src/include/foreign/fdwxact.h b/src/include/foreign/fdwxact.h
new file mode 100644
index 0000000..5138a2c
--- /dev/null
+++ b/src/include/foreign/fdwxact.h
@@ -0,0 +1,147 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL distributed transaction manager
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/fdwxact.h
+ */
+#ifndef FDW_XACT_H
+#define FDW_XACT_H
+
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "foreign/fdwxact_xlog.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+#define	FDW_XACT_NOT_WAITING		0
+#define	FDW_XACT_WAITING			1
+#define	FDW_XACT_WAIT_COMPLETE		2
+
+#define FdwXactEnabled() (max_prepared_foreign_xacts > 0)
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDW_XACT_ID_MAX_LEN 200
+
+/* Enum to track the status of prepared foreign transaction */
+typedef enum
+{
+	FDW_XACT_INITIAL,
+	FDW_XACT_PREPARING,					/* foreign transaction is being prepared */
+	FDW_XACT_PREPARED,					/* foreign transaction is prepared */
+	FDW_XACT_COMMITTING_PREPARED,		/* foreign prepared transaction is to
+										 * be committed */
+	FDW_XACT_ABORTING_PREPARED, /* foreign prepared transaction is to be
+								 * aborted */
+} FdwXactStatus;
+
+/* Shared memory entry for a prepared or being prepared foreign transaction */
+typedef struct FdwXactData *FdwXact;
+
+typedef struct FdwXactData
+{
+	FdwXact		fxact_free_next;	/* Next free FdwXact entry */
+	FdwXact		fxact_next;		/* Pointer to the next FdwXact entry associated
+								 * with the same transaction */
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	TransactionId local_xid;	/* XID of local transaction */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	FdwXactStatus status;		/* The state of the foreign
+								 * transaction. This doubles as the
+								 * action to be taken on this entry. */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;		/* XLOG offset of inserting this entry start */
+	XLogRecPtr	insert_end_lsn;		/* XLOG offset of inserting this entry end */
+
+	bool		valid; /* Has the entry been completed and written to file? */
+	BackendId	registered_backend;	/* Backend who registered this entry */
+	bool		ondisk;			/* TRUE if prepare state file is on disk */
+	bool		inredo;			/* TRUE if entry was added via xlog_redo */
+	char		fdw_xact_id[FDW_XACT_MAX_ID_LEN];		/* prepared transaction identifier */
+} FdwXactData;
+
+/* Shared memory layout for maintaining foreign prepared transaction entries. */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		freeFdwXacts;
+
+	/* Number of valid foreign transaction entries */
+	int			numFdwXacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdw_xacts[FLEXIBLE_ARRAY_MEMBER];		/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* Struct for foreign transaction resolution */
+typedef struct FdwXactResolveState
+{
+	Oid				dbid;		/* database oid */
+	TransactionId	wait_xid;	/* local transaction id waiting to be resolved */
+	PGPROC			*waiter;	/* backend process waiter */
+	FdwXact			fdwxact;	/* foreign transaction entries to resolve */
+} FdwXactResolveState;
+
+/* Struct for foreign transaction passed to API */
+typedef struct ForeignTransaction
+{
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+	char			*fx_id;
+} ForeignTransaction;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern bool foreign_twophase_commit;
+
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern bool fdw_xact_exists(TransactionId xid, Oid dboid, Oid serverid,
+				Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwTwoPhaseNeeded(void);
+extern void PreCommit_FdwXacts(void);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon);
+extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit);
+extern bool FdwXactResolveDistributedTransaction(FdwXactResolveState *fstate);
+extern void FdwXactResolveAllDanglingTransactions(Oid dbid);
+extern bool ForeignTwophaseCommitRequired(void);
+extern FdwXactResolveState *CreateFdwXactResolveState(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void FdwXactMarkForeignTransactionModified(ResultRelInfo *resultRelInfo,
+												  int flags);
+extern bool check_foreign_twophase_commit(bool *newval, void **extra,
+										  GucSource source);
+
+#endif   /* FDW_XACT_H */
diff --git a/src/include/foreign/fdwxact_launcher.h b/src/include/foreign/fdwxact_launcher.h
new file mode 100644
index 0000000..6ed003b
--- /dev/null
+++ b/src/include/foreign/fdwxact_launcher.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _FDWXACT_LAUNCHER_H
+#define _FDWXACT_LAUNCHER_H
+
+#include "foreign/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherWakeup(void);
+
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+
+extern bool IsFdwXactLauncher(void);
+
+extern void fdwxact_maybe_launch_resolver(bool ignore_error);
+
+
+#endif	/* _FDWXACT_LAUNCHER_H */
diff --git a/src/include/foreign/fdwxact_resolver.h b/src/include/foreign/fdwxact_resolver.h
new file mode 100644
index 0000000..5afd98c
--- /dev/null
+++ b/src/include/foreign/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "foreign/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int foreign_xact_resolver_timeout;
+
+#endif		/* FDWXACT_RESOLVER_H */
diff --git a/src/include/foreign/fdwxact_xlog.h b/src/include/foreign/fdwxact_xlog.h
new file mode 100644
index 0000000..f42725e
--- /dev/null
+++ b/src/include/foreign/fdwxact_xlog.h
@@ -0,0 +1,51 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDW_XACT_INSERT	0x00
+#define XLOG_FDW_XACT_REMOVE	0x10
+
+/* Same as GIDSIZE */
+#define FDW_XACT_MAX_ID_LEN 200
+/*
+ * On-disk file structure, also used for WAL
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	char		fdw_xact_id[FDW_XACT_MAX_ID_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdw_xact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+} xl_fdw_xact_remove;
+
+extern void fdw_xact_redo(XLogReaderState *record);
+extern void fdw_xact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdw_xact_identify(uint8 info);
+
+#endif	/* FDWXACT_XLOG_H */
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 3ca12e6..d030368 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -68,10 +68,10 @@ typedef struct ForeignTable
 	List	   *options;		/* ftoptions as DefElem list */
 } ForeignTable;
 
-
 extern ForeignServer *GetForeignServer(Oid serverid);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperByName(const char *name,
 							bool missing_ok);
diff --git a/src/include/foreign/resolver_internal.h b/src/include/foreign/resolver_internal.h
new file mode 100644
index 0000000..9f8676b
--- /dev/null
+++ b/src/include/foreign/resolver_internal.h
@@ -0,0 +1,65 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/foreign/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _RESOLVER_INTERNAL_H
+#define _RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLaunchLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t	pid;	/* this resolver's PID, or 0 if not active */
+	Oid		dbid;	/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool	in_use;
+
+	/* Stats */
+	TimestampTz	last_resolved_time;
+
+	/* Protect shared variables shown above */
+	slock_t	mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	*latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/*
+	 * Foreign transaction resolution queue. Protected by FdwXactLock.
+	 */
+	SHM_QUEUE	FdwXactQueue;
+
+	/* Supervisor process */
+	pid_t		launcher_pid;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif	/* _RESOLVER_INTERNAL_H */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d59c24a..f74d1be 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -759,6 +759,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDW_XACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -832,7 +834,8 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDW_XACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -912,6 +915,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_READ,
+	WAIT_EVENT_FDW_XACT_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index cb613c8..45880b2 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -153,6 +153,16 @@ struct PGPROC
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
 	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction
+								 * resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+
+	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
 	 * their lock.
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 75bab29..25d6a2f 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDW_XACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -124,4 +126,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 								TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 668d9ef..81560bd 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -94,6 +94,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 078129f..31502a0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1413,6 +1413,13 @@ pg_policies| SELECT n.nspname AS schemaname,
    FROM ((pg_policy pol
      JOIN pg_class c ON ((c.oid = pol.polrelid)))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
+pg_prepared_fdw_xacts| SELECT f.dbid,
+    f.transaction,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.identifier
+   FROM pg_prepared_fdw_xacts() f(dbid, transaction, serverid, userid, status, identifier);
 pg_prepared_statements| SELECT p.name,
     p.statement,
     p.prepare_time,
@@ -1821,6 +1828,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_fdwxact_resolvers| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_fdwxact_resolver() r(pid, dbid, n_entries, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
-- 
2.10.5

#5Chris Travers
chris.travers@gmail.com
In reply to: Masahiko Sawada (#4)

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: tested, failed

I am hoping I am not out of order in writing this before the commitfest starts. The patch is big and long, so I wanted to start on this while traffic is slow.

I find this patch quite welcome and very close to a minimum viable version. The few significant limitations can be resolved later. One thing I may have missed in the documentation is a discussion of the limits of the current approach. I think this would be important to document because the caveats of the current approach are significant, but the people who need it will have the knowledge to work with issues if they come up.

The major caveat I see in our past discussions and (if I read the patch correctly) is that the resolver goes through global transactions sequentially and does not move on to the next until the previous one is resolved. This means that if I have a global transaction on server A, with foreign servers B and C, and I have another one on server A with foreign servers C and D, if server B goes down at the wrong moment, the background worker does not look like it will detect the failure and move on to try to resolve the second, so server D will have a badly set vacuum horizon until this is resolved. Also if I read the patch correctly, it looks like one can invoke SQL commands to remove the bad transaction to allow processing to continue and manual resolution (this is good and necessary because in this area there is no ability to have perfect recoverability without occasional administrative action). I would really like to see more documentation of failure cases and appropriate administrative action at present. Otherwise this is I think a minimum viable addition and I think we want it.

It is possible I missed that in the documentation. If so, my objection stands aside. If it is welcome, I am happy to take a first crack at such docs.

To my mind that's the only blocker in the code (but see below). I can say without a doubt that I would expect we would use this feature once available.

------------------

Testing however failed.

make installcheck-world fails with errors like the following:

 -- Modify foreign server and raise an error
  BEGIN;
  INSERT INTO ft7_twophase VALUES(8);
+ ERROR:  prepread foreign transactions are disabled
+ HINT:  Set max_prepared_foreign_transactions to a nonzero value.
  INSERT INTO ft8_twophase VALUES(NULL); -- violation
! ERROR:  current transaction is aborted, commands ignored until end of transaction block
  ROLLBACK;
  SELECT * FROM ft7_twophase;
! ERROR:  prepread foreign transactions are disabled
! HINT:  Set max_prepared_foreign_transactions to a nonzero value.
  SELECT * FROM ft8_twophase;
! ERROR:  prepread foreign transactions are disabled
! HINT:  Set max_prepared_foreign_transactions to a nonzero value.
  -- Rollback foreign transaction that involves both 2PC-capable
  -- and 2PC-non-capable foreign servers.
  BEGIN;
  INSERT INTO ft8_twophase VALUES(7);
+ ERROR:  prepread foreign transactions are disabled
+ HINT:  Set max_prepared_foreign_transactions to a nonzero value.
  INSERT INTO ft9_not_twophase VALUES(7);
+ ERROR:  current transaction is aborted, commands ignored until end of transaction block
  ROLLBACK;
  SELECT * FROM ft8_twophase;
! ERROR:  prepread foreign transactions are disabled
! HINT:  Set max_prepared_foreign_transactions to a nonzero value.

make installcheck in the contrib directory shows the same, so that's the easiest way of reproducing, at least on a new installation. I think the test cases will have to handle that sort of setup.

make check in the contrib directory passes.

For reasons of test failures, I am setting this back to waiting on author.

------------------
I had a few other thoughts that I figure are worth sharing with the community on this patch with the idea that once it is in place, this may open up more options for collaboration in the area of federated and distributed storage generally. I could imagine other foreign data wrappers using this API, and folks might want to refactor out the atomic handling part so that extensions that do not use the foreign data wrapper structure could use it as well (while this looks like a classic SQL/MED issue, I am not sure that only foreign data wrappers would be interested in the API).

The new status of this patch is: Waiting on Author

#6Chris Travers
chris.travers@adjust.com
In reply to: Chris Travers (#5)

On Wed, Oct 3, 2018 at 9:41 AM Chris Travers <chris.travers@gmail.com>
wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: tested, failed

Also one really minor point: I think this is a typo (maX vs max)?

(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires maX_foreign_xact_resolvers > 0")));

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

#7Chris Travers
chris.travers@adjust.com
In reply to: Chris Travers (#6)

On Wed, Oct 3, 2018 at 9:56 AM Chris Travers <chris.travers@adjust.com>
wrote:

On Wed, Oct 3, 2018 at 9:41 AM Chris Travers <chris.travers@gmail.com>
wrote:

(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires maX_foreign_xact_resolvers > 0")));

Two more critical notes here which I think are small blockers.

The error message above references a config variable that does not exist.

The correct name of the config parameter is
max_foreign_transaction_resolvers

Setting that along with the following to 10 caused the tests to pass, but
again it fails on default configs:

max_prepared_foreign_transactions, max_prepared_transactions
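
For anyone reproducing this, the settings named above would look roughly like the following in postgresql.conf on the installation under test. This is only a sketch: the first two parameters are introduced by the patch under review and do not exist in released PostgreSQL, and 10 is simply the value reported above, not a tuned recommendation.

```
# Sketch of the non-default settings reported above (values are not tuned).
max_prepared_transactions = 10            # existing parameter; default is 0
max_prepared_foreign_transactions = 10    # added by this patch
max_foreign_transaction_resolvers = 10    # added by this patch
```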

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

#8Chris Travers
chris.travers@adjust.com
In reply to: Chris Travers (#5)

On Wed, Oct 3, 2018 at 9:41 AM Chris Travers <chris.travers@gmail.com>
wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: tested, failed

I am hoping I am not out of order in writing this before the commitfest
starts. The patch is big and long, so I wanted to start on this while
traffic is slow.

I find this patch quite welcome and very close to a minimum viable
version. The few significant limitations can be resolved later. One thing
I may have missed in the documentation is a discussion of the limits of the
current approach. I think this would be important to document because the
caveats of the current approach are significant, but the people who need it
will have the knowledge to work with issues if they come up.

The major caveat I see in our past discussions and (if I read the patch
correctly) is that the resolver goes through global transactions
sequentially and does not move on to the next until the previous one is
resolved. This means that if I have a global transaction on server A, with
foreign servers B and C, and I have another one on server A with foreign
servers C and D, if server B goes down at the wrong moment, the background
worker does not look like it will detect the failure and move on to try to
resolve the second, so server D will have a badly set vacuum horizon until
this is resolved. Also if I read the patch correctly, it looks like one
can invoke SQL commands to remove the bad transaction to allow processing
to continue and manual resolution (this is good and necessary because in
this area there is no ability to have perfect recoverability without
occasional administrative action). I would really like to see more
documentation of failure cases and appropriate administrative action at
present. Otherwise this is I think a minimum viable addition and I think
we want it.

It is possible I missed that in the documentation. If so, my objection
stands aside. If it is welcome, I am happy to take a first crack at such
docs.

After further testing I am pretty sure I misread the patch. It looks like
one can have multiple resolvers which can, in fact, work through a queue
together solving this problem. So the objection above is not valid and I
withdraw that objection. I will re-review the docs in light of the
experience.

To my mind that's the only blocker in the code (but see below). I can say
without a doubt that I would expect we would use this feature once
available.

------------------

Testing however failed.

make installcheck-world fails with errors like the following:

-- Modify foreign server and raise an error
BEGIN;
INSERT INTO ft7_twophase VALUES(8);
+ ERROR:  prepread foreign transactions are disabled
+ HINT:  Set max_prepared_foreign_transactions to a nonzero value.
INSERT INTO ft8_twophase VALUES(NULL); -- violation
! ERROR:  current transaction is aborted, commands ignored until end of
transaction block
ROLLBACK;
SELECT * FROM ft7_twophase;
! ERROR:  prepread foreign transactions are disabled
! HINT:  Set max_prepared_foreign_transactions to a nonzero value.
SELECT * FROM ft8_twophase;
! ERROR:  prepread foreign transactions are disabled
! HINT:  Set max_prepared_foreign_transactions to a nonzero value.
-- Rollback foreign transaction that involves both 2PC-capable
-- and 2PC-non-capable foreign servers.
BEGIN;
INSERT INTO ft8_twophase VALUES(7);
+ ERROR:  prepread foreign transactions are disabled
+ HINT:  Set max_prepared_foreign_transactions to a nonzero value.
INSERT INTO ft9_not_twophase VALUES(7);
+ ERROR:  current transaction is aborted, commands ignored until end of
transaction block
ROLLBACK;
SELECT * FROM ft8_twophase;
! ERROR:  prepread foreign transactions are disabled
! HINT:  Set max_prepared_foreign_transactions to a nonzero value.

make installcheck in the contrib directory shows the same, so that's the
easiest way of reproducing, at least on a new installation. I think the
test cases will have to handle that sort of setup.

make check in the contrib directory passes.

For reasons of test failures, I am setting this back to waiting on author.

------------------
I had a few other thoughts that I figure are worth sharing with the
community on this patch with the idea that once it is in place, this may
open up more options for collaboration in the area of federated and
distributed storage generally. I could imagine other foreign data wrappers
using this API, and folks might want to refactor out the atomic handling
part so that extensions that do not use the foreign data wrapper structure
could use it as well (while this looks like a classic SQL/MED issue, I am
not sure that only foreign data wrappers would be interested in the API).

The new status of this patch is: Waiting on Author

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

#9Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Chris Travers (#8)

On Wed, Oct 3, 2018 at 6:02 PM Chris Travers <chris.travers@adjust.com> wrote:

On Wed, Oct 3, 2018 at 9:41 AM Chris Travers <chris.travers@gmail.com> wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: tested, failed

I am hoping I am not out of order in writing this before the commitfest starts. The patch is big and long, so I wanted to start on this while traffic is slow.

I find this patch quite welcome and very close to a minimum viable version. The few significant limitations can be resolved later. One thing I may have missed in the documentation is a discussion of the limits of the current approach. I think this would be important to document because the caveats of the current approach are significant, but the people who need it will have the knowledge to work with issues if they come up.

The major caveat I see in our past discussions and (if I read the patch correctly) is that the resolver goes through global transactions sequentially and does not move on to the next until the previous one is resolved. This means that if I have a global transaction on server A, with foreign servers B and C, and I have another one on server A with foreign servers C and D, if server B goes down at the wrong moment, the background worker does not look like it will detect the failure and move on to try to resolve the second, so server D will have a badly set vacuum horizon until this is resolved. Also if I read the patch correctly, it looks like one can invoke SQL commands to remove the bad transaction to allow processing to continue and manual resolution (this is good and necessary because in this area there is no ability to have perfect recoverability without occasional administrative action). I would really like to see more documentation of failure cases and appropriate administrative action at present. Otherwise this is I think a minimum viable addition and I think we want it.

It is possible I missed that in the documentation. If so, my objection stands aside. If it is welcome, I am happy to take a first crack at such docs.

Thank you for reviewing the patch!

After further testing I am pretty sure I misread the patch. It looks like one can have multiple resolvers which can, in fact, work through a queue together solving this problem. So the objection above is not valid and I withdraw that objection. I will re-review the docs in light of the experience.

Actually, the patch doesn't solve this problem; the foreign transaction
resolver processes distributed transactions sequentially. However, since
one resolver process is responsible for one database, a backend
connected to a different database can still complete its distributed
transaction. I understand your concern and agree that this problem
should be solved. I'll address it in the next patch.
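
To illustrate the "one resolver process per database" arrangement described above, here is a minimal, self-contained sketch. It is not code from the patch: ResolverSlot only mirrors the pid, dbid and in_use fields of the patch's FdwXactResolver struct, find_resolver_for_db() and claim_resolver_for_db() are hypothetical helper names, and all locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid type */

/* Simplified model of a resolver slot (assumption: mirrors FdwXactResolver). */
typedef struct ResolverSlot
{
	int			pid;		/* resolver's PID, or 0 if not active */
	Oid			dbid;		/* database this resolver works on */
	bool		in_use;		/* is this slot taken? */
} ResolverSlot;

/* Return the slot already serving dbid, or NULL if there is none. */
static ResolverSlot *
find_resolver_for_db(ResolverSlot *slots, size_t nslots, Oid dbid)
{
	assert(slots != NULL);

	for (size_t i = 0; i < nslots; i++)
	{
		if (slots[i].in_use && slots[i].dbid == dbid)
			return &slots[i];
	}
	return NULL;
}

/*
 * Reuse the resolver for dbid if one exists; otherwise claim a free slot.
 * Returns NULL when every slot is busy with some other database.
 */
static ResolverSlot *
claim_resolver_for_db(ResolverSlot *slots, size_t nslots, Oid dbid, int pid)
{
	ResolverSlot *slot = find_resolver_for_db(slots, nslots, dbid);

	if (slot != NULL)
		return slot;

	for (size_t i = 0; i < nslots; i++)
	{
		if (!slots[i].in_use)
		{
			slots[i].pid = pid;
			slots[i].dbid = dbid;
			slots[i].in_use = true;
			return &slots[i];
		}
	}
	return NULL;
}
```

In this model, asking twice for a resolver for the same database returns the same slot, so that database's transactions are handled by a single process in sequence, while a backend connected to a different database gets a separate slot and is not held up.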

To my mind that's the only blocker in the code (but see below). I can say without a doubt that I would expect we would use this feature once available.

------------------

Testing however failed.

make installcheck-world fails with errors like the following:

-- Modify foreign server and raise an error
BEGIN;
INSERT INTO ft7_twophase VALUES(8);
+ ERROR:  prepread foreign transactions are disabled
+ HINT:  Set max_prepared_foreign_transactions to a nonzero value.
INSERT INTO ft8_twophase VALUES(NULL); -- violation
! ERROR:  current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
SELECT * FROM ft7_twophase;
! ERROR:  prepread foreign transactions are disabled
! HINT:  Set max_prepared_foreign_transactions to a nonzero value.
SELECT * FROM ft8_twophase;
! ERROR:  prepread foreign transactions are disabled
! HINT:  Set max_prepared_foreign_transactions to a nonzero value.
-- Rollback foreign transaction that involves both 2PC-capable
-- and 2PC-non-capable foreign servers.
BEGIN;
INSERT INTO ft8_twophase VALUES(7);
+ ERROR:  prepread foreign transactions are disabled
+ HINT:  Set max_prepared_foreign_transactions to a nonzero value.
INSERT INTO ft9_not_twophase VALUES(7);
+ ERROR:  current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
SELECT * FROM ft8_twophase;
! ERROR:  prepread foreign transactions are disabled
! HINT:  Set max_prepared_foreign_transactions to a nonzero value.

make installcheck in the contrib directory shows the same, so that's the easiest way of reproducing, at least on a new installation. I think the test cases will have to handle that sort of setup.

'make installcheck' runs the regression tests against an existing
installation. If that installation has the atomic commit feature
disabled (e.g. via max_prepared_foreign_transactions), the tests will
fail, and the feature is disabled by default.

make check in the contrib directory passes.

For reasons of test failures, I am setting this back to waiting on author.

------------------
I had a few other thoughts that I figure are worth sharing with the community on this patch with the idea that once it is in place, this may open up more options for collaboration in the area of federated and distributed storage generally. I could imagine other foreign data wrappers using this API, and folks might want to refactor out the atomic handling part so that extensions that do not use the foreign data wrapper structure could use it as well (while this looks like a classic SQL/MED issue, I am not sure that only foreign data wrappers would be interested in the API).

The new status of this patch is: Waiting on Author

Also, I'll update the docs in the next patch, which I'll post this week.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#10Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#9)
4 attachment(s)

On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Also, I'll update the docs in the next patch, which I'll post this week.

Attached is the updated version of the patches. The changes from the
previous version are:

* Enabled processing of subsequent distributed transactions even when
a previous distributed transaction keeps failing due to a participant
error.
To implement this, I've split the waiting queue into two queues: the
active queue and the retry queue. Every backend first inserts itself
into the active queue and changes its state to FDW_XACT_WAITING. If the
resolver process fails to resolve the distributed transaction, it
moves the backend's entry from the active queue to the retry queue and
changes its state to FDW_XACT_WAITING_RETRY. Entries in the active
queue are processed at each commit, whereas entries in the retry queue
are processed at intervals of
foreign_transaction_resolution_retry_interval.

* Updated the docs: added a new section "Distributed Transaction" to
Chapter 33 to explain the concept to users.

* Moved the atomic commit code into the src/backend/access/fdwxact directory.

* Some bug fixes.
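
For readers following the queue change in the first item, here is a
minimal sketch of the two-queue idea in plain C. It is not code from
the patch: only the state names FDW_XACT_WAITING and
FDW_XACT_WAITING_RETRY come from the description, while the Waiter and
Queue types, FDW_XACT_IDLE, FDW_XACT_RESOLVED, failures_left and all
function names are hypothetical. Resolution is simulated by a counter
rather than by talking to a foreign server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only the two WAITING names come from the mail; the rest are hypothetical. */
typedef enum
{
	FDW_XACT_IDLE,
	FDW_XACT_WAITING,
	FDW_XACT_WAITING_RETRY,
	FDW_XACT_RESOLVED
} WaiterState;

/* Hypothetical stand-in for a backend's entry in a wait queue. */
typedef struct Waiter
{
	int			backend_id;
	WaiterState state;
	int			failures_left;	/* simulation: attempts that will fail */
	struct Waiter *next;
} Waiter;

typedef struct Queue
{
	Waiter	   *head;
	Waiter	   *tail;
} Queue;

static Queue active_queue;		/* processed at each commit */
static Queue retry_queue;		/* processed once per retry interval */

static void
queue_push(Queue *q, Waiter *w)
{
	assert(q != NULL && w != NULL);
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

static Waiter *
queue_pop(Queue *q)
{
	Waiter	   *w = q->head;

	if (w)
	{
		q->head = w->next;
		if (q->head == NULL)
			q->tail = NULL;
		w->next = NULL;
	}
	return w;
}

/* A backend enqueues itself on the active queue and starts waiting. */
static void
backend_wait(Waiter *w)
{
	w->state = FDW_XACT_WAITING;
	queue_push(&active_queue, w);
}

/* Simulated resolution: fails while failures_left is positive. */
static bool
try_resolve(Waiter *w)
{
	if (w->failures_left > 0)
	{
		w->failures_left--;
		return false;
	}
	return true;
}

/* Drain the active queue; failed entries move to the retry queue. */
static void
process_active_queue(void)
{
	Waiter	   *w;

	while ((w = queue_pop(&active_queue)) != NULL)
	{
		if (try_resolve(w))
			w->state = FDW_XACT_RESOLVED;
		else
		{
			w->state = FDW_XACT_WAITING_RETRY;
			queue_push(&retry_queue, w);
		}
	}
}

/* One retry pass: each entry is tried once; failures are re-queued. */
static void
process_retry_queue(void)
{
	Queue		pending = retry_queue;
	Waiter	   *w;

	retry_queue.head = NULL;
	retry_queue.tail = NULL;
	while ((w = queue_pop(&pending)) != NULL)
	{
		if (try_resolve(w))
			w->state = FDW_XACT_RESOLVED;
		else
			queue_push(&retry_queue, w);
	}
}
```

The point of the split is that a waiter whose participant keeps failing
sits in the retry queue and no longer blocks newer entries in the
active queue. In this sketch a driver would call process_active_queue()
at commit time and process_retry_queue() once per retry interval.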

Please review them.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

v19-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/x-patch)
From 555fec86f082a092725fbae1c85a4e00d70f8539 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 8 Feb 2018 11:26:46 +0900
Subject: [PATCH v19 1/4] Keep track of writing on non-temporary relation.

---
 src/backend/access/heap/heapam.c | 12 ++++++++++++
 src/include/access/xact.h        |  5 +++++
 2 files changed, 17 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb63471..c2db19b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2629,6 +2629,10 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		heap_freetuple(heaptup);
 	}
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleGetOid(tup);
 }
 
@@ -3453,6 +3457,10 @@ l1:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleMayBeUpdated;
 }
 
@@ -4403,6 +4411,10 @@ l2:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	bms_free(hot_attrs);
 	bms_free(proj_idx_attrs);
 	bms_free(key_attrs);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 689c57c..2c1b2d8 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -98,6 +98,11 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
-- 
2.10.5

v19-0003-postgres_fdw-supports-atomic-commit-APIs.patch (application/x-patch)
From edcdf0a8b9f780b9fdb9a0430ee88a007975399a Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:46:01 +0900
Subject: [PATCH v19 3/4] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/connection.c              | 538 +++++++++++++++++++------
 contrib/postgres_fdw/expected/postgres_fdw.out | 387 +++++++++++++++++-
 contrib/postgres_fdw/option.c                  |   5 +-
 contrib/postgres_fdw/postgres_fdw.c            |  60 ++-
 contrib/postgres_fdw/postgres_fdw.h            |  10 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      | 151 ++++++-
 doc/src/sgml/postgres-fdw.sgml                 |  37 ++
 7 files changed, 1044 insertions(+), 144 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index fe4893a..28f87a6 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -14,9 +14,12 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
-#include "catalog/pg_user_mapping.h"
 #include "access/xact.h"
+#include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -56,6 +59,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		am_participant_of_ac;	/* true if fdwxact code controls the transaction */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -78,7 +82,7 @@ static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_xact_callback(XactEvent event, void *arg);
 static void pgfdw_subxact_callback(SubXactEvent event,
 					   SubTransactionId mySubid,
@@ -91,20 +95,14 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 						 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 						 PGresult **result);
-
-
-/*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
- */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static bool pgfdw_commit_transaction(ConnCacheEntry *entry);
+static bool pgfdw_rollback_transaction(ConnCacheEntry *entry);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
 {
 	bool		found;
 	ConnCacheEntry *entry;
@@ -136,11 +134,8 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
 	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
+	key = umid;
 
 	/*
 	 * Find or create cached entry for requested connection.
@@ -182,6 +177,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping		*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +186,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->am_participant_of_ac = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +197,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connection to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,16 +213,46 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
 /*
  * Connect to remote server using specified server and user mapping properties.
+ * If the attempt to connect fails, and the caller can handle connection failure
+ * (connection_error_ok = true) return NULL, throw error otherwise.
  */
 static PGconn *
 connect_pg_server(ForeignServer *server, UserMapping *user)
@@ -265,11 +301,22 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 
 		conn = PQconnectdbParams(keywords, values, false);
 		if (!conn || PQstatus(conn) != CONNECTION_OK)
+		{
+			char	   *connmessage;
+			int			msglen;
+
+			/* libpq typically appends a newline, strip that */
+			connmessage = pstrdup(PQerrorMessage(conn));
+			msglen = strlen(connmessage);
+			if (msglen > 0 && connmessage[msglen - 1] == '\n')
+				connmessage[msglen - 1] = '\0';
+
 			ereport(ERROR,
 					(errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
 					 errmsg("could not connect to server \"%s\"",
 							server->servername),
 					 errdetail_internal("%s", pchomp(PQerrorMessage(conn)))));
+		}
 
 		/*
 		 * Check that non-superuser has used password to establish connection;
@@ -414,15 +461,24 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
+	ForeignServer	*server = GetForeignServer(serverid);
 
 	/* Start main transaction if we haven't yet */
 	if (entry->xact_depth <= 0)
 	{
 		const char *sql;
 
+		/* Register the new foreign server if enabled */
+		if (server_uses_twophase_commit(server))
+		{
+			/* Register foreign server with auto-generated identifier */
+			FdwXactRegisterForeignTransaction(serverid, userid, NULL);
+			entry->am_participant_of_ac = true;
+		}
+
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
@@ -650,12 +706,11 @@ static void
 pgfdw_xact_callback(XactEvent event, void *arg)
 {
 	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
+	ConnCacheEntry	*entry;
 
-	/* Quick exit if no connections were touched in this transaction. */
+	/* Quick exit if no connections were touched in this transaction */
 	if (!xact_got_connection)
 		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote transactions, and
 	 * close them.
@@ -663,17 +718,20 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 	hash_seq_init(&scan, ConnectionHash);
 	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
 	{
-		PGresult   *res;
-
 		/* Ignore cache entry if no open connection right now */
 		if (entry->conn == NULL)
 			continue;
 
+		/*
+		 * Foreign transactions participating in atomic commit are ended
+		 * by the two-phase commit APIs. Ignore them.
+		 */
+		if (entry->am_participant_of_ac)
+			continue;
+
 		/* If it has an open remote transaction, try to close it */
 		if (entry->xact_depth > 0)
 		{
-			bool		abort_cleanup_failure = false;
-
 			elog(DEBUG3, "closing remote transaction on connection %p",
 				 entry->conn);
 
@@ -681,40 +739,7 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 			{
 				case XACT_EVENT_PARALLEL_PRE_COMMIT:
 				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
+					pgfdw_commit_transaction(entry);
 					break;
 				case XACT_EVENT_PRE_PREPARE:
 
@@ -739,66 +764,7 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 					break;
 				case XACT_EVENT_PARALLEL_ABORT:
 				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
+					pgfdw_rollback_transaction(entry);
 					break;
 			}
 		}
@@ -1193,3 +1159,329 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * The function prepares transaction on foreign server. This function
+ * is called only at the pre-commit phase of the local transaction. Since
+ * we should have the connection to the server that we are interested in
+ * we don't use serverid and userid that are necessary to get user mapping
+ * that is the key of the connection cache.
+ */
+bool
+postgresPrepareForeignTransaction(FdwXactResolveState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+	PGresult	*res;
+	StringInfo	command;
+
+	entry = hash_search(ConnectionHash, &(state->umid), HASH_FIND, NULL);
+
+	if (!entry->conn)
+		return false;
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		result = true;
+
+	if (result)
+		elog(DEBUG1, "prepared foreign transaction on server %u with ID %s",
+			 state->serverid, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+
+/*
+ * The function commits the transaction on the foreign server. This
+ * function is called both at the pre-commit phase of the local transaction
+ * when committing and at the end of the local transaction when aborting.
+ * Since we should have the connections to the servers involved with the local
+ * transaction, we don't use the serverid and userid that are necessary to get
+ * the user mapping that is the key of the connection cache.
+ */
+bool
+postgresCommitForeignTransaction(FdwXactResolveState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+
+	entry = hash_search(ConnectionHash, &(state->umid),
+						HASH_FIND, NULL);
+
+	result = pgfdw_commit_transaction(entry);
+
+	return result;
+}
+
+/*
+ * The function rolls back the transaction on the foreign server. This
+ * function is called both at the pre-commit phase of the local transaction
+ * when committing and at the end of the local transaction when aborting.
+ * Since we should have the connections to the servers involved with the local
+ * transaction, we don't use the serverid and userid that are necessary to get
+ * the user mapping that is the key of the connection cache.
+ */
+bool
+postgresRollbackForeignTransaction(FdwXactResolveState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool ret;
+
+	entry = hash_search(ConnectionHash, &(state->umid),
+						HASH_FIND, NULL);
+
+	/* Rollback a remote transaction */
+	ret = pgfdw_rollback_transaction(entry);
+
+	return ret;
+}
+
+bool
+postgresResolveForeignTransaction(FdwXactResolveState *state, bool is_commit)
+{
+	ConnCacheEntry *entry = NULL;
+	StringInfo	command;
+	bool result;
+	PGresult	*res;
+
+	entry = GetConnectionState(state->umid, false, false);
+
+	if (!entry->conn)
+		return false;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 state->fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		/*
+		 * The command failed, raise a warning to log the reason of failure.
+		 * We may not be in a transaction here, so raising error doesn't
+		 * help. Even if we are in a transaction, it would be the resolver
+		 * transaction, which will get aborted on raising error, thus
+		 * delaying resolution of other prepared foreign transactions.
+		 */
+		pgfdw_report_error(LOG, res, entry->conn, false, command->data);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * If we tried to COMMIT/ABORT a prepared transaction and the prepared
+		 * transaction was missing on the foreign server, it was probably
+		 * resolved by some other means. Anyway, it should be considered as resolved.
+		 */
+		result = (sqlstate == ERRCODE_UNDEFINED_OBJECT);
+	}
+	else
+		result = true;
+
+	elog(DEBUG1, "%s prepared foreign transaction on server %u with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 state->serverid,
+		 state->fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->am_participant_of_ac = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/*
+	 * Regardless of the event type, we can now mark ourselves as out of the
+	 * transaction.
+	 */
+	xact_got_connection = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
+
+static bool
+pgfdw_rollback_transaction(ConnCacheEntry *entry)
+{
+	bool abort_cleanup_failure = false;
+
+	/*
+	 * When rolling back the local transaction, if we don't have the
+	 * connection it means no transaction was started. So we can regard
+	 * it as success.
+	 */
+	if (!entry || !entry->conn)
+		return true;
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is already unsalvageable, do only the cleanup
+	 * and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return true;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+	else
+	{
+		entry->have_prep_stmt = false;
+		entry->have_error = false;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return !abort_cleanup_failure;
+}
+
+static bool
+pgfdw_commit_transaction(ConnCacheEntry *entry)
+{
+	PGresult	*res;
+	bool result = false;
+
+	if (!entry || !entry->conn)
+		return false;
+
+	/*
+	 * If abort cleanup previously failed for this connection,
+	 * we can't issue any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		result = true;
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 21a2ef5..15dadf4 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,15 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -185,16 +207,19 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                                      List of foreign tables
- Schema |   Table    |  Server   |                   FDW options                    | Description 
---------+------------+-----------+--------------------------------------------------+-------------
- public | ft1        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft2        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft4        | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
- public | ft5        | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft6        | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft_pg_type | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
-(6 rows)
+                                         List of foreign tables
+ Schema |      Table       |  Server   |                   FDW options                    | Description 
+--------+------------------+-----------+--------------------------------------------------+-------------
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
+(9 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8650,3 +8675,345 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+\det+
+                                                 List of foreign tables
+ Schema |      Table       |  Server   |                            FDW options                            | Description 
+--------+------------------+-----------+-------------------------------------------------------------------+-------------
+ public | fpagg_tab_p1     | loopback  | (table_name 'pagg_tab_p1')                                        | 
+ public | fpagg_tab_p2     | loopback  | (table_name 'pagg_tab_p2')                                        | 
+ public | fpagg_tab_p3     | loopback  | (table_name 'pagg_tab_p3')                                        | 
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')                             | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1', use_remote_estimate 'true') | 
+ public | ft3              | loopback  | (table_name 'loct3', use_remote_estimate 'true')                  | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')                             | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type')                  | 
+ public | ftprt1_p1        | loopback  | (table_name 'fprt1_p1', use_remote_estimate 'true')               | 
+ public | ftprt1_p2        | loopback  | (table_name 'fprt1_p2')                                           | 
+ public | ftprt2_p1        | loopback  | (table_name 'fprt2_p1', use_remote_estimate 'true')               | 
+ public | ftprt2_p2        | loopback  | (table_name 'fprt2_p2', use_remote_estimate 'true')               | 
+ public | rem1             | loopback  | (table_name 'loc1')                                               | 
+ public | rem2             | loopback  | (table_name 'loc2')                                               | 
+(19 rows)
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+  srvname  
+-----------
+ loopback
+ loopback2
+ loopback3
+(3 rows)
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO on;
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO ft8_twophase VALUES(3);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO "S 1"."T 6" VALUES (4);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  4
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(5);
+INSERT INTO "S 1"."T 6" VALUES (5);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  4
+(1 row)
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(8);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Fails, cannot commit the distributed transaction if 2PC-non-capable
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+ERROR:  cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Disables atomic commit, and success the same case as above.
+SET foreign_twophase_commit TO off;
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+(5 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+(5 rows)
+
+-- Enable atomic commit, again.
+SET foreign_twophase_commit TO on;
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(10);
+INSERT INTO ft8_twophase VALUES(10);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+-- Fails, cannot prepare the transaction if non-supporeted
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(11);
+INSERT INTO ft9_not_twophase VALUES(11);
+PREPARE TRANSACTION 'gx1'; -- error
+ERROR:  cannot prepare a transaction that modified remote tables
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 6854f1b..1f45b1c 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -108,7 +108,8 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 		 * Validate option value, when we can do so without any context.
 		 */
 		if (strcmp(def->defname, "use_remote_estimate") == 0 ||
-			strcmp(def->defname, "updatable") == 0)
+			strcmp(def->defname, "updatable") == 0 ||
+			strcmp(def->defname, "two_phase_commit") == 0)
 		{
 			/* these accept only boolean values */
 			(void) defGetBoolean(def);
@@ -177,6 +178,8 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* two phase commit support */
+		{"two_phase_commit", ForeignServerRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fd20aa9..1135046 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,8 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "catalog/pg_class.h"
@@ -359,6 +361,7 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 							 RelOptInfo *input_rel,
 							 RelOptInfo *output_rel,
 							 void *extra);
+static bool postgresIsTwoPhaseCommitEnabled(Oid serverid);
 
 /*
  * Helper functions
@@ -452,7 +455,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 				  const PgFdwRelationInfo *fpinfo_o,
 				  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -506,10 +508,29 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->ResolveForeignTransaction = postgresResolveForeignTransaction;
+	routine->IsTwoPhaseCommitEnabled = postgresIsTwoPhaseCommitEnabled;
+
 	PG_RETURN_POINTER(routine);
 }
 
 /*
+ * postgresIsTwoPhaseCommitEnabled
+ *		Report whether the given foreign server has two_phase_commit enabled.
+ */
+static bool
+postgresIsTwoPhaseCommitEnabled(Oid serverid)
+{
+	ForeignServer	*server = GetForeignServer(serverid);
+
+	return server_uses_twophase_commit(server);
+}
+
+/*
  * postgresGetForeignRelSize
  *		Estimate # of rows and width of the result of the scan
  *
@@ -1356,7 +1377,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2411,7 +2432,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2704,7 +2725,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								&retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3321,7 +3342,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4108,7 +4129,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4198,7 +4219,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4421,7 +4442,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
@@ -5803,3 +5824,26 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	/* We didn't find any suitable equivalence class expression */
 	return NULL;
 }
+
+/*
+ * server_uses_twophase_commit
+ * Returns true if the foreign server is configured to support 2PC.
+ */
+bool
+server_uses_twophase_commit(ForeignServer *server)
+{
+	ListCell		*lc;
+
+	/* Check the options for two phase compliance */
+	foreach(lc, server->options)
+	{
+		DefElem    *d = (DefElem *) lfirst(lc);
+
+		if (strcmp(d->defname, "two_phase_commit") == 0)
+		{
+			return defGetBoolean(d);
+		}
+	}
+	/* By default a server is not 2PC compliant */
+	return false;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 70b538e..f01a71a 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "nodes/relation.h"
@@ -115,7 +116,8 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
+extern PGconn *GetExistingConnection(Oid umid);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -123,6 +125,11 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 				   bool clear, const char *sql);
+extern bool postgresPrepareForeignTransaction(FdwXactResolveState *state);
+extern bool postgresCommitForeignTransaction(FdwXactResolveState *state);
+extern bool postgresRollbackForeignTransaction(FdwXactResolveState *state);
+extern bool postgresResolveForeignTransaction(FdwXactResolveState *state,
+											  bool is_commit);
 
 /* in option.c */
 extern int ExtractConnectionOptions(List *defelems,
@@ -181,6 +188,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 						List *remote_conds, List *pathkeys, bool is_subquery,
 						List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 88c4cb4..2554c9c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,19 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -2304,7 +2331,6 @@ SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a
 
 RESET enable_partitionwise_join;
 
-
 -- ===================================================================
 -- test partitionwise aggregates
 -- ===================================================================
@@ -2354,3 +2380,126 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+
+\det+
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO on;
+
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO ft8_twophase VALUES(3);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO "S 1"."T 6" VALUES (4);
+COMMIT;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(5);
+INSERT INTO "S 1"."T 6" VALUES (5);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(8);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Fails, cannot commit the distributed transaction if 2PC-non-capable
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Disables atomic commit, and success the same case as above.
+SET foreign_twophase_commit TO off;
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Enable atomic commit, again.
+SET foreign_twophase_commit TO on;
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(10);
+INSERT INTO ft8_twophase VALUES(10);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Fails, cannot prepare the transaction if non-supporeted
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(11);
+INSERT INTO ft9_not_twophase VALUES(11);
+PREPARE TRANSACTION 'gx1'; -- error
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 54b5e98..f4a9ff5 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -436,6 +436,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers, the
+    transaction on each remote server is committed or aborted independently.
+    Some of these transactions may fail to commit while others commit
+    successfully. This behavior can be overridden using the following
+    option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> may
+       use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
-- 
2.10.5
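
For readers following the postgres_fdw part of the patch above: the two_phase_commit lookup can be sketched outside the backend roughly as below. This is only an illustrative sketch, not code from the patch. ServerOption, parse_bool_option and uses_twophase_commit are hypothetical names that stand in for DefElem, defGetBoolean() and server_uses_twophase_commit(), and the boolean parsing is deliberately simplified. The point it shows is the default: a server without the option is treated as not 2PC-capable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a DefElem: one name/value server option. */
typedef struct ServerOption
{
	const char *name;
	const char *value;
} ServerOption;

/*
 * Simplified boolean parser (assumption): treats "on", "true" and "1" as
 * true and everything else as false.  The real defGetBoolean() accepts
 * more spellings and raises an error on invalid input.
 */
static bool
parse_bool_option(const char *value)
{
	return strcmp(value, "on") == 0 ||
		strcmp(value, "true") == 0 ||
		strcmp(value, "1") == 0;
}

/*
 * Return the two_phase_commit setting found in the option array, or
 * false when the option is absent (a server is not 2PC-capable by default).
 */
static bool
uses_twophase_commit(const ServerOption *opts, size_t nopts)
{
	size_t		i;

	assert(opts != NULL || nopts == 0);

	for (i = 0; i < nopts; i++)
	{
		if (strcmp(opts[i].name, "two_phase_commit") == 0)
			return parse_bool_option(opts[i].value);
	}

	return false;
}
```

With options like those given to loopback and loopback3 in the regression test, the first would report true and the second false. A server with no two_phase_commit option at all also reports false.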

Attachment: v19-0004-Add-regression-tests-for-atomic-commit.patch (application/x-patch)
From 38b3181eb9dd8f2963132017888355da8d3f6a64 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:48:08 +0900
Subject: [PATCH v19 4/4] Add regression tests for atomic commit.

---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index daf79a0..71c8b9d 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000..a23f120
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port', two_phase_commit 'on');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port', two_phase_commit 'on');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up its shared
+# memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_prepared_fdw_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 6890678..d1b181a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2286,9 +2286,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2303,7 +2306,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.10.5

Attachment: v19-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/x-patch)
From ac0bf1880847507317a6d7b871fae4f28f783048 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:44:42 +0900
Subject: [PATCH v19 2/4] Support atomic commit among multiple foreign servers.

---
 doc/src/sgml/catalogs.sgml                    |   97 +
 doc/src/sgml/config.sgml                      |  124 ++
 doc/src/sgml/distributed-transaction.sgml     |  152 ++
 doc/src/sgml/fdwhandler.sgml                  |  192 ++
 doc/src/sgml/filelist.sgml                    |    1 +
 doc/src/sgml/func.sgml                        |   51 +
 doc/src/sgml/monitoring.sgml                  |   56 +
 doc/src/sgml/postgres.sgml                    |    1 +
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/TAGS               |    1 +
 src/backend/access/fdwxact/fdwxact.c          | 2610 +++++++++++++++++++++++++
 src/backend/access/fdwxact/fdwxact_launcher.c |  641 ++++++
 src/backend/access/fdwxact/fdwxact_resolver.c |  331 ++++
 src/backend/access/rmgrdesc/Makefile          |    8 +-
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   65 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/Makefile           |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   32 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/foreigncmds.c            |   23 +
 src/backend/executor/execPartition.c          |    4 +
 src/backend/executor/nodeForeignscan.c        |    8 +
 src/backend/executor/nodeModifyTable.c        |    5 +
 src/backend/foreign/foreign.c                 |   43 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   18 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    2 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   61 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  150 ++
 src/include/access/fdwxact_launcher.h         |   32 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   52 +
 src/include/access/resolver_internal.h        |   67 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   23 +
 src/include/foreign/fdwapi.h                  |   18 +-
 src/include/foreign/foreign.h                 |    2 +-
 src/include/pgstat.h                          |    8 +-
 src/include/storage/proc.h                    |   10 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    2 +
 src/test/regress/expected/rules.out           |   12 +
 62 files changed, 5147 insertions(+), 27 deletions(-)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 120000 src/backend/access/fdwxact/TAGS
 create mode 100755 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/fdwxact_launcher.c
 create mode 100644 src/backend/access/fdwxact/fdwxact_resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 6d6fbec..9d99cdc 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9622,6 +9622,103 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-prepared-fdw-xacts">
+  <title><structname>pg_prepared_fdw_xacts</structname></title>
+
+  <indexterm zone="view-pg-prepared-fdw-xacts">
+   <primary>pg_prepared_fdw_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_prepared_fdw_xacts</structname> displays
+   information about foreign transactions that are currently prepared on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="fdw-transaction-managements"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_prepared_fdw_xacts</structname> contains one row per prepared
+   foreign transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_prepared_fdw_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>transaction</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Transaction ID that this foreign transaction is associated with
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which this foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction: <literal>prepared</literal>, <literal>committing</literal>, <literal>aborting</literal> or <literal>unknown</literal>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_prepared_fdw_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7554cba..589ef6e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1547,6 +1547,29 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+      <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the maximum number of foreign transactions that can be prepared
+        simultaneously. A single local transaction can give rise to multiple
+        foreign transactions. If <literal>N</literal> local transactions each
+        span <literal>K</literal> foreign servers, this value needs to be set to
+        <literal>N * K</literal>, not just <literal>N</literal>.
+        This parameter can only be set at server start.
+       </para>
+       <para>
+        When running a standby server, you must set this parameter to the
+        same or higher value than on the master server. Otherwise, queries
+        will not be allowed in the standby server.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-work-mem" xreflabel="work_mem">
       <term><varname>work_mem</varname> (<type>integer</type>)
       <indexterm>
@@ -3612,6 +3635,78 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
      </variablelist>
     </sect2>
 
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+
+     <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+      <term><varname>max_foreign_transaction_resolvers</varname> (<type>int</type>)
+      <indexterm>
+       <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+        resolver is responsible for foreign transaction resolution on one database.
+       </para>
+       <para>
+        Foreign transaction resolution workers are taken from the pool defined by
+        <varname>max_worker_processes</varname>.
+       </para>
+       <para>
+        The default value is 0.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+      <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies how long the foreign transaction resolver should wait after a failed
+        resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+       <para>
+        The default value is 10 seconds.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+      <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Terminates foreign transaction resolver processes that have had no foreign
+        transactions to resolve for longer than the specified number of milliseconds.
+        A value of zero disables the timeout mechanism.  You should set this value to
+        zero only if <varname>max_foreign_transaction_resolvers</varname> is at least
+        the number of databases you have. This parameter can only be set in the
+        <filename>postgresql.conf</filename> file or on the server command line.
+       </para>
+       <para>
+        The default value is 60 seconds.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     </variablelist>
+    </sect2>
+
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -7827,6 +7922,35 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-foreign-transaction">
+    <title>Foreign Transaction Management</title>
+
+    <variablelist>
+
+     <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+      <term><varname>foreign_twophase_commit</varname> (<type>bool</type>)
+       <indexterm>
+        <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+       </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies whether transaction commit will wait for all involved foreign transactions
+        to be resolved before the command returns a "success" indication to the client.
+        Both <varname>max_prepared_foreign_transactions</varname> and
+        <varname>max_foreign_transaction_resolvers</varname> must be set to non-zero values to
+        allow foreign two-phase commit to be used.
+       </para>
+       <para>
+        This parameter can be changed at any time; the behavior for any one transaction
+        is determined by the setting in effect when it commits.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000..54e582e
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,152 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction Management</title>
+
+ <para>
+  This chapter explains what distributed transaction management is, and how it can be configured
+  in PostgreSQL.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Atomic commit is an operation that applies a set of changes as a single operation
+   globally. <productname>PostgreSQL</productname> provides a way to perform a transaction
+   involving foreign resources using a <literal>Foreign Data Wrapper</literal>.
+   <productname>PostgreSQL</productname>'s atomic commit ensures that all changes
+   on foreign servers end in either commit or rollback, using the transaction callback
+   routines (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers automatically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol, which is a
+    type of atomic commitment protocol (ACP). Using the two-phase commit protocol, the commit
+    sequence of a distributed transaction proceeds through the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    At the first step, the <productname>PostgreSQL</productname> distributed transaction manager
+    prepares all transactions on the foreign servers if two-phase commit is required.
+    Two-phase commit is required only if the transaction modifies data on two or more
+    servers including the local server itself and the user requests it by
+    <xref linkend="guc-foreign-twophase-commit"/>. If all preparations on foreign servers
+    succeed, it goes to the next step. If any failure happens in this step,
+    <productname>PostgreSQL</productname> changes over to rollback and rolls back all transactions
+    on both local and foreign servers.
+   </para>
+
+   <para>
+    At the local commit step, <productname>PostgreSQL</productname> commits the transaction
+    locally. If any failure happens in this step, <productname>PostgreSQL</productname> changes
+    over to rollback and rolls back all transactions on both local and foreign servers.
+   </para>
+
+   <para>
+    At the final step, prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Foreign Transaction Resolution</title>
+
+   <para>
+    Foreign transaction resolution is performed by foreign transaction resolver processes.
+    They commit all prepared transactions on foreign servers if the coordinator received
+    an agreement message from all foreign servers during the first step. On the other hand,
+    if any foreign server failed to prepare the transaction, they roll back all prepared
+    transactions.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution on one
+    database on the coordinator side. On failure during resolution, it retries
+    resolution after <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>In-doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit or roll back
+    using the two-phase commit protocol. However, if the second phase fails for whatever reason,
+    the transaction becomes in-doubt. A transaction becomes in-doubt in the following
+    situations:
+
+   <itemizedlist>
+    <listitem>
+     <para>
+      The local <productname>PostgreSQL</productname> server crashes during an atomic commit
+      operation.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      The local <productname>PostgreSQL</productname> server receives a cancellation request from the user during
+      atomic commit.
+     </para>
+    </listitem>
+   </itemizedlist>
+
+   In-doubt transactions are automatically handled by the foreign transaction resolver process
+   when there is no online transaction requesting resolution.
+   <function>pg_resolve_fdw_xact</function> provides a way to manually resolve transactions on foreign
+   servers that participate in the distributed transaction.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Monitoring</title>
+   <para>
+    The monitoring information about foreign transaction resolvers is visible in
+    <link linkend="pg-stat-fdwxact-resolver-view"><literal>pg_stat_fdwxact_resolver</literal></link>
+    view. This view contains one row for every foreign transaction resolver worker.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be adjusted to
+    accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 4ce88dd..5180ce0 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1390,6 +1390,103 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     If an FDW wishes to support <firstterm>atomic commit</firstterm>
+     (as described in <xref linkend="fdw-transaction-managements"/>), it must call the
+     registration function <function>FdwXactRegisterForeignTransaction</function>
+     and provide the following callback functions:
+    </para>
+
+    <para>
+<programlisting>
+bool
+PrepareForeignTransaction(FdwXactResolveState *state);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if atomic commit is required.
+    Returning <literal>true</literal> means that preparing the foreign
+    transaction succeeded.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactResolveState *state);
+</programlisting>
+    Commit the not-prepared transaction on the foreign server.
+    This function is called at the pre-commit phase of the local
+    transaction if atomic commit is not required. Atomic
+    commit is not required either when the transaction modified data on
+    only one server, including the local server, or when the user doesn't
+    request atomic commit by <xref linkend="guc-foreign-twophase-commit"/>.
+    Returning <literal>true</literal> means that committing the
+    foreign transaction succeeded.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactResolveState *state);
+</programlisting>
+    Roll back a not-prepared transaction on the foreign server.
+    This function is called at the end of the local transaction, after it has
+    been rolled back locally, either when the user requested rollback or when
+    an error occurred during the transaction. This function could
+    be called recursively if an error occurs while rolling back the
+    foreign transaction for whatever reason. You need to track
+    recursion and prevent this function from being called infinitely.
+    Returning <literal>true</literal> means that rolling back the
+    foreign transaction succeeded.
+    </para>
+    <para>
+<programlisting>
+bool
+ResolvePreparedForeignTransaction(FdwXactResolveState *state,
+                                  bool is_commit);
+</programlisting>
+    Commit or roll back the prepared transaction on the foreign server.
+    When <varname>is_commit</varname> is true, it indicates that the foreign
+    transaction should be committed. Otherwise the foreign transaction should
+    be aborted.
+    This function is normally called by the foreign transaction resolver
+    process but can also be called by the <function>pg_resolve_fdw_xacts</function>
+    function. In the resolver process, this function is called either
+    when a backend requests the resolver process to resolve a distributed
+    transaction after it has been prepared, or when a database has dangling
+    transactions. Returning <literal>true</literal> means that resolving
+    the foreign transaction succeeded.
+    In the abort case, please note that the prepared transaction identified
+    by <varname>state->fdwxact_id</varname> might not exist on the foreign
+    server. If you fail to resolve the foreign transaction due to an undefined
+    object error (<literal>ERRCODE_UNDEFINED_OBJECT</literal>), you should
+    regard it as success and return <literal>true</literal>.
+    </para>
+    <para>
+<programlisting>
+bool
+IsTwoPhaseCommitEnabled(Oid serverid);
+</programlisting>
+    Return <literal>true</literal> if the foreign server identified by
+    <literal>serverid</literal> is capable of the two-phase commit protocol.
+    This function is called when the transaction begins to modify data on
+    the foreign server. Returning <literal>false</literal> indicates that
+    the current transaction cannot use atomic commit even if atomic commit
+    is requested by the user.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. So please note that
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1835,4 +1932,99 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+    <title>Transaction Management for Foreign Data Wrappers</title>
+
+    <para>
+     The <productname>PostgreSQL</productname> foreign transaction manager
+     allows FDWs to read and write data on foreign servers within a transaction while
+     maintaining atomicity of the foreign data (aka atomic commit). Atomic
+     commit guarantees that a distributed transaction is either committed
+     or rolled back on all participating foreign
+     servers.  To achieve atomic commit, <productname>PostgreSQL</productname>
+     employs the two-phase commit protocol, which is a type of atomic commitment
+     protocol. Every FDW that wishes to support atomic commit
+     is required to support the transaction management callback routines
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details)
+     and register the foreign transaction using
+     <function>FdwXactRegisterForeignTransaction</function> when starting a
+     transaction on the foreign server. The transaction on a registered foreign server
+     is managed by the foreign transaction manager.
+<programlisting>
+void
+FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, char *fx_id)
+</programlisting>
+    This function should be called when a transaction starts on the foreign server.
+    <varname>serverid</varname> and <varname>userid</varname> are <type>OID</type>s
+    which specify on which server and by whom the transaction is started. <varname>fx_id</varname>
+    is a null-terminated string that identifies the foreign transaction; it
+    is passed when the transaction management APIs are called. The length of
+    <varname>fx_id</varname> must be less than 200 bytes. Also, this identifier
+    must be unique enough that it doesn't conflict with other concurrent foreign
+    transactions. <varname>fx_id</varname> can be <literal>NULL</literal>.
+    If it's <literal>NULL</literal>, a unique transaction identifier is automatically
+    generated in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    Since this identifier is used per foreign transaction and the xid of an unresolved
+    distributed transaction is never reused, an auto-generated identifier is
+    sufficient to ensure uniqueness. It's recommended to generate the foreign transaction
+    identifier in the FDW if the format of the auto-generated identifier doesn't match
+    the requirements of the foreign server.
+    </para>
+
+    <para>
+     An example of such a transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be using different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When a transaction starts on the foreign server, an FDW that wishes to use atomic
+     commit must register the foreign transaction as a participant by calling
+     <function>FdwXactRegisterForeignTransaction</function>. Also, during the
+     transaction, <function>IsTwoPhaseCommitEnabled</function> is called whenever
+     the transaction begins to modify data on the foreign server. If the FDW wishes to use
+     atomic commit, <function>IsTwoPhaseCommitEnabled</function> must return
+     <literal>true</literal>. All foreign transaction participants must
+     return <literal>true</literal> to achieve atomic commit.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling <function>PrepareForeignTransaction</function>
+     if the two-phase commit protocol is required. Two-phase commit is required only if
+     the transaction modified data on more than one server, including the local
+     server, and the user requests atomic commit. <productname>PostgreSQL</productname>
+     can commit locally and go to the next step if and only if all foreign
+     transactions were prepared successfully. If two-phase commit is not required, the foreign
+     transaction manager commits the transaction on the foreign server by calling
+     <function>CommitForeignTransaction</function> and then
+     <productname>PostgreSQL</productname> commits locally. The foreign transaction
+     manager doesn't make any further changes to foreign transactions from this point
+     forward. If any failure happens during the transaction for whatever reason,
+     for example a network failure or a user request, before
+     <productname>PostgreSQL</productname> commits locally, the foreign transaction
+     manager changes over to rollback and calls
+     <function>RollbackForeignTransaction</function> for every foreign server to
+     close the current transaction on the foreign servers.
+    </para>
+
+    <para>
+     When two-phase commit is required, after committing locally, the transaction
+     commit waits for all prepared foreign transactions to be resolved before
+     the commit completes. One foreign transaction resolver is responsible for
+     foreign transaction resolution on a database.
+     <function>ResolvePreparedForeignTransaction</function> is called by the foreign
+     transaction resolver process during resolution.
+     <function>ResolvePreparedForeignTransaction</function> is also called
+     when the user executes the <function>pg_resolve_fdw_xact</function> function.
+    </para>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 48ac14a..38d6fcb 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5193df3..8bb251e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20806,6 +20806,57 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-fdw-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_fdw_xacts</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_fdw_xacts</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for foreign transactions
+        matching the arguments and resolves them. This function won't resolve
+        a foreign transaction which is in progress, or one that is locked by some
+        other backend.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_fdw_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0484cfa..6b2aa6f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -332,6 +332,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_fdw_xact_resolver</structname><indexterm><primary>pg_stat_fdw_xact_resolver</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-fdwxact-resolver-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1194,6 +1202,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
          <entry><literal>LogicalLauncherMain</literal></entry>
          <entry>Waiting in main loop of logical launcher process.</entry>
         </row>
@@ -1405,6 +1421,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
+        <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
          <entry>Waiting during base backup when throttling activity.</entry>
@@ -2214,6 +2234,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-fdwxact-resolver-view" xreflabel="pg_stat_fdw_xact_resolver">
+   <title><structname>pg_stat_fdw_xact_resolver</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_fdw_xact_resolver</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of
+   resolution of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 0070603..c10e21f 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -164,6 +164,7 @@
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index bd93a6a..4a1ebdc 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  tablesample transam
+			  tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000..9ddbb14
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o fdwxact_resolver.o fdwxact_launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/TAGS b/src/backend/access/fdwxact/TAGS
new file mode 120000
index 0000000..1a96393
--- /dev/null
+++ b/src/backend/access/fdwxact/TAGS
@@ -0,0 +1 @@
+/home/masahiko/source/postgresql/TAGS
\ No newline at end of file
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100755
index 0000000..f57ad8e
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2610 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL distributed transaction manager for foreign servers.
+ *
+ * To commit atomically across all foreign servers, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is to prepare all of the remote
+ * transactions before committing locally, and to commit them after
+ * committing locally.
+ *
+ * When a foreign data wrapper starts a transaction on a foreign server that
+ * is capable of the two-phase commit protocol, it registers the foreign
+ * transaction using FdwXactRegisterForeignTransaction() so that the
+ * transaction joins the group of participants for atomic commit. Participants
+ * are identified by the OIDs of the foreign server and the user. When the
+ * foreign transaction begins to modify data, the executor marks it as
+ * modified using FdwXactMarkForeignTransactionModified().
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every participating foreign server. After committing or rolling back
+ * locally, we notify the resolver process and tell it to commit or roll back
+ * those transactions. If we ask it to commit, we also tell it to notify us
+ * when it's done, so that we can wait interruptibly for it to finish, and so
+ * that we do not attempt work locally that could fail with an ERROR after
+ * the local transaction has already committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * waiters each time we receive a request. We have two queues: the active
+ * queue and the retry queue. A backend is first inserted into the active
+ * queue, and the resolver process moves it to the retry queue if the
+ * resolution fails. Backends in the retry queue are processed at intervals
+ * of foreign_transaction_resolution_retry_interval.
+ *
+ * The two-phase commit protocol is required if the transaction modified data
+ * on two or more servers, counting the local server. Otherwise, all foreign
+ * transactions are committed during pre-commit.
+ *
+ * If a network failure or server crash occurs, or if the user stops waiting,
+ * prepared foreign transactions are left in an in-doubt state (a.k.a.
+ * dangling transactions). These are processed by the resolver process.
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ *	* On PREPARE redo we add the foreign transaction to FdwXactCtl->fdw_xacts.
+ *	  We set fdw_xact->inredo to true for such entries.
+ *	* On Checkpoint redo, we iterate through FdwXactCtl->fdw_xacts entries that
+ *	  have set fdw_xact->inredo true and are behind the redo_horizon. We save
+ *	  them to disk and then set fdw_xact->ondisk to true.
+ *	* On COMMIT and ABORT we delete the entry from FdwXactCtl->fdw_xacts.
+ *	  If fdw_xact->ondisk is true, we delete the corresponding file from
+ *	  the disk as well.
+ *	* RecoverFdwXacts loads all foreign transaction entries from disk into
+ *	  memory at server startup.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Is atomic commit requested by user? */
+#define AtomicCommitRequested() \
+	(foreign_twophase_commit == true && \
+	 max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+/* Structure to bundle the foreign transaction participant */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to the FdwXact entry in the global array. NULL if
+	 * this foreign transaction is registered but not inserted
+	 * yet.
+	 */
+	FdwXact		fdw_xact;
+	char		*fdw_xact_id;
+
+	/* Participant server and its user mapping */
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	/* true if this transaction modified data on the foreign server */
+	bool		modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function	prepare_foreign_xact;
+	CommitForeignTransaction_function	commit_foreign_xact;
+	RollbackForeignTransaction_function	rollback_foreign_xact;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit.
+ * This list has only foreign servers that are capable of two-phase
+ * commit protocol.
+ *
+ * We could manage all foreign transactions involved in the transaction
+ * regardless of their configuration, but we do not, because that would
+ * require every FDW to register its foreign servers, which would break
+ * backward compatibility.
+ */
+List *FdwXactParticipantsForAC = NIL;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDW_XACTS_DIR "pg_fdw_xact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * OID, xid, foreign server OID and user OID, each printed as 8 hexadecimal
+ * digits and separated by '_'.
+ *
+ * Since one FdwXact state file is created per foreign transaction in a
+ * distributed transaction, and the xid of an unresolved distributed
+ * transaction is never reused, this name is sufficient to ensure uniqueness.
+ */
+#define FDW_XACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDW_XACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+static FdwXact FdwXactRegisterFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static bool FdwXactResolveForeignTransaction(FdwXactResolveState *state, FdwXact fdwxact,
+											 int elevel);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactQueueInsert(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid, bool give_warnings);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool give_warnings);
+static List *get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						   bool need_lock);
+static FdwXact get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								bool need_lock);
+static FdwXact get_all_fdw_xacts(int *length);
+static FdwXact insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							   Oid umid, char *fdw_xact_id);
+static char *generate_fdw_xact_identifier(Oid serverid, Oid userid);
+static void remove_fdw_xact(FdwXact fdw_xact);
+static FdwXactResolveState *create_fdw_xact_resovle_state(void);
+
+/* GUC parameters */
+int	max_prepared_foreign_xacts = 0;
+int	max_foreign_xact_resolvers = 0;
+bool foreign_twophase_commit = false;
+
+/* Keep track of whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+/*
+ * Register the foreign transaction identified by the given arguments as
+ * a participant of the current transaction.
+ *
+ * This function is meant to be called by an FDW when a foreign transaction
+ * starts. The foreign server identified by the given server id must
+ * support the atomic commit APIs. The foreign transaction is identified
+ * by the given identifier 'fdwxact_id', which can be NULL. If it is NULL,
+ * we construct a unique identifier instead.
+ *
+ * Registered foreign transactions are managed by the foreign transaction
+ * manager until the end of the transaction.
+ */
+void
+FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, char *fdwxact_id)
+{
+	FdwXactParticipant	*fdw_part;
+	ListCell   			*lc;
+	ForeignServer 		*foreign_server;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	FdwRoutine			*fdw_routine;
+	MemoryContext		old_ctx;
+
+	/* Check length of foreign transaction identifier */
+	if (fdwxact_id != NULL && strlen(fdwxact_id) >= NAMEDATALEN)
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						fdwxact_id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   NAMEDATALEN)));
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	/* Duplication check */
+	foreach(lc, FdwXactParticipantsForAC)
+	{
+		fdw_part = lfirst(lc);
+
+		/* Raise an error if this server/user pair is already registered */
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+			ereport(ERROR,
+					(errmsg("attempt to start transaction again on server %u user %u",
+							serverid, userid)));
+	}
+
+	/*
+	 * Participant information is needed at the end of a transaction, when
+	 * the system caches are not available, so save it in TopTransactionContext
+	 * beforehand so that it lives until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	foreign_server = GetForeignServer(serverid);
+	fdw = GetForeignDataWrapper(foreign_server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	/* Make sure that the FDW has transaction handlers */
+	if (!fdw_routine->PrepareForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function provided for preparing foreign transaction for FDW %s",
+						fdw->fdwname)));
+	if (!fdw_routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to commit a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+	if (!fdw_routine->RollbackForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to rollback a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+
+	/* Generate unique identifier if not provided */
+	if (fdwxact_id == NULL)
+		fdwxact_id = generate_fdw_xact_identifier(serverid, userid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdw_xact_id = fdwxact_id;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdw_xact = NULL;
+	fdw_part->modified = false;	/* by default */
+	fdw_part->prepare_foreign_xact = fdw_routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact = fdw_routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact = fdw_routine->RollbackForeignTransaction;
+
+	/* Add this foreign transaction to the participants list */
+	FdwXactParticipantsForAC = lappend(FdwXactParticipantsForAC, fdw_part);
+
+	/* Switch back to the previous memory context */
+	MemoryContextSwitchTo(old_ctx);
+
+	return;
+}
+
+/*
+ * Remember that the registered foreign transaction modified data. This
+ * function is called by the executor when it begins to modify data on a
+ * foreign server, regardless of whether the foreign server is capable of the
+ * two-phase commit protocol. The mark is used at commit to determine whether
+ * we must use the two-phase commit protocol. This function also checks
+ * whether the foreign server being modified is capable of two-phase commit;
+ * if it is not, we remember that in MyXactFlags.
+ */
+void
+FdwXactMarkForeignTransactionModified(ResultRelInfo *resultRelInfo, int flags)
+{
+	Relation			rel = resultRelInfo->ri_RelationDesc;
+	FdwXactParticipant	*fdw_part;
+	ForeignTable		*ftable;
+	ListCell   			*lc;
+	Oid					userid;
+	Oid					serverid;
+
+	bool found = false;
+
+	/* Quick return if the user did not request atomic commit */
+	if (!AtomicCommitRequested())
+		return;
+
+	/* Do nothing in EXPLAIN (no ANALYZE) case */
+	if (flags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	ftable = GetForeignTable(RelationGetRelid(rel));
+
+	/*
+	 * If the foreign server being modified does not support, or has not
+	 * enabled, the two-phase commit protocol, record that we have written
+	 * to such a server and return.
+	 */
+	if (resultRelInfo->ri_FdwRoutine->IsTwoPhaseCommitEnabled == NULL ||
+		!resultRelInfo->ri_FdwRoutine->IsTwoPhaseCommitEnabled(ftable->serverid))
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		return;
+	}
+
+	/*
+	 * The foreign server being modified supports the two-phase commit
+	 * protocol, so remember that the foreign transaction modified data.
+	 */
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	serverid = ftable->serverid;
+	foreach(lc, FdwXactParticipantsForAC)
+	{
+		fdw_part = lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			fdw_part->modified = true;
+			found = true;
+			break;
+		}
+	}
+
+	if (!found)
+		elog(ERROR, "attempt to mark unregistered foreign server %u, user %u as modified",
+			 serverid, userid);
+}
+
+/*
+ * FdwXactShmemSize
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdw_xacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * FdwXactShmemInit
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdw_xacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->freeFdwXacts = NULL;
+		FdwXactCtl->numFdwXacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdw_xacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdw_xacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdw_xacts[cnt].status = FDW_XACT_INITIAL;
+			fdw_xacts[cnt].fxact_free_next = FdwXactCtl->freeFdwXacts;
+			FdwXactCtl->freeFdwXacts = &fdw_xacts[cnt];
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * PreCommit_FdwXacts
+ *
+ * This function prepares all foreign transaction participants if atomic commit
+ * is required. Otherwise commits them without preparing.
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipantsForAC == NIL)
+		return;
+
+	/*
+	 * If the user requires atomic commit semantics, we don't allow COMMIT if
+	 * we've modified data both on foreign servers that can execute the
+	 * two-phase commit protocol and on foreign servers that cannot.
+	 */
+	if (foreign_twophase_commit == true &&
+		((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));
+
+	if (ForeignTwophaseCommitRequired())
+	{
+		/*
+		 * Prepare the transactions on all the foreign servers. If any
+		 * prepare fails for whatever reason, we abort the transaction.
+		 */
+		FdwXactPrepareForeignTransactions();
+
+		/* keep FdwXactParticipantsForAC until end of transaction */
+	}
+	else
+	{
+		ListCell   *lc;
+
+		/* Two-phase commit is not required, commit them */
+		foreach(lc, FdwXactParticipantsForAC)
+		{
+			FdwXactResolveState *state;
+			FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+			state = create_fdw_xact_resovle_state();
+			state->serverid = fdw_part->server->serverid;
+			state->userid = fdw_part->usermapping->userid;
+			state->umid = fdw_part->usermapping->umid;
+
+			/* Commit foreign transaction */
+			if (!fdw_part->commit_foreign_xact(state))
+				ereport(ERROR,
+						(errmsg("could not commit foreign transaction on server %s",
+								fdw_part->server->servername)));
+		}
+
+		/* Forget all participants */
+		FdwXactParticipantsForAC = NIL;
+	}
+}
+
+/*
+ * FdwXactPrepareForeignTransactions
+ *
+ * Prepare all foreign transaction participants.  Each time a foreign
+ * transaction is prepared, it is appended to the chain of prepared participants,
+ * which gives quick access to all participants of the distributed transaction.
+ * If any one of them fails to prepare, we abort the transaction.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	FdwXactResolveState *state;
+	ListCell   *lcell;
+	FdwXact		prev_fdwxact = NULL;
+
+	state = create_fdw_xact_resovle_state();
+
+	/* Loop over the foreign connections */
+	foreach(lcell, FdwXactParticipantsForAC)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+		FdwXact		fdwxact;
+
+		/*
+		 * Register the foreign transaction entry. Registration persists this
+		 * information to disk and writes it to WAL (which also relays it to
+		 * standbys). Thus, if we lose connectivity to the foreign server or
+		 * crash ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and will try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed between these two
+		 * steps, we would forget that we prepared the transaction on the
+		 * foreign server and would not be able to resolve it after the crash.
+		 * Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactRegisterFdwXactEntry(GetTopTransactionId(), fdw_part);
+
+		state->serverid = fdw_part->server->serverid;
+		state->userid = fdw_part->usermapping->userid;
+		state->umid = fdw_part->usermapping->umid;
+		state->fdwxact_id = pstrdup(fdwxact->fdw_xact_id);
+
+		/*
+		 * Between the FdwXactRegisterFdwXactEntry call and the moment this
+		 * backend receives the acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 * During abort processing, we might try to resolve a never-prepared
+		 * transaction and get an error. This is fine as long as the FDW
+		 * provides us unique prepared transaction identifiers.
+		 */
+		if (!fdw_part->prepare_foreign_xact(state))
+		{
+			/* Failed to prepare; raise an error to abort the transaction */
+			ereport(ERROR,
+					(errmsg("could not prepare transaction on foreign server %s",
+							fdw_part->server->servername)));
+		}
+
+		/* Preparation succeeded, update its status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = FDW_XACT_PREPARED;
+		fdw_part->fdw_xact = fdwxact;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Link this entry into the chain of prepared participants of this
+		 * transaction, then remember it as the new tail of the chain.
+		 */
+		if (prev_fdwxact)
+		{
+			Assert(fdwxact->fxact_next == NULL);
+			prev_fdwxact->fxact_next = fdwxact;
+		}
+		prev_fdwxact = fdwxact;
+	}
+}
+
+/*
+ * FdwXactRegisterFdwXactEntry
+ *
+ * This function creates a new foreign transaction entry before an FDW
+ * prepares and commits or rolls back. It writes the entry to WAL, and the
+ * entry is persisted to disk under the pg_fdw_xact directory at checkpoint.
+ */
+static FdwXact
+FdwXactRegisterFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact				fxact;
+	FdwXactOnDiskData	*fxact_file_data;
+	MemoryContext		old_context;
+	int					data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact = insert_fdw_xact(MyDatabaseId, xid, fdw_part->server->serverid,
+							fdw_part->usermapping->userid,
+							fdw_part->usermapping->umid, fdw_part->fdw_xact_id);
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->held_by = MyBackendId;
+	fdw_part->fdw_xact = fxact;
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdw_xact_id);
+	data_len = data_len + strlen(fdw_part->fdw_xact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fxact_file_data->dbid = MyDatabaseId;
+	fxact_file_data->local_xid = xid;
+	fxact_file_data->serverid = fdw_part->server->serverid;
+	fxact_file_data->userid = fdw_part->usermapping->userid;
+	fxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fxact_file_data->fdw_xact_id, fdw_part->fdw_xact_id,
+		   strlen(fdw_part->fdw_xact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fxact_file_data, data_len);
+	fxact->insert_end_lsn = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_INSERT);
+	XLogFlush(fxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store the record's start location so it can be read later at checkpoint */
+	fxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact->valid = true;
+	LWLockRelease(FdwXactLock);
+
+	/* Checkpoint can proceed now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fxact_file_data);
+	return fxact;
+}
+
+/*
+ * insert_fdw_xact
+ *
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				Oid umid, char *fdw_xact_id)
+{
+	int i;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		fxact = FdwXactCtl->fdw_xacts[i];
+		if (fxact->dbid == dbid &&
+			fxact->local_xid == xid &&
+			fxact->serverid == serverid &&
+			fxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
+								   xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->freeFdwXacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fxact = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fxact->fxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->numFdwXacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts++] = fxact;
+
+	fxact->held_by = InvalidBackendId;
+	fxact->dbid = dbid;
+	fxact->local_xid = xid;
+	fxact->serverid = serverid;
+	fxact->userid = userid;
+	fxact->umid = umid;
+	fxact->insert_start_lsn = InvalidXLogRecPtr;
+	fxact->insert_end_lsn = InvalidXLogRecPtr;
+	fxact->valid = false;
+	fxact->ondisk = false;
+	fxact->inredo = false;
+	memcpy(fxact->fdw_xact_id, fdw_xact_id, strlen(fdw_xact_id) + 1);
+
+	return fxact;
+}
+
+/*
+ * remove_fdw_xact
+ *
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdw_xact(FdwXact fdw_xact)
+{
+	int			cnt;
+
+	Assert(fdw_xact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		if (FdwXactCtl->fdw_xacts[cnt] == fdw_xact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (cnt >= FdwXactCtl->numFdwXacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdw_xact->local_xid, fdw_xact->serverid, fdw_xact->userid)));
+
+	/* Remove the entry from active array */
+	FdwXactCtl->numFdwXacts--;
+	FdwXactCtl->fdw_xacts[cnt] = FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts];
+
+	/* Put it back into free list */
+	fdw_xact->fxact_free_next = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fdw_xact;
+
+	/* Reset information */
+	fdw_xact->status = FDW_XACT_INITIAL;
+	fdw_xact->held_by = InvalidBackendId;
+	fdw_xact->fxact_next = NULL;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdw_xact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdw_xact->serverid;
+		record.dbid = fdw_xact->dbid;
+		record.xid = fdw_xact->local_xid;
+		record.userid = fdw_xact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdw_xact_remove));
+		recptr = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the commit critical section: a
+		 * checkpoint starting after this will certainly see the FdwXact entry
+		 * as a candidate for fsyncing.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true if the current transaction requires foreign two-phase commit
+ * to achieve atomic commit. Foreign two-phase commit is required in either
+ * of two cases: we modified data on two or more foreign servers, or we
+ * modified both a non-temporary local relation and data on at least one
+ * foreign server.
+ */
+bool
+ForeignTwophaseCommitRequired(void)
+{
+	int	nserverswritten = list_length(FdwXactParticipantsForAC);
+	ListCell*	lc;
+	bool		modified = false;
+
+	/* Return if not requested */
+	if (!AtomicCommitRequested())
+		return false;
+
+	/* Check if we modified data on any foreign server */
+	foreach(lc, FdwXactParticipantsForAC)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->modified)
+		{
+			modified = true;
+			break;
+		}
+	}
+
+	/* We didn't modify data on any foreign server */
+	if (!modified)
+		return false;
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	return nserverswritten > 1;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int	i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * ForgetAllFdwXactParticipants
+ *
+ * Reset all the foreign transaction entries that this backend registered.
+ * If the foreign transaction has a corresponding FdwXact entry, resetting
+ * the held_by field leaves that entry in the unresolved state. If we
+ * leave any entries, we update the oldest xmin of unresolved transactions
+ * so that the transaction status of dangling transactions is not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell *cell;
+	int		n_lefts = 0;
+
+	if (FdwXactParticipantsForAC == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	foreach(cell, FdwXactParticipantsForAC)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(cell);
+
+		/* Skip if we haven't registered an FdwXact entry yet */
+		if (fdw_part->fdw_xact == NULL)
+			continue;
+
+		/*
+		 * There is a race condition: an FdwXact entry in
+		 * FdwXactParticipantsForAC could be in use by another backend if the
+		 * resolver process removed the entry and another backend reused it
+		 * before we got here. So we need to check whether each entry is still
+		 * associated with this backend.
+		 */
+		if (fdw_part->fdw_xact->held_by == MyBackendId)
+		{
+			fdw_part->fdw_xact->held_by = InvalidBackendId;
+			n_lefts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Update the oldest local transaction of unresolved distributed
+	 * transactions if we left any FdwXact entries.
+	 */
+	if (n_lefts > 0)
+		FdwXactComputeRequiredXmin();
+
+	FdwXactParticipantsForAC = NIL;
+}
+
+/*
+ * AtProcExit_FdwXact
+ *
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Wait for foreign transaction to be resolved.
+ *
+ * Initially backends start in state FDW_XACT_NOT_WAITING and then change
+ * that state to FDW_XACT_WAITING before adding themselves to the wait queue.
+ * During FdwXactResolveDistributedTransaction a fdwxact resolver changes the
+ * state to FDW_XACT_WAIT_COMPLETE once foreign transactions are resolved.
+ * This backend then resets its state to FDW_XACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue and changes the state to FDW_XACT_WAITING_RETRY.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char		*new_status = NULL;
+	const char	*old_status;
+	ListCell	*lc;
+	List		*fdwxact_participants = NIL;
+
+	/* Quick exit if atomic commit is not requested */
+	if (!AtomicCommitRequested())
+		return;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDW_XACT_NOT_WAITING);
+
+	if (FdwXactParticipantsForAC != NIL)
+	{
+		/*
+		 * If we're waiting for foreign transactions to be resolved that
+		 * we've prepared just before, use the participants list.
+		 */
+		Assert(MyPgXact->xid == wait_xid);
+		fdwxact_participants = FdwXactParticipantsForAC;
+	}
+	else
+	{
+		/*
+		 * Get participants list from the global array. This is required (1)
+		 * when we're waiting for the resolution of foreign transactions that
+		 * are part of a local transaction that was prepared while the server
+		 * was running, or (2) when we resolve a PREPARE'd distributed
+		 * transaction after restart.
+		 */
+		fdwxact_participants = get_fdw_xacts(MyDatabaseId, wait_xid,
+											 InvalidOid, InvalidOid, true);
+	}
+
+	/* Exit if we found no foreign transaction to resolve */
+	if (fdwxact_participants == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	foreach(lc, fdwxact_participants)
+	{
+		FdwXact fdw_xact = (FdwXact) lfirst(lc);
+
+		/* Don't overwrite status if fate has been determined */
+		if (fdw_xact->status == FDW_XACT_PREPARED)
+			fdw_xact->status = (is_commit ?
+								FDW_XACT_COMMITTING_PREPARED :
+								FDW_XACT_ABORTING_PREPARED);
+	}
+
+	/* Set backend status and enqueue ourselves on the active queue */
+	MyProc->fdwXactState = FDW_XACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	FdwXactQueueInsert();
+	LWLockRelease(FdwXactLock);
+
+	/* Launch a resolver process if not yet, or wake it up */
+	fdwxact_maybe_launch_resolver(false);
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int len;
+
+		old_status = get_ps_display(&len);
+		/* " waiting for resolution " is 24 chars; an xid needs up to 10 digits */
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0';	/* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDW_XACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDW_XACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The latter
+		 * would lead the client to believe that the distributed transaction
+		 * aborted, which is not true: it's already committed locally. The
+		 * former is no good either: the client has requested committing a
+		 * distributed transaction, and is entitled to assume that an
+		 * acknowledged commit is also committed on all foreign servers, which
+		 * might not be true. So in this case we issue a WARNING (which some
+		 * clients may be able to interpret) and shut off further output. We
+		 * do NOT reset ProcDiePending, so that the process will die after the
+		 * commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign xact resolver can pick them up and try to resolve them
+		 * later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDW_XACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+
+	/*
+	 * Forget the list of locked entries. This also means that entries
+	 * that could not be resolved remain as dangling transactions.
+	 */
+	ForgetAllFdwXactParticipants();
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Acquire FdwXactLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Insert MyProc into the tail of FdwXactActiveQueue.
+ */
+static void
+FdwXactQueueInsert(void)
+{
+	SHMQueueInsertBefore(&(FdwXactRslvCtl->FdwXactActiveQueue),
+						 &(MyProc->fdwXactLinks));
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Resolve one distributed transaction. The target distributed transaction
+ * is fetched from either the active queue or the retry queue and its
+ * participants are fetched from the global array.
+ *
+ * Release the waiter and return true if we resolved all of the foreign
+ * transaction participants. On failure, we move the FdwXactLinks entry to the
+ * retry queue from the active queue, and raise an error and exit.
+ */
+bool
+FdwXactResolveDistributedTransaction(Oid dbid, bool is_active)
+{
+	FdwXactResolveState *state;
+	ListCell			*lc;
+	ListCell			*next;
+	PGPROC				*waiter = NULL;
+	List				*participants;
+	SHM_QUEUE			*target_queue;
+
+	if (is_active)
+		target_queue = &(FdwXactRslvCtl->FdwXactActiveQueue);
+	else
+		target_queue = &(FdwXactRslvCtl->FdwXactRetryQueue);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
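+	/*
+	 * Fetch the first waiter on the given database, walking the queue from
+	 * its head. The cursor must advance on each step; otherwise a waiter on
+	 * another database at the head of the queue would make us loop forever.
+	 */
+	waiter = (PGPROC *) SHMQueueNext(target_queue, target_queue,
+									 offsetof(PGPROC, fdwXactLinks));
+	while (waiter != NULL && waiter->databaseId != dbid)
+		waiter = (PGPROC *) SHMQueueNext(target_queue,
+										 &(waiter->fdwXactLinks),
+										 offsetof(PGPROC, fdwXactLinks));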
+
+	/* If no waiter, there is no job */
+	if (!waiter)
+	{
+		LWLockRelease(FdwXactLock);
+		return false;
+	}
+
+	Assert(TransactionIdIsValid(waiter->fdwXactWaitXid));
+
+	state = create_fdw_xact_resovle_state();
+	state->wait_xid = waiter->fdwXactWaitXid;
+	participants = get_fdw_xacts(dbid, waiter->fdwXactWaitXid,
+								 InvalidOid, InvalidOid, false);
+	LWLockRelease(FdwXactLock);
+
+	/* Resolve all foreign transactions one by one */
+	for (lc = list_head(participants); lc != NULL; lc = next)
+	{
+		FdwXact fdwxact = (FdwXact) lfirst(lc);
+
+		CHECK_FOR_INTERRUPTS();
+
+		next = lnext(lc);
+
+		state->serverid = fdwxact->serverid;
+		state->userid = fdwxact->userid;
+		state->umid = fdwxact->umid;
+		state->fdwxact_id = pstrdup(fdwxact->fdw_xact_id);
+
+		PG_TRY();
+		{
+			FdwXactResolveForeignTransaction(state, fdwxact, ERROR);
+		}
+		PG_CATCH();
+		{
+			/* Re-insert the waiter to the retry queue */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			if (waiter->fdwXactState == FDW_XACT_WAITING)
+			{
+				SHMQueueDelete(&(waiter->fdwXactLinks));
+				pg_write_barrier();
+				SHMQueueInsertBefore(&(FdwXactRslvCtl->FdwXactRetryQueue),
+									 &(waiter->fdwXactLinks));
+				waiter->fdwXactState = FDW_XACT_WAITING_RETRY;
+			}
+			LWLockRelease(FdwXactLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		elog(DEBUG2, "resolved a foreign transaction xid %u, serverid %u, userid %u",
+			 fdwxact->local_xid, fdwxact->serverid, fdwxact->userid);
+	}
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/*
+	 * Remove waiter from shmem queue, if not detached yet. The waiter
+	 * could already be detached if the user cancelled the wait before
+	 * resolution.
+	 */
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId	wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDW_XACT_WAIT_COMPLETE;
+
+		/* Wake up the waiter only when we have set state and removed from queue */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc xid %u", wait_xid);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	return true;
+}
+
+/*
+ * Resolve all dangling foreign transactions on the given database. Get
+ * all dangling foreign transactions from shmem global array and resolve
+ * them one by one.
+ */
+void
+FdwXactResolveAllDanglingTransactions(Oid dbid)
+{
+	List		*dangling_fdwxacts = NIL;
+	ListCell	*cell;
+	int			n_resolved = 0;
+	int			i;
+
+	Assert(OidIsValid(dbid));
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/*
+	 * Walk over the global array to make the list of dangling transactions
+	 * whose corresponding local transaction is on the given database.
+	 */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fxact = FdwXactCtl->fdw_xacts[i];
+
+		/*
+		 * Append the fdwxact entry on the given database to the list if
+		 * it's handled by nobody and the corresponding local transaction
+		 * is not a prepared transaction.
+		 */
+		if (fxact->dbid == dbid &&
+			fxact->held_by == InvalidBackendId &&
+			!TwoPhaseExists(fxact->local_xid))
+			dangling_fdwxacts = lappend(dangling_fdwxacts, fxact);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/* Return if there is no foreign transaction we need to resolve */
+	if (dangling_fdwxacts == NIL)
+		return;
+
+	foreach(cell, dangling_fdwxacts)
+	{
+		FdwXact fdwxact = (FdwXact) lfirst(cell);
+		FdwXactResolveState *state;
+
+		state = create_fdw_xact_resovle_state();
+		state->wait_xid = fdwxact->local_xid;
+		state->serverid = fdwxact->serverid;
+		state->userid = fdwxact->userid;
+		state->umid = fdwxact->umid;
+		state->fdwxact_id = pstrdup(fdwxact->fdw_xact_id);
+
+		FdwXactResolveForeignTransaction(state, fdwxact, ERROR);
+
+		n_resolved++;
+	}
+
+	list_free(dangling_fdwxacts);
+
+	elog(DEBUG2, "resolved %d dangling foreign xacts", n_resolved);
+}
+
+/*
+ * AtEOXact_FdwXacts
+ *
+ * In commit case, we have already prepared transactions on the foreign
+ * servers during pre-commit. Those prepared transactions will be
+ * resolved by the resolver process, so we don't do anything about the
+ * foreign transactions here.
+ *
+ * In abort case, the user requested rollback or we changed over to rollback
+ * due to an error during commit. To close the current foreign transactions
+ * anyway, we call the rollback API for every foreign transaction. If we
+ * raised an error during preparing and came here, it's possible that some
+ * entries of FdwXactParticipantsForAC already registered their FdwXact
+ * entries. If there are, we leave them as dangling transactions and ask the
+ * resolver process to process them.
+ */
+void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		int left_fdwxacts = 0;
+		FdwXactResolveState *state = create_fdw_xact_resovle_state();
+
+		foreach (lcell, FdwXactParticipantsForAC)
+		{
+			FdwXactParticipant	*fdw_part = lfirst(lcell);
+
+			/*
+			 * Count FdwXact entries that we registered to shared memory array
+			 * in this transaction.
+			 */
+			if (fdw_part->fdw_xact)
+			{
+				/*
+				 * The status of the foreign transaction must be either preparing
+				 * or prepared. In any case, since we have registered an FdwXact
+				 * entry we leave it to the resolver process. For the preparing
+				 * state, since the foreign transaction might not be closed yet
+				 * we fall through and call the rollback API. For the prepared
+				 * state, since the foreign transaction has closed we don't need
+				 * to do anything.
+				 */
+				Assert(fdw_part->fdw_xact->status == FDW_XACT_PREPARING ||
+					   fdw_part->fdw_xact->status == FDW_XACT_PREPARED);
+
+				left_fdwxacts++;
+				if (fdw_part->fdw_xact->status == FDW_XACT_PREPARED)
+					continue;
+			}
+
+			state->serverid = fdw_part->server->serverid;
+			state->userid = fdw_part->usermapping->userid;
+			state->umid = fdw_part->usermapping->umid;
+
+			/*
+			 * Roll back all current foreign transactions. Since we're rolling
+			 * back the transaction, it's too late to raise an error here, so
+			 * we log a warning instead.
+			 */
+			if (!fdw_part->rollback_foreign_xact(state))
+				ereport(WARNING,
+						(errmsg("could not abort transaction on server \"%s\"",
+								fdw_part->server->servername)));
+		}
+
+		/* If we left some FdwXact entries, ask the resolver process */
+		if (left_fdwxacts > 0)
+		{
+			ereport(WARNING,
+					(errmsg("might have left %d foreign transactions in in-doubt status",
+							left_fdwxacts)));
+			fdwxact_maybe_launch_resolver(true);
+		}
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * AtPrepare_FdwXacts
+ *
+ * If there are foreign servers involved in the transaction, this function
+ * prepares transactions on those servers.
+ *
+ * Note that it can happen that the transaction aborts after we prepared part
+ * of participants. In this case, since we can still change to abort, we
+ * cannot forget FdwXactParticipantsForAC here. These are processed by the
+ * resolver process during aborting, or at AtEOXact_FdwXacts.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipantsForAC == NIL)
+		return;
+
+	/*
+	 * We cannot prepare distributed transaction if any foreign server of
+	 * participants in the transaction isn't capable of two-phase commit.
+	 */
+	if ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_T_R_INTEGRITY_CONSTRAINT_VIOLATION),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in the transaction cannot prepare the transaction")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * FdwXactResolveForeignTransaction
+ *
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * handler routine. The foreign transaction can be a dangling transaction
+ * that nobody is interested in. If the fate of the foreign transaction is
+ * not determined yet, it's determined according to the status of the
+ * corresponding local transaction.
+ *
+ * If the resolution is successful, remove the foreign transaction entry from
+ * the shared memory and also remove the corresponding on-disk file.
+ */
+static bool
+FdwXactResolveForeignTransaction(FdwXactResolveState *state, FdwXact fdwxact,
+								 int elevel)
+{
+	ForeignServer		*server;
+	ForeignDataWrapper	*fdw;
+	FdwRoutine			*fdw_routine;
+	bool		is_commit;
+	bool		ret;
+
+	Assert(fdwxact);
+
+	/*
+	 * Determine whether we commit or abort this foreign transaction.
+	 */
+	if (fdwxact->status == FDW_XACT_COMMITTING_PREPARED)
+		is_commit = true;
+	else if (fdwxact->status == FDW_XACT_ABORTING_PREPARED)
+		is_commit = false;
+
+	/*
+	 * If the local transaction is already committed, commit prepared
+	 * foreign transaction.
+	 */
+	else if (TransactionIdDidCommit(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_COMMITTING_PREPARED;
+		is_commit = true;
+	}
+
+	/*
+	 * If the local transaction is already aborted, abort prepared
+	 * foreign transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_ABORTING_PREPARED;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is not in progress but the foreign
+	 * transaction is not prepared on the foreign server. This
+	 * can happen when the transaction failed after registering this
+	 * entry but before actually preparing on the foreign server.
+	 * So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+		is_commit = false;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction
+	 * state is neither committing nor aborting. This should not
+	 * happen because we cannot determine whether to commit or abort a
+	 * foreign transaction associated with an in-progress local
+	 * transaction.
+	 */
+	else
+		ereport(ERROR,
+				(errmsg("cannot resolve the foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid)));
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Resolve the foreign transaction */
+	Assert(fdw_routine->ResolveForeignTransaction);
+
+	ret = fdw_routine->ResolveForeignTransaction(state, is_commit);
+
+	if (!ret)
+	{
+		ereport(elevel,
+				(errmsg("could not %s a prepared foreign transaction on server \"%s\"",
+						is_commit ? "commit" : "rollback", server->servername),
+				 errdetail("local transaction id is %u, connected by user id %u",
+						   fdwxact->local_xid, fdwxact->userid)));
+
+		/* Reached only if elevel < ERROR; keep the entry so it can be retried */
+		return false;
+	}
+
+	/* Resolution was a success, remove the entry */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdw_xact(fdwxact);
+	LWLockRelease(FdwXactLock);
+
+	return ret;
+}
+
+static FdwXactResolveState *
+create_fdw_xact_resovle_state(void)
+{
+	FdwXactResolveState *state;
+
+	state = palloc(sizeof(FdwXactResolveState));
+	state->wait_xid = InvalidTransactionId;
+	state->serverid = InvalidOid;
+	state->userid = InvalidOid;
+	state->umid = InvalidOid;
+	state->fdwxact_id = NULL;
+	state->fdw_state = NULL;
+
+	return state;
+}
+
+/*
+ * Return one FdwXact entry that matches the given arguments, otherwise
+ * return NULL. Since this function searches for the FdwXact entry by its
+ * unique key, all arguments must be valid.
+ */
+static FdwXact
+get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				 bool need_lock)
+{
+	List	*fdw_xact_list;
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, need_lock);
+
+	/* Could not find entry */
+	if (fdw_xact_list == NIL)
+		return NULL;
+
+	/* Must be one entry since we search it by the unique key */
+	Assert(list_length(fdw_xact_list) == 1);
+
+	return (FdwXact) linitial(fdw_xact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+fdw_xact_exists(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	*fdw_xact_list;
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, true);
+
+	return fdw_xact_list != NIL;
+}
+
+/*
+ * Returns an array of all foreign prepared transactions for the user-level
+ * function pg_prepared_fdw_xacts.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if he doesn't want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdw_xacts(int *length)
+{
+	List		*all_fdw_xacts;
+	ListCell	*lc;
+	FdwXact		fdw_xacts;
+	int			num_fdw_xacts = 0;
+
+	Assert(length != NULL);
+
+	/* Get all entries */
+	all_fdw_xacts = get_fdw_xacts(InvalidOid, InvalidTransactionId,
+								  InvalidOid, InvalidOid, true);
+
+	if (all_fdw_xacts == NIL)
+	{
+		*length = 0;
+		return NULL;
+	}
+
+	fdw_xacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdw_xacts));
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdw_xacts)
+	{
+		FdwXact fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdw_xacts + num_fdw_xacts, fx,
+			   sizeof(FdwXactData));
+		num_fdw_xacts++;
+	}
+
+	*length = num_fdw_xacts;
+	list_free(all_fdw_xacts);
+
+	return fdw_xacts;
+}
+
+/*
+ * Return a list of FdwXact entries matching the given arguments, or NIL
+ * if none match. An invalid argument value acts as a wildcard.
+ */
+static List*
+get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			  bool need_lock)
+{
+	int i;
+	List	*fdw_xact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact	fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool	matches = true;
+
+		/* xid */
+		if (xid != InvalidTransactionId && xid != fdw_xact->local_xid)
+			matches = false;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdw_xact->dbid != dbid)
+			matches = false;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdw_xact->serverid)
+			matches = false;
+
+		/* userid */
+		if (OidIsValid(userid) && fdw_xact->userid != userid)
+			matches = false;
+
+		/* Append it if matched */
+		if (matches)
+			fdw_xact_list = lappend(fdw_xact_list, fdw_xact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdw_xact_list;
+}
+
+/*
+ * fdw_xact_redo
+ * Apply the redo log for a foreign transaction.
+ */
+void
+fdw_xact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record
+		 * in FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDW_XACT_REMOVE)
+	{
+		/* Named xlrec to avoid shadowing the XLogReaderState parameter */
+		xl_fdw_xact_remove *xlrec = (xl_fdw_xact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. Returned string
+ * value is used to identify foreign transaction. The identifier should not
+ * be same as any other concurrent prepared transaction identifier.
+ *
+ * To make the foreign transaction identifier, we should ideally use something
+ * like UUID, which gives unique ids with high probability, but that may be
+ * expensive here and the UUID extension which provides the function to
+ * generate UUIDs is not part of the core code.
+ */
+static char *
+generate_fdw_xact_identifier(Oid serverid, Oid userid)
+{
+	char*	fdw_xact_id;
+
+	fdw_xact_id = (char *)palloc(FDW_XACT_ID_MAX_LEN * sizeof(char));
+
+	/* snprintf always NUL-terminates within FDW_XACT_ID_MAX_LEN bytes */
+	snprintf(fdw_xact_id, FDW_XACT_ID_MAX_LEN, "%s_%ld_%u_%u",
+			 "fx", Abs(random()), serverid, userid);
+
+	return fdw_xact_id;
+}
+
+/*
+ * CheckPointFdwXacts
+ *
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * For simplicity, the function performs all I/O while holding FdwXactLock
+ * rather than collecting the files under the lock and syncing them after
+ * releasing it; see the comment in the function body for the trade-off.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdw_xacts = 0;
+
+	/* Quick get-away, before taking lock */
+	if (max_prepared_foreign_xacts <= 0)
+		return;
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/* Another quick exit, before we do any real work */
+	if (FdwXactCtl->numFdwXacts <= 0)
+	{
+		LWLockRelease(FdwXactLock);
+		return;
+	}
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, if somebody
+	 * spots that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be an FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		FdwXact		fxact = FdwXactCtl->fdw_xacts[cnt];
+
+		if ((fxact->valid || fxact->inredo) &&
+			!fxact->ondisk &&
+			fxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fxact->dbid, fxact->local_xid,
+								fxact->serverid, fxact->userid,
+								buf, len);
+			fxact->ondisk = true;
+			fxact->insert_start_lsn = InvalidXLogRecPtr;
+			fxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdw_xacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDW_XACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdw_xacts > 0)
+		ereport(LOG,
+			  (errmsg_plural("%u foreign transaction state file was written "
+							 "for long-running prepared transactions",
+							 "%u foreign transaction state files were written "
+							 "for long-running prepared transactions",
+							 serialized_fdw_xacts,
+							 serialized_fdw_xacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, &read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+		   errdetail("Failed while allocating an XLog reading processor.")));
+
+	record = XLogReadRecord(xlogreader, lsn, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read foreign transaction state from xlog at %X/%X",
+			   (uint32) (lsn >> 32),
+			   (uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDW_XACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDW_XACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not recreate foreign transaction state file \"%s\": %m",
+			   path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not fsync foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * ProcessFdwXactBuffer
+ *
+ * Given a transaction id, userid and serverid, read the foreign transaction
+ * state either from disk or directly from WAL via the xlog record pointer
+ * "insert_start_lsn" kept in shared memory.
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId	origNextXid = ShmemVariableCache->nextXid;
+	char	*buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(insert_start_lsn != InvalidXLogRecPtr);
+
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid, true);
+		if (buf == NULL)
+		{
+			ereport(WARNING,
+					(errmsg("removing corrupt fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+			return NULL;
+		}
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (plausible size, valid CRC and matching identifiers), return
+ * the contents in a palloc'd buffer. Otherwise return NULL. The buffer can
+ * be freed later by the caller.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				bool give_warnings)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			   errmsg("could not open FDW transaction state file \"%s\": %m",
+					  path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+	{
+		CloseTransientFile(fd);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not stat FDW transaction state file \"%s\": %m",
+							path)));
+		return NULL;
+	}
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdw_xact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+	{
+		CloseTransientFile(fd);
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("unexpected size of FDW transaction state file \"%s\"",
+						path)));
+		return NULL;
+	}
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+	{
+		CloseTransientFile(fd);
+		return NULL;
+	}
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_READ);
+	if (read(fd, buf, stat.st_size) != stat.st_size)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not read FDW transaction state file \"%s\": %m",
+					  path)));
+		return NULL;
+	}
+
+	pgstat_report_wait_end();
+	CloseTransientFile(fd);
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+	{
+		pfree(buf);
+		return NULL;
+	}
+
+	/* Check that the contents match the expected identifiers */
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fxact_file_data->dbid != dbid ||
+		fxact_file_data->serverid != serverid ||
+		fxact_file_data->userid != userid ||
+		fxact_file_data->local_xid != xid)
+	{
+		/* The file descriptor was already closed above, so don't close it again */
+		ereport(WARNING,
+			(errmsg("invalid foreign transaction state file \"%s\"",
+					path)));
+		pfree(buf);
+		return NULL;
+	}
+
+	return buf;
+}
+
+/*
+ * PrescanFdwXacts
+ *
+ * Scan the foreign transactions directory for the oldest active transaction.
+ * This is run during database startup, after we have completed reading WAL.
+ * ShmemVariableCache->nextXid has been set to one more than the highest XID
+ * for which evidence exists in WAL.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	TransactionId nextXid = ShmemVariableCache->nextXid;
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+		 strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			TransactionId local_xid;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/*
+			 * Remove a foreign prepared transaction file corresponding to an
+			 * XID, which is too new.
+			 */
+			if (TransactionIdFollowsOrEquals(local_xid, nextXid))
+			{
+				ereport(WARNING,
+						(errmsg("removing future foreign prepared transaction file \"%s\"",
+								clde->d_name)));
+				RemoveFdwXactFile(dbid, local_xid, serverid, userid, true);
+				continue;
+			}
+
+			if (TransactionIdPrecedesOrEquals(local_xid, oldestActiveXid))
+				oldestActiveXid = local_xid;
+		}
+	}
+
+	FreeDir(cldir);
+	return oldestActiveXid;
+}
+
+/*
+ * restoreFdwXactData
+ *
+ * Scan pg_fdw_xact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char		*buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid, bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * FdwXactRedoAdd
+ *
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The
+	 * status of the transaction is set as preparing, since we do not
+	 * know the exact status right now. Resolver will set it later
+	 * based on the status of local transaction which prepared this
+	 * foreign transaction.
+	 */
+	fxact = insert_fdw_xact(fxact_data->dbid, fxact_data->local_xid,
+							fxact_data->serverid, fxact_data->userid,
+							fxact_data->umid, fxact_data->fdw_xact_id);
+
+	/*
+	 * Mark the entry as added during redo and not yet valid;
+	 * RecoverFdwXacts() validates it at the end of recovery. The entry
+	 * counts as on disk only when no WAL location was supplied, that is,
+	 * when it was restored from a state file.
+	 */
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->insert_start_lsn = start_lsn;
+	fxact->insert_end_lsn = end_lsn;
+	fxact->inredo = true;	/* added in redo */
+	fxact->valid = false;
+	fxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * FdwXactRedoRemove
+ *
+ * Remove the corresponding fdw_xact entry from FdwXactCtl.
+ * Also remove fdw_xact file if a foreign transaction was saved
+ * via an earlier checkpoint.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact	fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdw_xact(dbid, xid, serverid, userid,
+							   false);
+
+	if (fdwxact == NULL)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdw_xact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+		char	*buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(bool *newval, void **extra, GucSource source)
+{
+	/* Parameter check */
+	if (*newval &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_transactions\" or \"max_foreign_transaction_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdw_xacts;
+	int			num_xacts;
+	int			cur_xact;
+}	WorkingStatus;
+
+Datum
+pg_prepared_fdw_xacts(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdw_xacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdw_xacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(6, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdw_xacts = get_all_fdw_xacts(&num_fdw_xacts);
+		status->num_xacts = num_fdw_xacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdw_xact = &status->fdw_xacts[status->cur_xact++];
+		Datum		values[6];
+		bool		nulls[6];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdw_xact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdw_xact->dbid);
+		values[1] = TransactionIdGetDatum(fdw_xact->local_xid);
+		values[2] = ObjectIdGetDatum(fdw_xact->serverid);
+		values[3] = ObjectIdGetDatum(fdw_xact->userid);
+		switch (fdw_xact->status)
+		{
+			case FDW_XACT_PREPARING:
+				xact_status = "prepared";
+				break;
+			case FDW_XACT_COMMITTING_PREPARED:
+				xact_status = "committing";
+				break;
+			case FDW_XACT_ABORTING_PREPARED:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		/* XXX: should this really be interpreted by the FDW? */
+		values[5] = PointerGetDatum(cstring_to_text_with_len(fdw_xact->fdw_xact_id,
+															 strlen(fdw_xact->fdw_xact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXactResolveState *state;
+	UserMapping		*usermapping;
+	FdwXact			fdwxact;
+	bool			ret;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, true);
+
+	if (fdwxact == NULL)
+		PG_RETURN_BOOL(false);
+
+	usermapping = GetUserMapping(userid, serverid);
+
+	state = create_fdw_xact_resovle_state();
+	state->wait_xid = xid;
+	state->serverid = serverid;
+	state->userid = userid;
+	state->umid = usermapping->umid;
+
+	ret = FdwXactResolveForeignTransaction(state, fdwxact, LOG);
+
+	PG_RETURN_BOOL(ret);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. The function gives a way to forget about such a prepared
+ * transaction when the foreign server where it is prepared is no longer
+ * available, or when the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, false);
+	if (fdwxact == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("could not find foreign transaction entry"))));
+
+	remove_fdw_xact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/access/fdwxact/fdwxact_launcher.c b/src/backend/access/fdwxact/fdwxact_launcher.c
new file mode 100644
index 0000000..39f351b
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact_launcher.c
@@ -0,0 +1,641 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * There is a shared memory area where the information of resolver processes
+ * is stored. A backend process requests the start of a new resolver process
+ * via that shared memory area. Note that the launcher assumes
+ * that there is no more than one start request per database.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact_launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "pgstat.h"
+#include "funcapi.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launcher_sigusr2(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid, int slot);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+Datum pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS);
+
+/*
+ * Wake up the launcher process to retry launch. This is used when
+ * a resolver process is being stopped.
+ */
+void
+FdwXactLauncherWakeupToRetry(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		SetLatch(FdwXactRslvCtl->launcher_latch);
+}
+
+/*
+ * Wake up the launcher process to request resolution. This is
+ * used by the backend process.
+ */
+void
+FdwXactLauncherWakeupToRequest(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int	slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->FdwXactActiveQueue));
+		SHMQueueInit(&(FdwXactRslvCtl->FdwXactRetryQueue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz	last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == 0);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz	now;
+		long	wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int		rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit launch retries to once per foreign_xact_resolution_retry_interval,
+		 * but always try to launch when a backend requests it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool launched;
+
+			ResetLatch(MyLatch);
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher launch",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested
+			 * but not running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started. This usually means the resolver crashed, so we
+			 * should retry in foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver worker
+ * if not running yet. A foreign transaction resolver worker is responsible
+ * for resolution of foreign transaction that are registered on a database.
+ * So if a resolver worker already is launched, we don't need to launch new
+ * one.
+ */
+void
+fdwxact_maybe_launch_resolver(bool ignore_error)
+{
+	FdwXactResolver *resolver;
+	bool	found = false;
+	int		i;
+
+	/*
+	 * Looking for a resolver process that is running and working on the
+	 * same database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->pid != InvalidPid &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * If we found the resolver for my database, we don't need to launch new
+	 * one but wake running worker up.
+	 */
+	if (found)
+	{
+		SetLatch(resolver->latch);
+
+		elog(DEBUG1, "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		return;
+	}
+
+	/* Looking for unused resolver slot */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	/*
+	 * However if there are no more free worker slots, inform user about it before
+	 * exiting.
+	 */
+	if (!found)
+	{
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+		return;
+	}
+
+	Assert(resolver->pid == InvalidPid);
+
+	/* Found a free slot; reserve it for a new resolver process */
+	resolver->dbid = MyDatabaseId;
+	resolver->in_use = true;
+
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Wake up launcher */
+	FdwXactLauncherWakeupToRequest();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid' at 'slot' if given. If slot is negative value we find an unused slot.
+ * Note that caller must hold FdwXactResolverLock in exclusive mode.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid, int slot)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int launch_slot = slot;
+
+	/* If slot number is invalid, we find an unused slot */
+	if (launch_slot < 0)
+	{
+		int i;
+
+		for (i = 0; i < max_foreign_xact_resolvers; i++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+			if (resolver->in_use && resolver->dbid == dbid)
+				return;
+
+			if (!resolver->in_use)
+			{
+				launch_slot = i;
+				break;
+			}
+		}
+	}
+
+	/* No unused slot found */
+	if (launch_slot < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[launch_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_main_arg = Int32GetDatum(launch_slot);
+	bgw.bgw_notify_pid = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to wait
+	 * until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch all foreign transaction resolvers that are required by backend process
+ * but not running. Return true if we launch any resolver.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	int i, j;
+	int num_launches = 0;
+	int num_unused_slots = 0;
+	int num_dbs = 0;
+	bool launched = false;
+	Oid	*dbs_to_launch;
+	Oid	*dbs_having_worker = palloc0(sizeof(Oid) * max_foreign_xact_resolvers);
+
+	/*
+	 * Launch resolver workers on the databases that are requested
+	 * by backend processes, while counting unused slots.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* Remember unused worker slots */
+		if (!resolver->in_use)
+		{
+			num_unused_slots++;
+			continue;
+		}
+
+		/* Remember databases that have a resolver worker; fall through */
+		if (OidIsValid(resolver->dbid))
+			dbs_having_worker[num_dbs++] = resolver->dbid;
+
+		/* Launch the backend-requested worker */
+		if (resolver->in_use &&
+			OidIsValid(resolver->dbid) &&
+			resolver->pid == InvalidPid)
+		{
+			fdwxact_launch_resolver(resolver->dbid, i);
+			launched = true;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* quick exit if no unused slot */
+	if (num_unused_slots == 0)
+		return launched;
+
+	/*
+	 * Launch a resolver on each database that has unresolved foreign
+	 * transactions but doesn't have any resolver. Scanning all FdwXact
+	 * entries could take time, but that is harmless for the relaunch
+	 * case.
+	 */
+	dbs_to_launch = (Oid *) palloc(sizeof(Oid) * num_unused_slots);
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool found = false;
+
+		/* stop once every unused slot has been claimed */
+		if (num_launches >= num_unused_slots)
+			break;
+
+		for (j = 0; j < num_dbs; j++)
+		{
+			if (dbs_having_worker[j] == fdw_xact->dbid)
+			{
+				found = true;
+				break;
+			}
+		}
+
+		/* Register the database if no resolver is working on it */
+		if (!found)
+			dbs_to_launch[num_launches++] = fdw_xact->dbid;
+	}
+
+	/* Launch resolver process for a database at any worker slot */
+	for (i = 0; i < num_launches; i++)
+	{
+		fdwxact_launch_resolver(dbs_to_launch[i], -1);
+		launched = true;
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	return launched;
+}
+
+/*
+ * FdwXactLauncherRegister
+ *		Register a background worker running the foreign transaction
+ *      launcher.
+ */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+
+/*
+ * Returns activity of foreign transaction resolvers: the pid, the database
+ * OID, and the last resolution time of each running resolver.
+ */
+Datum
+pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver	*resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t	pid;
+		Oid		dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
diff --git a/src/backend/access/fdwxact/fdwxact_resolver.c b/src/backend/access/fdwxact/fdwxact_resolver.c
new file mode 100644
index 0000000..0b754da
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact_resolver.c
@@ -0,0 +1,331 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. The foreign
+ * transaction launcher starts one resolver process per database.
+ *
+ * A resolver process keeps resolving foreign transactions on its database.
+ * It resolves two types of foreign transactions: on-line and dangling.
+ * An on-line foreign transaction is one whose resolution a concurrent
+ * backend process is waiting for. A dangling foreign transaction is one
+ * whose corresponding distributed transaction ended up in an in-doubt
+ * state. A resolver process doesn't exit as long as there is at least one
+ * unresolved foreign transaction on the database, even if the timeout has
+ * been reached.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, as for any backend. The resolver process also terminates on
+ * timeout, but only if no foreign transactions on the database are waiting
+ * to be resolved.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact_resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* GUC parameters */
+int foreign_xact_resolution_retry_interval;
+int foreign_xact_resolver_timeout = 60 * 1000;
+
+//static MemoryContext ResolveContext = NULL;
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FdwXactRslvLoop(void);
+static long FdwXactRslvComputeSleepTime(TimestampTz now);
+static void FdwXactRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and clean up the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+	FdwXactLauncherWakeupToRetry();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	TIMESTAMP_NOBEGIN(MyFdwXactResolver->last_resolved_time);
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sanish value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FdwXactRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FdwXactRslvLoop(void)
+{
+	TimestampTz last_retry_time = 0;
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		int			rc;
+		TimestampTz	now;
+		long		sleep_time;
+		bool		resolved;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Resolve one distributed transaction */
+		StartTransactionCommand();
+		resolved = FdwXactResolveDistributedTransaction(MyDatabaseId, true);
+		CommitTransactionCommand();
+
+		now = GetCurrentTimestamp();
+
+		/* Update my state */
+		if (resolved)
+			MyFdwXactResolver->last_resolved_time = now;
+
+		if (TimestampDifferenceExceeds(last_retry_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			StartTransactionCommand();
+			resolved = FdwXactResolveDistributedTransaction(MyDatabaseId, false);
+			CommitTransactionCommand();
+
+			last_retry_time = GetCurrentTimestamp();
+
+			/* Update my state */
+			if (resolved)
+				MyFdwXactResolver->last_resolved_time = last_retry_time;
+		}
+
+		/* Check for fdwxact resolver timeout */
+		FdwXactRslvCheckTimeout(now);
+
+		/*
+		 * If we have resolved any distributed transaction, go to the next
+		 * cycle without resolving dangling transactions or sleeping, because
+		 * there might be other on-line transactions waiting to be resolved.
+		 */
+		if (!resolved)
+		{
+			/* Resolve dangling transactions as much as possible */
+			StartTransactionCommand();
+			FdwXactResolveAllDanglingTransactions(MyDatabaseId);
+			CommitTransactionCommand();
+
+			sleep_time = FdwXactRslvComputeSleepTime(now);
+
+			MemoryContextResetAndDeleteChildren(resolver_ctx);
+			MemoryContextSwitchTo(TopMemoryContext);
+
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   sleep_time,
+						   WAIT_EVENT_FDW_XACT_RESOLVER_MAIN);
+
+			if (rc & WL_POSTMASTER_DEATH)
+				proc_exit(1);
+		}
+	}
+}
+
+/*
+ * Shut down if foreign_xact_resolver_timeout has elapsed since the last
+ * resolution and no unresolved foreign transaction remains on the database.
+ */
+static void
+FdwXactRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/*
+	 * The timeout has been reached. We exit only if there are neither pending
+	 * on-line transactions nor dangling transactions left.
+	 */
+	if (!fdw_xact_exists(InvalidTransactionId, MyDatabaseId, InvalidOid,
+						 InvalidOid))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyFdwXactResolver->dbid))));
+		CommitTransactionCommand();
+
+		fdwxact_resolver_detach();
+		proc_exit(0);
+	}
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. Return the sleep
+ * time in milliseconds.
+ */
+static long
+FdwXactRslvComputeSleepTime(TimestampTz now)
+{
+	static TimestampTz	wakeuptime = 0;
+	long	sleeptime;
+	long	sec_to_timeout;
+	int		microsec_to_timeout;
+
+	if (now >= wakeuptime)
+		wakeuptime = TimestampTzPlusMilliseconds(now,
+												 foreign_xact_resolution_retry_interval);
+
+	/* Compute relative time until wakeup. */
+	TimestampDifference(now, wakeuptime,
+						&sec_to_timeout, &microsec_to_timeout);
+
+	sleeptime = sec_to_timeout * 1000 + microsec_to_timeout / 1000;
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index 5514db1..742e825 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -8,9 +8,9 @@ subdir = src/backend/access/rmgrdesc
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o genericdesc.o \
-	   gindesc.o gistdesc.o hashdesc.o heapdesc.o logicalmsgdesc.o \
-	   mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o seqdesc.o \
-	   smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
+OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o fdwxactdesc.o \
+	genericdesc.o gindesc.o gistdesc.o hashdesc.o heapdesc.o \
+	logicalmsgdesc.o mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o \
+	seqdesc.o smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000..7061bba
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,65 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL distributed transaction manager for foreign server.
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdw_xact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		FdwXactOnDiskData *fdw_insert_xlog = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_insert_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_insert_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_insert_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_insert_xlog->local_xid);
+		/* TODO: This should be really interpreted by each FDW */
+
+		/*
+		 * TODO: we also need to assess whether we want to add this
+		 * information
+		 */
+		appendStringInfo(buf, " foreign transaction info: %s",
+						 fdw_insert_xlog->fdw_xact_id);
+	}
+	else
+	{
+		xl_fdw_xact_remove *fdw_remove_xlog = (xl_fdw_xact_remove *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_remove_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_remove_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_remove_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_remove_xlog->xid);
+	}
+
+}
+
+const char *
+fdw_xact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDW_XACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDW_XACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7..4a9ab3d 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -112,14 +112,16 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_prepared_xacts=%d max_locks_per_xact=%d "
 						 "wal_level=%s wal_log_hints=%s "
-						 "track_commit_timestamp=%s",
+						 "track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_prepared_xacts,
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 16fbe47..f15c83a 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -12,9 +12,9 @@ subdir = src/backend/access/transam
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = clog.o commit_ts.o generic_xlog.o multixact.o parallel.o rmgr.o slru.o \
-	subtrans.o timeline.o transam.o twophase.o twophase_rmgr.o varsup.o \
-	xact.o xlog.o xlogarchive.o xlogfuncs.o \
+OBJS = clog.o commit_ts.o generic_xlog.o multixact.o \
+	parallel.o rmgr.o slru.o subtrans.o timeline.o transam.o twophase.o \
+	twophase_rmgr.o varsup.o xact.o xlog.o xlogarchive.o xlogfuncs.o \
 	xloginsert.o xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 9368b56..8b360b1 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -9,6 +9,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
 #include "access/generic_xlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 3942734..bc4e109 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -844,6 +845,35 @@ TwoPhaseGetGXact(TransactionId xid)
 }
 
 /*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
+/*
  * TwoPhaseGetDummyProc
  *		Get the dummy backend ID for prepared transaction specified by XID
  *
@@ -2316,6 +2346,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2375,6 +2411,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8c1621d..b2b3c89 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1131,6 +1132,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_twophase_for_ac;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1139,6 +1141,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_twophase_for_ac = ForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1177,12 +1180,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_twophase_for_ac)
 			goto cleanup;
 	}
 	else
@@ -1340,6 +1344,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transactions to be resolved, if required.
+	 * We only want to wait if we prepared foreign transactions in this
+	 * transaction.
+	 */
+	if (need_twophase_for_ac && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -1990,6 +2002,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2146,6 +2161,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2233,6 +2249,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2422,6 +2440,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2627,6 +2646,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7375a78..9de3bcc 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
 #include "access/subtrans.h"
@@ -5267,6 +5268,7 @@ BootStrapXLOG(void)
 	ControlFile->MaxConnections = MaxConnections;
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6354,6 +6356,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6878,14 +6883,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdw_xact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7077,7 +7083,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7583,6 +7592,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7901,6 +7911,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9217,6 +9230,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9650,7 +9664,8 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9682,6 +9697,7 @@ XLogReportParameters(void)
 		ControlFile->MaxConnections = MaxConnections;
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9887,6 +9903,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10085,6 +10102,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->MaxConnections = xlrec.MaxConnections;
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a03b005..47e9317 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -291,6 +291,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_prepared_fdw_xacts AS
+       SELECT * FROM pg_prepared_fdw_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
 	l.objoid, l.classoid, l.objsubid,
@@ -773,6 +776,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_fdwxact_resolvers AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_fdwxact_resolver() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index e5dd995..dac1e3a 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
@@ -1093,6 +1094,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1407,6 +1420,16 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
 	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srv->serverid,
+						useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
+	/*
 	 * Do the deletion
 	 */
 	object.classId = UserMappingRelationId;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 0bcb237..e063922 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_type.h"
@@ -749,7 +750,10 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+		FdwXactMarkForeignTransactionModified(partRelInfo, 0);
+	}
 
 	MemoryContextSwitchTo(oldContext);
 
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 2ec7fcb..a16e1e4 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,7 +226,13 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+
+		/* Mark that this transaction modified data on the foreign server */
+		FdwXactMarkForeignTransactionModified(estate->es_result_relation_info,
+										 eflags);
+	}
 	else
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 528f587..4ef20b7 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -37,6 +37,7 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "commands/trigger.h"
@@ -44,6 +45,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "storage/bufmgr.h"
@@ -2321,6 +2323,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 fdw_private,
 															 i,
 															 eflags);
+
+			/* Mark that this transaction modified data on the foreign server */
+			FdwXactMarkForeignTransactionModified(resultRelInfo, eflags);
 		}
 
 		resultRelInfo++;
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index a0bcc04..b2097ad 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -155,6 +155,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping OID.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index d2b695e..b722b9a 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -15,6 +15,8 @@
 #include <unistd.h>
 
 #include "libpq/pqsignal.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -129,6 +131,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 8de603d..3f59fdd 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3484,6 +3484,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_LAUNCHER_MAIN:
 			event_name = "LogicalLauncherMain";
 			break;
@@ -3675,6 +3681,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -3890,6 +3899,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDW_XACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 688f462..883ad85 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -896,6 +898,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -971,12 +977,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and the foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
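Aside (not part of the patch): the startup check added to PostmasterMain() above rejects a configuration that allows prepared foreign transactions but provides no resolver processes. Below is a minimal sketch of the same rule as a pure function, assuming only the two GUC values matter. The name fdwxact_settings_error is hypothetical and does not appear in the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): returns NULL when the two
 * settings are consistent, otherwise a static message describing the
 * problem.  Mirrors the condition tested in PostmasterMain().
 */
static const char *
fdwxact_settings_error(int max_prepared_foreign_xacts,
					   int max_foreign_xact_resolvers)
{
	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
		return "max_prepared_foreign_transactions > 0 requires "
			"max_foreign_transaction_resolvers > 0";
	return NULL;
}
```

With both GUCs at their default of 0 the feature is simply disabled and the check passes; only the combination "slots but no resolvers" is treated as an error.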
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afb4972..960fd6a 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -154,6 +154,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDW_XACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a58..c5610ee 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -150,6 +152,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, BackendRandomShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +274,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	BackendRandomShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 908f62d..cc578b2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -90,6 +90,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -245,6 +247,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1323,6 +1326,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	volatile TransactionId replication_slot_xmin = InvalidTransactionId;
 	volatile TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	volatile TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1384,6 +1388,7 @@ GetOldestXmin(Relation rel, int flags)
 	/* fetch into volatile var while ProcArrayLock is held */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1434,6 +1439,15 @@ GetOldestXmin(Relation rel, int flags)
 		result = replication_slot_xmin;
 
 	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDW_XACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
+	/*
 	 * After locks have been released and defer_cleanup_age has been applied,
 	 * check whether we need to back up further to make logical decoding
 	 * possible. We need to do so if we're computing the global limit (rel =
@@ -3016,6 +3030,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits to future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries for transactions that are still
+ * needed to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limits.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
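Aside (not part of the patch): the GetOldestXmin() hunk above only lets fdwxact_unresolved_xmin pull the computed horizon backwards, never forwards. Below is a minimal standalone sketch of that clamp, assuming 32-bit XIDs compared modulo 2^32 in the way PostgreSQL compares normal XIDs. The function clamp_to_unresolved_xmin and the local helpers are hypothetical names used for illustration, and the PROCARRAY_FDW_XACT_XMIN flag test from the patch is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* True for any XID that is not one of the reserved special values. */
static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware "a is older than b" for two normal XIDs. */
static bool
normal_xid_precedes(TransactionId a, TransactionId b)
{
	assert(xid_is_normal(a) && xid_is_normal(b));
	return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical model of the clamp: return the older of the computed
 * horizon and the oldest unresolved distributed transaction, ignoring
 * the latter when it is not set.
 */
static TransactionId
clamp_to_unresolved_xmin(TransactionId result, TransactionId unresolved_xmin)
{
	if (unresolved_xmin != InvalidTransactionId &&
		xid_is_normal(unresolved_xmin) && xid_is_normal(result) &&
		normal_xid_precedes(unresolved_xmin, result))
		return unresolved_xmin;
	return result;
}
```

The signed-difference comparison is what makes the clamp behave sensibly across XID wraparound: an unresolved XID just below 2^32 still counts as older than a small, recently wrapped XID.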
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6025ec..a42d06e 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,5 @@ OldSnapshotTimeMapLock				42
 BackendRandomLock					43
 LogicalRepWorkerLock				44
 CLogTruncationLock					45
+FdwXactLock					46
+FdwXactResolverLock			47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6f9aaa5..8e55dad 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -398,6 +399,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* initialize fields for fdw xact */
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -799,6 +804,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6e13d14..1302d16 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -2971,6 +2973,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2317e8b..0c371fe 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/transam.h"
@@ -659,6 +660,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -1831,6 +1836,16 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+			gettext_noop("Sets whether the two-phase commit protocol is used for distributed transactions."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		false,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -2235,6 +2250,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4e61bc6..88cdc85 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -121,6 +121,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -287,6 +289,20 @@
 
 
 #------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#foreign_twophase_commit = off
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
+#------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
 
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index ad06e8e..ca3eb62 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ab5cb7f..609578c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -209,6 +209,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdw_xact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 895a51f..7df88e0 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -306,6 +306,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_worker_processes);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 6fb403a..6d867c8 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -730,6 +730,7 @@ GuessControlValues(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -957,6 +958,7 @@ RewriteControlFile(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* Contents are protected with a CRC */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca..b616cea 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000..9563a1e
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,150 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL distributed transaction manager
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDW_XACT_H
+#define FDW_XACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+#define	FDW_XACT_NOT_WAITING		0
+#define	FDW_XACT_WAITING			1
+#define	FDW_XACT_WAITING_RETRY		2
+#define	FDW_XACT_WAIT_COMPLETE		3
+
+#define FdwXactEnabled() (max_prepared_foreign_xacts > 0)
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDW_XACT_ID_MAX_LEN 200
+
+/* Enum to track the status of prepared foreign transaction */
+typedef enum
+{
+	FDW_XACT_INITIAL,
+	FDW_XACT_PREPARING,					/* foreign transaction is being prepared */
+	FDW_XACT_PREPARED,					/* foreign transaction is prepared */
+	FDW_XACT_COMMITTING_PREPARED,		/* foreign prepared transaction is to
+										 * be committed */
+	FDW_XACT_ABORTING_PREPARED, /* foreign prepared transaction is to be
+								 * aborted */
+} FdwXactStatus;
+
+/* Shared memory entry for a prepared or being prepared foreign transaction */
+typedef struct FdwXactData *FdwXact;
+
+typedef struct FdwXactData
+{
+	FdwXact		fxact_free_next;	/* Next free FdwXact entry */
+	FdwXact		fxact_next;			/* Pointer to the next FdwXact entry associated
+									 * with the same transaction */
+	Oid				dbid;			/* database oid where to find foreign server
+									 * and user mapping */
+	TransactionId	local_xid;		/* XID of local transaction */
+	Oid				serverid;		/* foreign server where transaction takes place */
+	Oid				userid;			/* user who initiated the foreign transaction */
+	Oid				umid;
+	FdwXactStatus 	status;			/* The state of the foreign transaction. This
+									 * doubles as the action to be taken on this entry. */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;		/* XLOG offset of inserting this entry start */
+	XLogRecPtr	insert_end_lsn;		/* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been completed and written to file? */
+	BackendId	held_by;		/* backend that is holding this entry */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+	char		fdw_xact_id[FDW_XACT_MAX_ID_LEN];		/* prepared transaction identifier */
+} FdwXactData;
+
+/* Shared memory layout for maintaining foreign prepared transaction entries. */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		freeFdwXacts;
+
+	/* Number of valid foreign transaction entries */
+	int			numFdwXacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdw_xacts[FLEXIBLE_ARRAY_MEMBER];		/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* Struct for foreign transaction passed to API */
+typedef struct FdwXactResolveState
+{
+	Oid		serverid;
+	Oid		userid;
+	Oid		umid;
+
+	/*
+	 * The following fields are used only for COMMIT/ROLLBACK PREPARED
+	 * and PREPARE callbacks.
+	 */
+	char			*fdwxact_id;
+	void			*fdw_state;		/* foreign-data wrapper can keep state here */
+
+	/*
+	 * The following fields are used only for COMMIT/ROLLBACK PREPARED
+	 * callbacks.
+	 */
+	TransactionId	wait_xid;
+} FdwXactResolveState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern bool foreign_twophase_commit;
+
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern bool fdw_xact_exists(TransactionId xid, Oid dboid, Oid serverid,
+				Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwTwoPhaseNeeded(void);
+extern void PreCommit_FdwXacts(void);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon);
+extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit);
+extern bool FdwXactResolveDistributedTransaction(Oid dbid, bool is_active);
+extern void FdwXactResolveAllDanglingTransactions(Oid dbid);
+extern bool ForeignTwophaseCommitRequired(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void FdwXactMarkForeignTransactionModified(ResultRelInfo *resultRelInfo,
+												  int flags);
+extern bool check_foreign_twophase_commit(bool *newval, void **extra,
+										  GucSource source);
+
+#endif   /* FDW_XACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000..4ea65b2
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,32 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _FDWXACT_LAUNCHER_H
+#define _FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherWakeupToRequest(void);
+extern void FdwXactLauncherWakeupToRetry(void);
+
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+
+extern bool IsFdwXactLauncher(void);
+
+extern void fdwxact_maybe_launch_resolver(bool ignore_error);
+
+
+#endif	/* _FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000..6b2a24f
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int foreign_xact_resolver_timeout;
+
+#endif		/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000..e92b5a1
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,52 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDW_XACT_INSERT	0x00
+#define XLOG_FDW_XACT_REMOVE	0x10
+
+/* Same as GIDSIZE */
+#define FDW_XACT_MAX_ID_LEN 200
+/*
+ * On-disk file structure, also used for WAL
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdw_xact_id[FDW_XACT_MAX_ID_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdw_xact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+} xl_fdw_xact_remove;
+
+extern void fdw_xact_redo(XLogReaderState *record);
+extern void fdw_xact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdw_xact_identify(uint8 info);
+
+#endif	/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000..36391d4
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,67 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _RESOLVER_INTERNAL_H
+#define _RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLaunchLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t	pid;	/* this resolver's PID, or 0 if not active */
+	Oid		dbid;	/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool	in_use;
+
+	/* Stats */
+	TimestampTz	last_resolved_time;
+
+	/* Protect shared variables shown above */
+	slock_t	mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	*latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/*
+	 * Foreign transaction resolution queues. Protected by FdwXactLock.
+	 */
+	SHM_QUEUE	FdwXactActiveQueue;
+	SHM_QUEUE	FdwXactRetryQueue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch		*launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif	/* _RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 0bbe9879..c15dff7 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDW_XACT_ID, "Foreign Transactions", fdw_xact_redo, fdw_xact_desc, fdw_xact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 0e932da..b199c88 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 				TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 2c1b2d8..63c833d 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -105,6 +105,13 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
+ * XACT_FLAGS_FDWNOPREPARE - set when we wrote data on a foreign
+ * table whose foreign server isn't capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE				(1U << 3)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 30610b3..795e85a 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -227,6 +227,7 @@ typedef struct xl_parameter_change
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6..3d5333a 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -178,6 +178,7 @@ typedef struct ControlFileData
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cff58ed..d39ca1e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5032,6 +5032,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '6053', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_fdwxact_resolver', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,timestamptz}',
+  proargmodes => '{o,o,o}',
+  proargnames => '{pid,dbid,last_resolved_time}',
+  prosrc => 'pg_stat_get_fdwxact_resolver' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5737,6 +5744,22 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '6050', descr => 'view foreign transactions',
+  proname => 'pg_prepared_fdw_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{dbid,transaction,serverid,userid,status,identifier}',
+  prosrc => 'pg_prepared_fdw_xacts' },
+{ oid => '6051', descr => 'remove foreign transaction',
+  proname => 'pg_remove_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_remove_fdw_xact' },
+{ oid => '6052', descr => 'resolve foreign transaction',
+  proname => 'pg_resolve_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_resolve_fdw_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c14eb54..c5e481a 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/relation.h"
@@ -168,6 +169,12 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef bool (*PrepareForeignTransaction_function) (FdwXactResolveState *state);
+typedef bool (*CommitForeignTransaction_function) (FdwXactResolveState *state);
+typedef bool (*RollbackForeignTransaction_function) (FdwXactResolveState *state);
+typedef bool (*ResolveForeignTransaction_function) (FdwXactResolveState *state,
+													bool is_commit);
+typedef bool (*IsTwoPhaseCommitEnabled_function) (Oid serverid);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -235,6 +242,13 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for distributed transactions */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	ResolveForeignTransaction_function ResolveForeignTransaction;
+	IsTwoPhaseCommitEnabled_function IsTwoPhaseCommitEnabled;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
@@ -247,7 +261,6 @@ typedef struct FdwRoutine
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
 } FdwRoutine;
 
-
 /* Functions in foreign/foreign.c */
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern Oid	GetForeignServerIdByRelId(Oid relid);
@@ -258,4 +271,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 						 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in foreign/fdwxact.c */
+extern void FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, char *fdwxact_id);
+
 #endif							/* FDWAPI_H */
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 3ca12e6..d030368 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -68,10 +68,10 @@ typedef struct ForeignTable
 	List	   *options;		/* ftoptions as DefElem list */
 } ForeignTable;
 
-
 extern ForeignServer *GetForeignServer(Oid serverid);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperByName(const char *name,
 							bool missing_ok);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d59c24a..f74d1be 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -759,6 +759,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDW_XACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -832,7 +834,8 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDW_XACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -912,6 +915,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_READ,
+	WAIT_EVENT_FDW_XACT_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index cb613c8..45880b2 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -153,6 +153,16 @@ struct PGPROC
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
 	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction
+								 * resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+
+	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
 	 * their lock.
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 75bab29..25d6a2f 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDW_XACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -124,4 +126,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 								TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 668d9ef..81560bd 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -94,6 +94,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 735dd37..fdd6ded 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1413,6 +1413,13 @@ pg_policies| SELECT n.nspname AS schemaname,
    FROM ((pg_policy pol
      JOIN pg_class c ON ((c.oid = pol.polrelid)))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
+pg_prepared_fdw_xacts| SELECT f.dbid,
+    f.transaction,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.identifier
+   FROM pg_prepared_fdw_xacts() f(dbid, transaction, serverid, userid, status, identifier);
 pg_prepared_statements| SELECT p.name,
     p.statement,
     p.prepare_time,
@@ -1821,6 +1828,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_fdwxact_resolvers| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_fdwxact_resolver() r(pid, dbid, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
-- 
2.10.5

#11Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Masahiko Sawada (#10)

Hello.

# It took a long time to come here..

At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBf-AJup-_ARfpqR42gJQ_XjNsvv-XE0rCOCLEkT=HCg@mail.gmail.com>

On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

...

* Updated docs, added the new section "Distributed Transaction" at
Chapter 33 to explain the concept to users

* Moved atomic commit codes into src/backend/access/fdwxact directory.

* Some bug fixes.

Please review them.

I have some comments, with apologies in advance for any duplication of
or conflict with others' comments so far.

0001:

This sets XACT_FLAGS_WROTENONTEMPREL when a RELPERSISTENCE_PERMANENT
relation is modified. Isn't it also needed when UNLOGGED tables are
modified? It may be better to have a dedicated classification
macro or function.
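
To illustrate the kind of classification function meant here, a minimal
sketch follows. The names RelPersistenceIsNonTemp and NoteRelationWrite
are hypothetical and not part of the patch, and the relpersistence values
and the flag bit are redefined locally only so that the fragment compiles
on its own. Whether UNLOGGED relations should count is the open question;
this sketch simply assumes that they do.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-ins so the sketch compiles alone; in the backend these come from
 * catalog/pg_class.h and access/xact.h.
 */
#define RELPERSISTENCE_PERMANENT	'p'
#define RELPERSISTENCE_UNLOGGED		'u'
#define RELPERSISTENCE_TEMP			't'

#define XACT_FLAGS_WROTENONTEMPREL	(1U << 2)

/*
 * Hypothetical classification helper: true for every relation that is not
 * temporary, which (by assumption) includes UNLOGGED relations.
 */
bool
RelPersistenceIsNonTemp(char relpersistence)
{
	assert(relpersistence == RELPERSISTENCE_PERMANENT ||
		   relpersistence == RELPERSISTENCE_UNLOGGED ||
		   relpersistence == RELPERSISTENCE_TEMP);

	return relpersistence != RELPERSISTENCE_TEMP;
}

/* Return xact_flags with the write noted if the relation qualifies. */
unsigned int
NoteRelationWrite(unsigned int xact_flags, char relpersistence)
{
	if (RelPersistenceIsNonTemp(relpersistence))
		xact_flags |= XACT_FLAGS_WROTENONTEMPREL;
	return xact_flags;
}
```

With a single helper like this, the decision about UNLOGGED tables would
live in one place rather than being repeated at each write path.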

The flag is handled in heapam.c. I suppose that it should be done
in the upper layer considering coming pluggable storage.
(X_F_ACCESSEDTEMPREL is set in heapam, but..)

0002:

The name FdwXactParticipantsForAC doesn't sound good to me. How
about FdwXactAtomicCommitParticipants?

Well, according to the file comment of fdwxact.c,
FdwXactRegisterTransaction is called from the FDW driver and
F_X_MarkForeignTransactionModified is called from the executor. I
think that we should clarify who is responsible for the whole
sequence. Since the state of local tables affects it, I suppose
the executor is. Couldn't we do the whole thing on the executor
side? I'm not sure, but I feel that
F_X_RegisterForeignTransaction can be a part of
F_X_MarkForeignTransactionModified. The callers of
MarkForeignTransactionModified can find out whether the table is
involved in 2pc through the IsTwoPhaseCommitEnabled interface.

    if (foreign_twophase_commit == true &&
        ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0) )
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));

The error is emitted when the GUC is turned off in the
transaction that has been MarkTransactionModify'ed. I think that the
number of the variables' possible states should be reduced for
simplicity. For example, in this case, once foreign_twophase_commit
is checked in a transaction, subsequent changes to it
should be ignored for the rest of the transaction.
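
A minimal sketch of that idea, reading the setting once per transaction
and ignoring later changes, could look like the following. Everything here
is an assumption made for illustration: FdwXactGucLatch,
GetXactTwophaseSetting and ResetXactTwophaseSetting are hypothetical
names, and the global below is only a stand-in for the real GUC variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the GUC variable; the real one lives in the backend. */
bool		foreign_twophase_commit = false;

/* Hypothetical per-transaction latch for the GUC value. */
typedef struct FdwXactGucLatch
{
	bool		checked;		/* has the GUC been read in this xact? */
	bool		value;			/* value seen at the first check */
} FdwXactGucLatch;

/*
 * Return the setting as of the first check in this transaction; later
 * changes to the GUC are ignored until the latch is reset.
 */
bool
GetXactTwophaseSetting(FdwXactGucLatch *latch)
{
	assert(latch != NULL);

	if (!latch->checked)
	{
		latch->value = foreign_twophase_commit;
		latch->checked = true;
	}
	return latch->value;
}

/* To be called at transaction end (commit or abort). */
void
ResetXactTwophaseSetting(FdwXactGucLatch *latch)
{
	assert(latch != NULL);
	latch->checked = false;
}
```

This reduces the possible states to "not yet checked" and "fixed for this
transaction", at the cost of a SET inside the transaction having no effect
until the next one.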

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#12Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro HORIGUCHI (#11)

On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello.

# It took a long time to come here..

At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBf-AJup-_ARfpqR42gJQ_XjNsvv-XE0rCOCLEkT=HCg@mail.gmail.com>

On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

...

* Updated docs, added the new section "Distributed Transaction" at
Chapter 33 to explain the concept to users

* Moved atomic commit codes into src/backend/access/fdwxact directory.

* Some bug fixes.

Please review them.

I have some comments, with apologies in advance for any duplication of
or conflict with others' comments so far.

Thank you so much for reviewing this patch!

0001:

This sets XACT_FLAGS_WROTENONTEMPREL when a RELPERSISTENCE_PERMANENT
relation is modified. Isn't it also needed when UNLOGGED tables are
modified? It may be better to have a dedicated classification
macro or function.

I think that even if we do atomic commit when modifying an UNLOGGED
table and a remote table, the data will become inconsistent if the local
server crashes. For example, if the local server crashes after
preparing the transaction on the foreign server but before the local
commit, we will lose all the data of the local UNLOGGED table, whereas
the modification of the remote table is only rolled back. In the case of
persistent tables, data consistency is preserved. So I think keeping
remote data and a local unlogged table consistent is difficult, and I
want to leave it as a restriction for now. Am I missing something?

The flag is handled in heapam.c. I suppose that it should be done
in the upper layer considering coming pluggable storage.
(X_F_ACCESSEDTEMPREL is set in heapam, but..)

Yeah, or we can set the flag after heap_insert in ExecInsert.

0002:

The name FdwXactParticipantsForAC doesn't sound good to me. How
about FdwXactAtomicCommitParticipants?

+1, will fix it.

Well, according to the file comment of fdwxact.c,
FdwXactRegisterTransaction is called from the FDW driver and
F_X_MarkForeignTransactionModified is called from the executor. I
think that we should clarify who is responsible for the whole
sequence. Since the state of local tables affects it, I suppose
the executor is. Couldn't we do the whole thing on the executor
side? I'm not sure, but I feel that
F_X_RegisterForeignTransaction can be a part of
F_X_MarkForeignTransactionModified. The callers of
MarkForeignTransactionModified can find out whether the table is
involved in 2pc through the IsTwoPhaseCommitEnabled interface.

Indeed. We can have the executor register foreign servers, so that FDWs
don't need to register anything. I will remove the registration function
so that FDW developers don't need to call it and only need to provide
the atomic commit APIs.

    if (foreign_twophase_commit == true &&
        ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0) )
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));

The error is emitted when the GUC is turned off in the
transaction that has been MarkTransactionModify'ed. I think that the
number of the variables' possible states should be reduced for
simplicity. For example, in this case, once foreign_twophase_commit
is checked in a transaction, subsequent changes to it
should be ignored for the rest of the transaction.

I might not have understood your comment correctly, but since
foreign_twophase_commit is a PGC_USERSET parameter I think we need to
check it at commit time. Also, we need to keep the participant servers
even when foreign_twophase_commit is off if both
max_prepared_foreign_xacts and max_foreign_xact_resolvers are > 0.

I will post the updated patch this week.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#13Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#12)
4 attachment(s)

On Wed, Oct 24, 2018 at 9:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I will post the updated patch this week.

Attached are the updated patches.

Based on the review comment from Horiguchi-san, I've changed the
atomic commit API so that FDW developers who wish to support atomic
commit don't need to call the register function. The atomic commit
APIs are as follows:

* GetPrepareId
* PrepareForeignTransaction
* CommitForeignTransaction
* RollbackForeignTransaction
* ResolveForeignTransaction
* IsTwophaseCommitEnabled

All of these APIs except GetPrepareId are required for atomic commit.
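
As a rough illustration of "all except GetPrepareId are required", the
check could be sketched as below. This is not code from the patch:
AtomicCommitRoutine is a hypothetical stand-in for the relevant FdwRoutine
members, FdwSupportsAtomicCommit and the stub callbacks are made up for
the example, and the GetPrepareId signature in particular is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;
typedef struct FdwXactResolveState FdwXactResolveState; /* opaque here */

/* Hypothetical subset of FdwRoutine holding only the atomic commit APIs. */
typedef struct AtomicCommitRoutine
{
	/* optional; this signature is assumed for the sketch */
	const char *(*GetPrepareId) (Oid serverid, Oid userid);
	/* required */
	bool		(*PrepareForeignTransaction) (FdwXactResolveState *state);
	bool		(*CommitForeignTransaction) (FdwXactResolveState *state);
	bool		(*RollbackForeignTransaction) (FdwXactResolveState *state);
	bool		(*ResolveForeignTransaction) (FdwXactResolveState *state,
											  bool is_commit);
	bool		(*IsTwophaseCommitEnabled) (Oid serverid);
} AtomicCommitRoutine;

/* True only if every required callback is provided. */
bool
FdwSupportsAtomicCommit(const AtomicCommitRoutine *routine)
{
	assert(routine != NULL);

	return routine->PrepareForeignTransaction != NULL &&
		routine->CommitForeignTransaction != NULL &&
		routine->RollbackForeignTransaction != NULL &&
		routine->ResolveForeignTransaction != NULL &&
		routine->IsTwophaseCommitEnabled != NULL;
}

/* Trivial stub callbacks, only for exercising the check. */
bool
stub_xact_callback(FdwXactResolveState *state)
{
	(void) state;
	return true;
}

bool
stub_resolve_callback(FdwXactResolveState *state, bool is_commit)
{
	(void) state;
	(void) is_commit;
	return true;
}

bool
stub_enabled_callback(Oid serverid)
{
	(void) serverid;
	return true;
}
```

An FDW that leaves GetPrepareId unset would still pass this check, which
matches the statement above that only that callback is optional.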

Also, I've changed the foreign_twophase_commit parameter to an enum
parameter based on the suggestion from Robert[1]. Valid values are
'required', 'prefer' and 'disabled' (default). When set to either
'required' or 'prefer', atomic commit will be used. The difference
between 'required' and 'prefer' is that when set to 'required' we
require *all* modified servers to be able to use 2pc, whereas when set
to 'prefer' we use 2pc where available. So with 'required', if any of
the written participants disables 2pc or doesn't support the atomic
commit API, the transaction fails. IOW, with 'required' we can commit
only when data consistency among all participants can be maintained.
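
The intended behaviour of the three values can be summarised per modified
server with a small sketch. The enum and function names below are
hypothetical and chosen only for this illustration; the sketch also
ignores any special cases the patch may handle (for example transactions
that touch only a single server).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three settings of foreign_twophase_commit. */
typedef enum
{
	FOREIGN_TWOPHASE_COMMIT_DISABLED,
	FOREIGN_TWOPHASE_COMMIT_PREFER,
	FOREIGN_TWOPHASE_COMMIT_REQUIRED
} ForeignTwophaseCommitLevel;

/* What to do with one modified foreign server at commit time. */
typedef enum
{
	FDWXACT_COMMIT_ONE_PHASE,	/* plain COMMIT on the foreign server */
	FDWXACT_COMMIT_TWO_PHASE,	/* PREPARE, then COMMIT PREPARED */
	FDWXACT_COMMIT_FAIL			/* raise an error, transaction fails */
} FdwXactCommitAction;

FdwXactCommitAction
CommitActionForServer(ForeignTwophaseCommitLevel level, bool server_can_2pc)
{
	switch (level)
	{
		case FOREIGN_TWOPHASE_COMMIT_DISABLED:
			/* atomic commit is not used at all */
			return FDWXACT_COMMIT_ONE_PHASE;
		case FOREIGN_TWOPHASE_COMMIT_PREFER:
			/* use 2pc where available, fall back otherwise */
			return server_can_2pc ? FDWXACT_COMMIT_TWO_PHASE
				: FDWXACT_COMMIT_ONE_PHASE;
		case FOREIGN_TWOPHASE_COMMIT_REQUIRED:
			/* every modified server must be able to use 2pc */
			return server_can_2pc ? FDWXACT_COMMIT_TWO_PHASE
				: FDWXACT_COMMIT_FAIL;
	}

	assert(!"unrecognized foreign_twophase_commit level");
	return FDWXACT_COMMIT_FAIL;
}
```

Here server_can_2pc stands for "the FDW provides the atomic commit APIs
and 2pc is enabled for that server".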

Please review the patches.

[1]: /messages/by-id/CA+Tgmob4EqxbaMp0e--jUKYT44RL4xBXkPMxF9EEAD+yBGAdxw@mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

v20-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/x-patch)
From 555fec86f082a092725fbae1c85a4e00d70f8539 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 8 Feb 2018 11:26:46 +0900
Subject: [PATCH v20 1/4] Keep track of writing on non-temporary relation.

---
 src/backend/access/heap/heapam.c | 12 ++++++++++++
 src/include/access/xact.h        |  5 +++++
 2 files changed, 17 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb63471..c2db19b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2629,6 +2629,10 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		heap_freetuple(heaptup);
 	}
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleGetOid(tup);
 }
 
@@ -3453,6 +3457,10 @@ l1:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleMayBeUpdated;
 }
 
@@ -4403,6 +4411,10 @@ l2:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	bms_free(hot_attrs);
 	bms_free(proj_idx_attrs);
 	bms_free(key_attrs);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 689c57c..2c1b2d8 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -98,6 +98,11 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
-- 
2.10.5

v20-0003-postgres_fdw-supports-atomic-commit-APIs.patch (application/x-patch)
From 179365b51680f8b671720e1a56a13ebb0df0c438 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:46:01 +0900
Subject: [PATCH v20 3/4] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/connection.c              | 673 ++++++++++++++++---------
 contrib/postgres_fdw/expected/postgres_fdw.out | 387 +++++++++++++-
 contrib/postgres_fdw/option.c                  |   5 +-
 contrib/postgres_fdw/postgres_fdw.c            |  60 ++-
 contrib/postgres_fdw/postgres_fdw.h            |  11 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      | 151 +++++-
 doc/src/sgml/postgres-fdw.sgml                 |  37 ++
 7 files changed, 1069 insertions(+), 255 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index fe4893a..3264300 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -14,9 +14,12 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
-#include "catalog/pg_user_mapping.h"
 #include "access/xact.h"
+#include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -45,7 +48,7 @@
  */
 typedef Oid ConnCacheKey;
 
-typedef struct ConnCacheEntry
+struct ConnCacheEntry
 {
 	ConnCacheKey key;			/* hash key (must be first) */
 	PGconn	   *conn;			/* connection to foreign server, or NULL */
@@ -56,9 +59,19 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		am_participant_of_ac;	/* true if fdwxact code controls the transaction */
+	bool		xact_got_connection;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
-} ConnCacheEntry;
+};
+
+typedef struct PgFdwXactState
+{
+	Oid		serverid;
+	Oid		userid;
+	Oid		umid;
+	ConnCacheEntry	*conn;
+} PgFdwXactState;
 
 /*
  * Connection cache (initialized on first use)
@@ -69,17 +82,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 					   SubTransactionId mySubid,
 					   SubTransactionId parentSubid,
@@ -91,24 +100,43 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 						 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 						 PGresult **result);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	ConnCacheEntry *entry;
+	ConnCacheKey	key;
+	bool			found;
 
-/*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
- */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
+
+	/*
+	 * Find or create cached entry for requested connection.
+	 */
+	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+
+	if (!found)
+	{
+		/*
+		 * We need only clear "conn" here; remaining fields will be filled
+		 * later when "conn" is set.
+		 */
+		entry->conn = NULL;
+	}
+
+	return entry;
+}
+
+
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -128,7 +156,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -136,24 +163,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
-	/*
-	 * Find or create cached entry for requested connection.
-	 */
-	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
-	if (!found)
-	{
-		/*
-		 * We need only clear "conn" here; remaining fields will be filled
-		 * later when "conn" is set.
-		 */
-		entry->conn = NULL;
-	}
+	entry = GetConnectionCacheEntry(umid);
 
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
@@ -182,6 +192,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping		*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +201,8 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->am_participant_of_ac = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +213,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,16 +229,46 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
 /*
  * Connect to remote server using specified server and user mapping properties.
+ * If the connection attempt fails, an error is thrown; this function never
+ * returns NULL.
  */
 static PGconn *
 connect_pg_server(ForeignServer *server, UserMapping *user)
@@ -265,11 +317,22 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 
 		conn = PQconnectdbParams(keywords, values, false);
 		if (!conn || PQstatus(conn) != CONNECTION_OK)
+		{
+			char	   *connmessage;
+			int			msglen;
+
+			/* libpq typically appends a newline, strip that */
+			connmessage = pstrdup(PQerrorMessage(conn));
+			msglen = strlen(connmessage);
+			if (msglen > 0 && connmessage[msglen - 1] == '\n')
+				connmessage[msglen - 1] = '\0';
+
 			ereport(ERROR,
 					(errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
 					 errmsg("could not connect to server \"%s\"",
 							server->servername),
 					 errdetail_internal("%s", pchomp(PQerrorMessage(conn)))));
+		}
 
 		/*
 		 * Check that non-superuser has used password to establish connection;
@@ -414,15 +477,20 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
+	ForeignServer	*server = GetForeignServer(serverid);
 
 	/* Start main transaction if we haven't yet */
 	if (entry->xact_depth <= 0)
 	{
 		const char *sql;
 
+		/* Register the new foreign server if enabled */
+		if (server_uses_twophase_commit(server))
+			entry->am_participant_of_ac = true;
+
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
@@ -644,193 +712,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 }
 
 /*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow remote transactions that modified anything,
-					 * since it's not very reasonable to hold them open until
-					 * the prepared transaction is committed.  For the moment,
-					 * throw error unconditionally; later we might allow
-					 * read-only cases.  Note that the error will cause us to
-					 * come right back here with event == XACT_EVENT_ABORT, so
-					 * we'll clean up the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot prepare a transaction that modified remote tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
-/*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
 static void
@@ -846,10 +727,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -860,6 +737,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Skip entries that did not get a connection in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1193,3 +1074,327 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare the transaction on the foreign server.  This function is called
+ * only at the pre-commit phase of the local transaction.  Since we should
+ * already have a connection to the server in question, we look it up by
+ * user mapping OID (the connection cache key) and don't need serverid and
+ * userid to fetch the user mapping.
+ */
+bool
+postgresPrepareForeignTransaction(FdwXactState *state)
+{
+	PgFdwXactState *rstate;
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+	PGresult	*res;
+	StringInfo	command;
+
+	entry = GetConnectionState(state->umid, false, false);
+	//entry = hash_search(ConnectionHash, &(state->umid), HASH_FIND, NULL);
+
+	if (!entry->xact_got_connection || !entry->conn)
+		return true;
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	rstate = (PgFdwXactState *) palloc0(sizeof(PgFdwXactState));
+	rstate->serverid = state->serverid;
+	rstate->userid = state->userid;
+	rstate->umid = state->umid;
+	rstate->conn = entry;
+	state->fdw_state = (void *)rstate;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		result = true;
+
+	if (result)
+		elog(DEBUG1, "prepared foreign transaction on server %u with ID %s",
+			 state->serverid, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+
+/*
+ * Commit the transaction on the foreign server.  This function is called
+ * both at the pre-commit phase of the local transaction when committing
+ * and at the end of the local transaction when aborting.  Since we should
+ * already have connections to the servers involved in the local
+ * transaction, we don't use serverid and userid, which are needed to get
+ * the user mapping that is the key of the connection cache.
+ */
+bool
+postgresCommitForeignTransaction(FdwXactState *state)
+{
+	PgFdwXactState *rstate;
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+	PGresult	*res;
+
+	entry = GetConnectionState(state->umid, false, false);
+
+	if (!entry->xact_got_connection || !entry->conn)
+		return true;
+
+	rstate = (PgFdwXactState *) palloc0(sizeof(PgFdwXactState));
+	rstate->serverid = state->serverid;
+	rstate->userid = state->userid;
+	rstate->umid = state->umid;
+	rstate->conn = entry;
+	state->fdw_state = (void *)rstate;
+
+	/*
+	 * If abort cleanup previously failed for this connection,
+	 * we can't issue any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		result = true;
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+
+/*
+ * Roll back the transaction on the foreign server.  This function is called
+ * both at the pre-commit phase of the local transaction when committing
+ * and at the end of the local transaction when aborting.  Since we should
+ * already have connections to the servers involved in the local
+ * transaction, we don't use serverid and userid, which are needed to get
+ * the user mapping that is the key of the connection cache.
+ */
+bool
+postgresRollbackForeignTransaction(FdwXactState *state)
+{
+	PgFdwXactState *rstate = (PgFdwXactState *) state->fdw_state;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (rstate)
+		entry = rstate->conn;
+	else
+		entry = GetConnectionCacheEntry(state->umid);
+
+	/*
+	 * When rolling back, having no connection means that no remote
+	 * transaction was started, so we can regard it as success.
+	 */
+	if (!entry->xact_got_connection || !entry->conn)
+		return true;
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is already unsalvageable, do only the cleanup
+	 * and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return true;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+	else
+	{
+		entry->have_prep_stmt = false;
+		entry->have_error = false;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return !abort_cleanup_failure;
+}
+
+bool
+postgresResolveForeignTransaction(FdwXactState *state, bool is_commit)
+{
+	ConnCacheEntry *entry = NULL;
+	StringInfo	command;
+	bool result = false;
+	PGresult	*res;
+
+	entry = GetConnectionState(state->umid, false, false);
+
+	if (!entry->conn)
+		return false;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 state->fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		/*
+		 * The command failed; log the reason for the failure.
+		 * We may not be in a transaction here, so raising error doesn't
+		 * help. Even if we are in a transaction, it would be the resolver
+		 * transaction, which will get aborted on raising error, thus
+		 * delaying resolution of other prepared foreign transactions.
+		 */
+		pgfdw_report_error(LOG, res, entry->conn, false, command->data);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * If we tried to COMMIT/ABORT a prepared transaction and the prepared
+		 * transaction was missing on the foreign server, it was probably
+		 * resolved by some other means. Anyway, it should be considered as resolved.
+		 */
+		result = (sqlstate == ERRCODE_UNDEFINED_OBJECT);
+	}
+	else
+		result = true;
+
+	elog(DEBUG1, "%s prepared foreign transaction on server %u with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 state->serverid,
+		 state->fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->am_participant_of_ac = false;
+
+	/*
+	 * Regardless of the event type, we can now mark ourselves as out of the
+	 * transaction.
+	 */
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 21a2ef5..15dadf4 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,15 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -185,16 +207,19 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                                      List of foreign tables
- Schema |   Table    |  Server   |                   FDW options                    | Description 
---------+------------+-----------+--------------------------------------------------+-------------
- public | ft1        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft2        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft4        | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
- public | ft5        | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft6        | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft_pg_type | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
-(6 rows)
+                                         List of foreign tables
+ Schema |      Table       |  Server   |                   FDW options                    | Description 
+--------+------------------+-----------+--------------------------------------------------+-------------
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
+(9 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8650,3 +8675,345 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+\det+
+                                                 List of foreign tables
+ Schema |      Table       |  Server   |                            FDW options                            | Description 
+--------+------------------+-----------+-------------------------------------------------------------------+-------------
+ public | fpagg_tab_p1     | loopback  | (table_name 'pagg_tab_p1')                                        | 
+ public | fpagg_tab_p2     | loopback  | (table_name 'pagg_tab_p2')                                        | 
+ public | fpagg_tab_p3     | loopback  | (table_name 'pagg_tab_p3')                                        | 
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')                             | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1', use_remote_estimate 'true') | 
+ public | ft3              | loopback  | (table_name 'loct3', use_remote_estimate 'true')                  | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')                             | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type')                  | 
+ public | ftprt1_p1        | loopback  | (table_name 'fprt1_p1', use_remote_estimate 'true')               | 
+ public | ftprt1_p2        | loopback  | (table_name 'fprt1_p2')                                           | 
+ public | ftprt2_p1        | loopback  | (table_name 'fprt2_p1', use_remote_estimate 'true')               | 
+ public | ftprt2_p2        | loopback  | (table_name 'fprt2_p2', use_remote_estimate 'true')               | 
+ public | rem1             | loopback  | (table_name 'loc1')                                               | 
+ public | rem2             | loopback  | (table_name 'loc2')                                               | 
+(19 rows)
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+  srvname  
+-----------
+ loopback
+ loopback2
+ loopback3
+(3 rows)
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO on;
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO ft8_twophase VALUES(3);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO "S 1"."T 6" VALUES (4);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  4
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(5);
+INSERT INTO "S 1"."T 6" VALUES (5);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  4
+(1 row)
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(8);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Fails, cannot commit the distributed transaction if 2PC-non-capable
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+ERROR:  cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Disables atomic commit, and success the same case as above.
+SET foreign_twophase_commit TO off;
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+(5 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+(5 rows)
+
+-- Enable atomic commit, again.
+SET foreign_twophase_commit TO on;
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(10);
+INSERT INTO ft8_twophase VALUES(10);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+-- Fails, cannot prepare the transaction if non-supporeted
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(11);
+INSERT INTO ft9_not_twophase VALUES(11);
+PREPARE TRANSACTION 'gx1'; -- error
+ERROR:  cannot prepare a transaction that modified remote tables
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 6854f1b..1f45b1c 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -108,7 +108,8 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 		 * Validate option value, when we can do so without any context.
 		 */
 		if (strcmp(def->defname, "use_remote_estimate") == 0 ||
-			strcmp(def->defname, "updatable") == 0)
+			strcmp(def->defname, "updatable") == 0 ||
+			strcmp(def->defname, "two_phase_commit") == 0)
 		{
 			/* these accept only boolean values */
 			(void) defGetBoolean(def);
@@ -177,6 +178,8 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* two phase commit support */
+		{"two_phase_commit", ForeignServerRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fd20aa9..1135046 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,8 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "catalog/pg_class.h"
@@ -359,6 +361,7 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 							 RelOptInfo *input_rel,
 							 RelOptInfo *output_rel,
 							 void *extra);
+static bool postgresIsTwoPhaseCommitEnabled(Oid serverid);
 
 /*
  * Helper functions
@@ -452,7 +455,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 				  const PgFdwRelationInfo *fpinfo_o,
 				  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -506,10 +508,29 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->ResolveForeignTransaction = postgresResolveForeignTransaction;
+	routine->IsTwoPhaseCommitEnabled = postgresIsTwoPhaseCommitEnabled;
+
 	PG_RETURN_POINTER(routine);
 }
 
 /*
+ * postgresIsTwoPhaseCommitEnabled
+ */
+static bool
+postgresIsTwoPhaseCommitEnabled(Oid serverid)
+{
+	ForeignServer	*server = GetForeignServer(serverid);
+
+
+	return server_uses_twophase_commit(server);
+}
+
+/*
  * postgresGetForeignRelSize
  *		Estimate # of rows and width of the result of the scan
  *
@@ -1356,7 +1377,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2411,7 +2432,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2704,7 +2725,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								&retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3321,7 +3342,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4108,7 +4129,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4198,7 +4219,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4421,7 +4442,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
@@ -5803,3 +5824,26 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	/* We didn't find any suitable equivalence class expression */
 	return NULL;
 }
+
+/*
+ * server_uses_twophase_commit
+ * Returns true if the foreign server is configured to support 2PC.
+ */
+bool
+server_uses_twophase_commit(ForeignServer *server)
+{
+	ListCell		*lc;
+
+	/* Check the options for two phase compliance */
+	foreach(lc, server->options)
+	{
+		DefElem    *d = (DefElem *) lfirst(lc);
+
+		if (strcmp(d->defname, "two_phase_commit") == 0)
+		{
+			return defGetBoolean(d);
+		}
+	}
+	/* By default a server is not 2PC compliant */
+	return false;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 70b538e..3526923 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "nodes/relation.h"
@@ -110,12 +111,14 @@ typedef struct PgFdwRelationInfo
 	int			relation_index;
 } PgFdwRelationInfo;
 
+typedef struct ConnCacheEntry ConnCacheEntry;
+
 /* in postgres_fdw.c */
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -123,6 +126,11 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 				   bool clear, const char *sql);
+extern bool postgresPrepareForeignTransaction(FdwXactState *state);
+extern bool postgresCommitForeignTransaction(FdwXactState *state);
+extern bool postgresRollbackForeignTransaction(FdwXactState *state);
+extern bool postgresResolveForeignTransaction(FdwXactState *state,
+											  bool is_commit);
 
 /* in option.c */
 extern int ExtractConnectionOptions(List *defelems,
@@ -181,6 +189,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 						List *remote_conds, List *pathkeys, bool is_subquery,
 						List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 88c4cb4..2554c9c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,19 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -2304,7 +2331,6 @@ SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a
 
 RESET enable_partitionwise_join;
 
-
 -- ===================================================================
 -- test partitionwise aggregates
 -- ===================================================================
@@ -2354,3 +2380,126 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+
+\det+
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO on;
+
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO ft8_twophase VALUES(3);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO "S 1"."T 6" VALUES (4);
+COMMIT;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(5);
+INSERT INTO "S 1"."T 6" VALUES (5);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(8);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Fails, cannot commit the distributed transaction if 2PC-non-capable
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Disables atomic commit, and success the same case as above.
+SET foreign_twophase_commit TO off;
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Enable atomic commit, again.
+SET foreign_twophase_commit TO on;
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(10);
+INSERT INTO ft8_twophase VALUES(10);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Fails, cannot prepare the transaction if non-supporeted
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(11);
+INSERT INTO ft9_not_twophase VALUES(11);
+PREPARE TRANSACTION 'gx1'; -- error
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 54b5e98..f4a9ff5 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -436,6 +436,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted independently.
+    Some of these transactions may fail to commit on their remote server while
+    others commit successfully. This behavior may be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> may
+       use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
-- 
2.10.5
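
For anyone who wants to try the postgres_fdw side of this patch by hand, the setup exercised by the regression test above can be condensed roughly as follows. This is only a sketch and assumes the v20 patch set is applied: the two_phase_commit server option and the foreign_twophase_commit setting exist only in this proposal, and the server and foreign table names are the ones created by the test script.

```sql
-- Mark the foreign servers as 2PC-capable (the option is server-level only).
ALTER SERVER loopback  OPTIONS (ADD two_phase_commit 'on');
ALTER SERVER loopback2 OPTIONS (ADD two_phase_commit 'on');

-- Request atomic commit for this session.
SET foreign_twophase_commit TO on;

-- Write to two 2PC-capable servers; per the test comments, this COMMIT
-- has to go through two-phase commit.
BEGIN;
INSERT INTO ft7_twophase VALUES (2);
INSERT INTO ft8_twophase VALUES (2);
COMMIT;
```

As the documentation hunk notes, max_prepared_foreign_transactions on the local server and max_prepared_transactions on the remote servers also have to be raised from their defaults for this to work.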

Attachment: v20-0004-Add-regression-tests-for-atomic-commit.patch (application/x-patch)
From 0d59baa5bc23fc51aeebea53a4d644c031f0db0b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:48:08 +0900
Subject: [PATCH v20 4/4] Add regression tests for atomic commit.

---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index daf79a0..71c8b9d 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000..a23f120
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port', two_phase_commit 'on');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port', two_phase_commit 'on');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_prepared_fdw_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 6890678..d1b181a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2286,9 +2286,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2303,7 +2306,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.10.5
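
The recovery scenarios in 016_fdwxact.pl can also be walked through manually in psql. A rough sketch, assuming the v20 patches are applied and that the foreign tables t1 and t2 have been imported from two 2PC-enabled foreign servers as in the test; the restart in the middle happens outside SQL, and how it is done (clean stop or immediate shutdown) is up to the person testing.

```sql
-- Before the restart: prepare a transaction that wrote to both foreign servers.
BEGIN;
INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (1);
PREPARE TRANSACTION 'gxid1';

-- ... stop and start (or crash and restart) the local server here ...

-- After recovery: the prepared transaction should still be resolvable.
COMMIT PREPARED 'gxid1';

-- Inspect the foreign transactions the local server still tracks.
-- pg_prepared_fdw_xacts is the view used by the test; the test itself
-- checks this count only after replaying an ordinary COMMIT.
SELECT count(*) FROM pg_prepared_fdw_xacts;
```

Replacing COMMIT PREPARED with ROLLBACK PREPARED gives the rollback variant that the test runs with 'gxid2'.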

Attachment: v20-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/x-patch)
From 4c078f27add47d8b915524323484bc3c2e51a3e5 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:44:42 +0900
Subject: [PATCH v20 2/4] Support atomic commit among multiple foreign servers.

---
 doc/src/sgml/catalogs.sgml                    |   97 +
 doc/src/sgml/config.sgml                      |  143 +-
 doc/src/sgml/distributed-transaction.sgml     |  157 ++
 doc/src/sgml/fdwhandler.sgml                  |  203 ++
 doc/src/sgml/filelist.sgml                    |    1 +
 doc/src/sgml/func.sgml                        |   51 +
 doc/src/sgml/monitoring.sgml                  |   56 +
 doc/src/sgml/postgres.sgml                    |    1 +
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/fdwxact.c          | 2678 +++++++++++++++++++++++++
 src/backend/access/fdwxact/fdwxact_launcher.c |  641 ++++++
 src/backend/access/fdwxact/fdwxact_resolver.c |  331 +++
 src/backend/access/heap/heapam.c              |   12 -
 src/backend/access/rmgrdesc/Makefile          |    8 +-
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   65 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/Makefile           |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   32 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/foreigncmds.c            |   23 +
 src/backend/executor/execPartition.c          |    4 +
 src/backend/executor/nodeForeignscan.c        |    8 +
 src/backend/executor/nodeModifyTable.c        |   24 +
 src/backend/foreign/foreign.c                 |   43 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   18 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    2 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   80 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  147 ++
 src/include/access/fdwxact_launcher.h         |   32 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   52 +
 src/include/access/resolver_internal.h        |   67 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   23 +
 src/include/foreign/fdwapi.h                  |   18 +-
 src/include/foreign/foreign.h                 |    2 +-
 src/include/pgstat.h                          |    8 +-
 src/include/storage/proc.h                    |   10 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    2 +
 src/test/regress/expected/rules.out           |   12 +
 62 files changed, 5283 insertions(+), 40 deletions(-)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100755 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/fdwxact_launcher.c
 create mode 100644 src/backend/access/fdwxact/fdwxact_resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 6d6fbec..9d99cdc 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9622,6 +9622,103 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-prepared-fdw-xacts">
+  <title><structname>pg_prepared_fdw_xacts</structname></title>
+
+  <indexterm zone="view-pg-prepared-fdw-xacts">
+   <primary>pg_prepared_fdw_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_prepared_fdw_xacts</structname> displays
+   information about foreign transactions that are currently prepared on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="fdw-transaction-managements"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_prepared_fdw_xacts</structname> contains one row per prepared
+   foreign transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_prepared_fdw_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>transaction</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       ID of the local transaction that this foreign transaction is associated with
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which this foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction: <literal>prepared</literal>, <literal>committing</literal>, <literal>aborting</literal> or <literal>unknown</literal>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_prepared_fdw_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7554cba..557b3f2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3611,7 +3611,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
 
      </variablelist>
     </sect2>
-
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -7827,6 +7826,148 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether transaction commit will wait for all involved foreign transactions
+         to be resolved before the command returns a "success" indication to the client.
+         Valid values are <literal>required</literal>, <literal>prefer</literal> and
+         <literal>disabled</literal>. The default setting is <literal>disabled</literal>.
+         When <literal>disabled</literal>, the servers involved in a distributed
+         transaction can become inconsistent if a foreign server crashes while the
+         distributed transaction is being committed. When set to <literal>required</literal>, the
+         distributed transaction requires that all written servers can use the two-phase commit protocol.
+         That is, the transaction fails if any of the servers returns <literal>false</literal>
+         from <function>IsTwoPhaseCommitEnabled</function> or does not support the transaction
+         management callback routines (described in
+         <xref linkend="fdw-callbacks-transaction-managements"/>).
+         When set to <literal>prefer</literal>, the distributed transaction uses the
+         two-phase commit protocol where it is available, without failing where it is not
+         available.
+        </para>
+
+        <para>
+         Both <varname>max_prepared_foreign_transactions</varname> and
+         <varname>max_foreign_transaction_resolvers</varname> must be non-zero in order to
+         set this parameter to either <literal>required</literal> or <literal>prefer</literal>.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one transaction
+         is determined by the setting in effect when it commits.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If <literal>N</literal> local transactions each
+         span <literal>K</literal> foreign servers, this value needs to be set to
+         <literal>N * K</literal>, not just <literal>N</literal>.
+         This parameter can only be set at server start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long a foreign transaction resolver waits after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism.  You should set this value to
+         zero only if <varname>max_foreign_transaction_resolvers</varname> is at least
+         the number of databases you have. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000..5143499
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,157 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction Management</title>
+
+ <para>
+  This chapter explains what distributed transaction management is, and how it can be configured
+  in PostgreSQL.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Atomic commit is an operation that applies a set of changes as a single operation
+   globally. <productname>PostgreSQL</productname> provides a way to perform a transaction
+   involving foreign resources through a <literal>Foreign Data Wrapper</literal>.
+   <productname>PostgreSQL</productname>'s atomic commit ensures that all changes
+   on foreign servers end in either commit or rollback, using the transaction callback
+   routines (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To commit atomically across all foreign servers,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol, which is a
+    type of atomic commitment protocol (ACP). Using the two-phase commit protocol, the commit
+    sequence of a distributed transaction proceeds through the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    In the first step, the <productname>PostgreSQL</productname> distributed transaction manager
+    prepares all transactions on the foreign servers if two-phase commit is required.
+    Two-phase commit is required only if the transaction modifies data on two or more
+    servers, including the local server itself, and the user requests it with
+    <xref linkend="guc-foreign-twophase-commit"/>. If all preparations on the foreign servers
+    succeed, the commit proceeds to the next step. If any failure happens in this step,
+    <productname>PostgreSQL</productname> switches to rollback and rolls back all transactions
+    on both the local and foreign servers.
+   </para>
+
+   <para>
+    In the local commit step, <productname>PostgreSQL</productname> commits the transaction
+    locally. If any failure happens in this step, <productname>PostgreSQL</productname> switches
+    to rollback and rolls back all transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    In the final step, the prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Foreign Transaction Resolution</title>
+
+   <para>
+    Foreign transactions are resolved by foreign transaction resolver processes.
+    A resolver commits all prepared transactions on the foreign servers if the coordinator
+    received an agreement message from every foreign server during the first step. On the
+    other hand, if any foreign server failed to prepare the transaction, it rolls back all
+    prepared transactions.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution on one
+    database on the coordinator side. If a resolution fails, the resolver retries
+    it after <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>In-doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit or roll back
+    using the two-phase commit protocol. However, if the second phase fails for whatever reason,
+    the transaction becomes in-doubt. A transaction becomes in-doubt in the following
+    situations:
+
+   <itemizedlist>
+    <listitem>
+     <para>
+      The local <productname>PostgreSQL</productname> server crashes during an atomic commit
+      operation.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      The local <productname>PostgreSQL</productname> server receives a cancel request from the
+      user during atomic commit.
+     </para>
+    </listitem>
+   </itemizedlist>
+
+   In-doubt transactions are automatically handled by a foreign transaction resolver process
+   when there is no online transaction requesting resolution.
+   <function>pg_resolve_fdw_xact</function> provides a way to manually resolve transactions
+   on the foreign servers that participated in the distributed transaction.
+   </para>
+
+   <para>
+    The atomic commit operation is crash-safe. Foreign transactions that were being processed
+    at the time of a crash are handled by a foreign transaction resolver as in-doubt transactions.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Monitoring</title>
+   <para>
+    Monitoring information about foreign transaction resolvers is visible in the
+    <link linkend="pg-stat-fdwxact-resolver-view"><literal>pg_stat_fdwxact_resolver</literal></link>
+    view. This view contains one row for every foreign transaction resolver worker.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be raised to
+    accommodate the foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
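
For reviewers, the three-step commit sequence described in the chapter above
(prepare on all foreign servers, commit locally, resolve the prepared
transactions) can be sketched as a small standalone C model. This is only an
illustration under stated assumptions: Participant, PartState, prepare_all,
resolve_all and distributed_commit are hypothetical names, not code from the
patch, and the local commit is modelled as succeeding whenever every prepare
succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical participant model; none of these names come from the patch. */
typedef enum { PART_ACTIVE, PART_PREPARED, PART_COMMITTED, PART_ABORTED } PartState;

typedef struct
{
    bool      prepare_ok;   /* simulated outcome of PREPARE on this server */
    PartState state;
} Participant;

/* Step 1: prepare every participant, stopping at the first failure. */
static bool
prepare_all(Participant *parts, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (!parts[i].prepare_ok)
            return false;
        parts[i].state = PART_PREPARED;
    }
    return true;
}

/* Step 3: commit or roll back everything that has not finished yet. */
static void
resolve_all(Participant *parts, size_t n, bool commit)
{
    for (size_t i = 0; i < n; i++)
    {
        if (parts[i].state == PART_PREPARED)
            parts[i].state = commit ? PART_COMMITTED : PART_ABORTED;
        else if (parts[i].state == PART_ACTIVE)
            parts[i].state = PART_ABORTED;  /* never prepared: plain rollback */
    }
}

/* Returns true if the distributed transaction committed. */
static bool
distributed_commit(Participant *parts, size_t n)
{
    assert(parts != NULL || n == 0);

    /* Step 2 (local commit) is modelled as succeeding whenever step 1 did. */
    bool committed = prepare_all(parts, n);

    resolve_all(parts, n, committed);
    return committed;
}
```

In this model a single failed prepare aborts every participant, including
those already prepared, which mirrors the all-or-nothing outcome the chapter
describes. The real patch delegates the last step to resolver processes, which
the sketch does not model.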
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 4ce88dd..3da13c9 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1390,6 +1390,118 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     If an FDW wishes to support <firstterm>atomic commit</firstterm>
+     (as described in <xref linkend="fdw-transaction-managements"/>), it must call the
+     registration function <function>FdwXactRegisterForeignTransaction</function>
+     and provide the following callback functions:
+    </para>
+
+    <para>
+<programlisting>
+bool
+PrepareForeignTransaction(FdwXactResolveState *state);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called in the
+    pre-commit phase of the local transaction if atomic commit is required.
+    Returning <literal>true</literal> means that the foreign transaction was
+    prepared successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactResolveState *state);
+</programlisting>
+    Commit the not-prepared transaction on the foreign server.
+    This function is called in the pre-commit phase of the local
+    transaction if atomic commit is not required. Atomic
+    commit is not required either when data was modified on
+    only one server, including the local server, or when the user does not
+    request atomic commit with <xref linkend="guc-foreign-twophase-commit"/>.
+    Returning <literal>true</literal> means that the foreign
+    transaction was committed successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactResolveState *state);
+</programlisting>
+    Roll back a not-prepared transaction on the foreign server.
+    This function is called at the end of the local transaction, after
+    it has been rolled back locally, either when the user requested a rollback
+    or when an error occurred during the transaction. This function could
+    be called recursively if an error occurs while rolling back the
+    foreign transaction for whatever reason. You need to track
+    recursion and prevent this function from being called infinitely.
+    Returning <literal>true</literal> means that the foreign
+    transaction was rolled back successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+ResolvePreparedForeignTransaction(FdwXactResolveState *state,
+                                  bool is_commit);
+</programlisting>
+    Commit or roll back the prepared transaction on the foreign server.
+    When <varname>is_commit</varname> is true, it indicates that the foreign
+    transaction should be committed. Otherwise the foreign transaction should
+    be aborted.
+    This function is normally called by the foreign transaction resolver
+    process but can also be called by the <function>pg_resolve_fdw_xact</function>
+    function. In the resolver process, this function is called either
+    when a backend requests the resolver process to resolve a distributed
+    transaction after it has been prepared, or when a database has dangling
+    transactions. Returning <literal>true</literal> means that the foreign
+    transaction was resolved successfully.
+    In the abort case, please note that the prepared transaction identified
+    by <varname>state-&gt;fdwxact_id</varname> might not exist on the foreign
+    server. If resolution fails due to an undefined object error
+    (<literal>ERRCODE_UNDEFINED_OBJECT</literal>), you should
+    regard it as success and return <literal>true</literal>.
+    </para>
+    <para>
+<programlisting>
+bool
+IsTwoPhaseCommitEnabled(Oid serverid);
+</programlisting>
+    Return <literal>true</literal> if the foreign server identified by
+    <literal>serverid</literal> is capable of the two-phase commit protocol.
+    This function is called once at commit time.
+    Returning <literal>false</literal> indicates that the current transaction
+    cannot use atomic commit even if atomic commit is requested by the user.
+    </para>
+
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    along with its length in <varname>*prep_id_len</varname>.
+    This optional function is called during executor startup, once per
+    foreign server. Note that the transaction identifier must be a string literal
+    less than <symbol>NAMEDATALEN</symbol> bytes long and must not be the same
+    as any other concurrent prepared transaction identifier. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <note>
+     <para>
+      The functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Please note that
+      you therefore cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1835,4 +1947,95 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+    <title>Transaction Management for Foreign Data Wrappers</title>
+
+    <para>
+     The <productname>PostgreSQL</productname> foreign transaction manager
+     allows FDWs to read and write data on foreign servers within a transaction while
+     maintaining atomicity of the foreign data (also known as atomic commit). Atomic
+     commit guarantees that a distributed transaction is either committed
+     or rolled back on all participating foreign
+     servers.  To achieve atomic commit, <productname>PostgreSQL</productname>
+     employs the two-phase commit protocol, which is a type of atomic commitment
+     protocol. Every FDW that wishes to support atomic commit
+     is required to support the transaction management callback routines:
+     <function>PrepareForeignTransaction</function>,
+     <function>CommitForeignTransaction</function>,
+     <function>RollbackForeignTransaction</function>,
+     <function>ResolveForeignTransaction</function>,
+     <function>IsTwoPhaseCommitEnabled</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+     Transactions on a foreign server whose FDW supports these callback routines are
+     managed by <productname>PostgreSQL</productname>'s distributed transaction
+     manager. Each transaction management callback is called at the appropriate time.
+    </para>
+
+    <para>
+     The information in <literal>FdwXactState</literal> can be used to identify
+     foreign servers. <literal>state-&gt;fdw_state</literal> is a <type>void</type>
+     pointer that is available for FDW transaction functions to store information
+     relevant to the particular foreign server.  It is useful for passing
+     information forward from <function>PrepareForeignTransaction</function> and/or
+     <function>CommitForeignTransaction</function> to
+     <function>RollbackForeignTransaction</function>, thereby avoiding recalculation.
+     Note that since <function>ResolveForeignTransaction</function> is called
+     independently of these callback routines, the information is not passed to
+     <function>ResolveForeignTransaction</function>.
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may use different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling <function>PrepareForeignTransaction</function>
+     if the two-phase commit protocol is required. Two-phase commit is required only if
+     the transaction modified data on more than one server, including the local
+     server itself, and the user requests atomic commit. <productname>PostgreSQL</productname>
+     can commit locally and go to the next step if and only if all foreign
+     transactions were prepared successfully. If two-phase commit is not required, the foreign
+     transaction manager commits each transaction by calling
+     <function>CommitForeignTransaction</function> and then commits locally.
+     If any failure happens, or the user requests a cancel, during the pre-commit phase,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function> for the not-prepared foreign
+     servers, and then rolls back locally. The prepared foreign transactions are rolled back
+     by a foreign transaction resolver process.
+    </para>
+
+    <para>
+     Once committed locally, the distributed transaction must be committed. The
+     prepared foreign transactions will be committed by a foreign transaction resolver
+     process.
+    </para>
+
+    <para>
+     When two-phase commit is required, after committing locally, the transaction
+     commit waits for all prepared foreign transactions to be committed before
+     completing. One foreign transaction resolver process is responsible for
+     foreign transaction resolution on a database.
+     <function>ResolveForeignTransaction</function> is called by the foreign
+     transaction resolver process when it resolves a transaction.
+     <function>ResolveForeignTransaction</function> is also called
+     when the user executes the <function>pg_resolve_fdw_xact</function> function.
+    </para>
+  </sect1>
  </chapter>
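
As a note on the abort-case rule documented above for the resolve callback
(a prepared transaction that no longer exists on the foreign server should
count as resolved), here is a minimal standalone C sketch of that decision.
RemoteResult and resolution_succeeded are hypothetical names used only for
illustration, not part of the patch or of any FDW API. The SQLSTATE value
42704 is PostgreSQL's code for undefined_object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* SQLSTATE for undefined_object (ERRCODE_UNDEFINED_OBJECT) in PostgreSQL. */
#define SQLSTATE_UNDEFINED_OBJECT "42704"

/* Hypothetical summary of what the foreign server answered. */
typedef struct
{
    bool        ok;         /* the COMMIT/ROLLBACK PREPARED command succeeded */
    const char *sqlstate;   /* error code on failure, NULL if none */
} RemoteResult;

/*
 * Decide what a resolve callback should return.  On abort, a missing
 * prepared transaction is treated as already resolved; on commit it is
 * still reported as a failure.
 */
static bool
resolution_succeeded(const RemoteResult *res, bool is_commit)
{
    assert(res != NULL);

    if (res->ok)
        return true;

    if (!is_commit &&
        res->sqlstate != NULL &&
        strcmp(res->sqlstate, SQLSTATE_UNDEFINED_OBJECT) == 0)
        return true;

    return false;
}
```

Restricting the exception to the abort path follows the wording of the
documentation, which describes the missing-transaction case only for aborts.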
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 48ac14a..38d6fcb 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5193df3..8bb251e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20806,6 +20806,57 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-fdw-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_fdw_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_fdw_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign transactions
+        matching the arguments and resolves them. This function won't resolve
+        a foreign transaction which is in progress, or one that is locked by some
+        other backend.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_fdw_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0484cfa..6b2aa6f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -332,6 +332,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_fdw_xact_resolver</structname><indexterm><primary>pg_stat_fdw_xact_resolver</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-fdwxact-resolver-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1194,6 +1202,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
          <entry><literal>LogicalLauncherMain</literal></entry>
          <entry>Waiting in main loop of logical launcher process.</entry>
         </row>
@@ -1405,6 +1421,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
+        <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
          <entry>Waiting during base backup when throttling activity.</entry>
@@ -2214,6 +2234,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-fdwxact-resolver-view" xreflabel="pg_stat_fdw_xact_resolver">
+   <title><structname>pg_stat_fdw_xact_resolver</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_fdw_xact_resolver</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of
+   foreign transaction resolution.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 0070603..c10e21f 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -164,6 +164,7 @@
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index bd93a6a..4a1ebdc 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  tablesample transam
+			  tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000..9ddbb14
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o fdwxact_resolver.o fdwxact_launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
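
For readers following the patch, the header comment of fdwxact.c in the next diff describes the commit ordering: prepare every remote transaction, commit locally, then commit the prepared remote transactions. The following is only a minimal, self-contained sketch of that ordering, not the patch's implementation. Every name in it (PartState, Participant, prepare_participant, distributed_commit) is hypothetical, and the remote servers are simulated by a flag rather than contacted through an FDW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical participant state; not the patch's FdwXact status codes. */
typedef enum
{
	PART_ACTIVE,
	PART_PREPARED,
	PART_COMMITTED,
	PART_ROLLED_BACK
} PartState;

/* Hypothetical stand-in for one foreign server taking part in the commit. */
typedef struct
{
	const char *name;
	bool		prepare_fails;	/* simulate a failing remote PREPARE */
	PartState	state;
} Participant;

/* Phase 1 for one server: returns false if the simulated PREPARE fails. */
static bool
prepare_participant(Participant *p)
{
	if (p->prepare_fails)
		return false;
	p->state = PART_PREPARED;
	return true;
}

/*
 * Prepare all participants, commit locally, then commit the prepared
 * remote transactions. Returns true on commit and false on rollback;
 * *local_committed reports whether the local commit step was reached.
 */
static bool
distributed_commit(Participant *parts, size_t n, bool *local_committed)
{
	size_t		i;

	assert(local_committed != NULL);
	*local_committed = false;

	for (i = 0; i < n; i++)
	{
		if (!prepare_participant(&parts[i]))
		{
			size_t		j;

			/* A failed PREPARE rolls the transaction back on every server. */
			for (j = 0; j < n; j++)
				parts[j].state = PART_ROLLED_BACK;
			return false;
		}
	}

	/* The local commit is the decision point for the whole transaction. */
	*local_committed = true;

	/* Phase 2: in the patch this work is handed to the resolver process. */
	for (i = 0; i < n; i++)
		parts[i].state = PART_COMMITTED;

	return true;
}
```

Calling distributed_commit() on an array in which one participant has prepare_fails set leaves every participant rolled back and never reaches the local commit, which is the property the prepare-first ordering is meant to provide.
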
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100755
index 0000000..1f270db
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2678 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL distributed transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * When a foreign data wrapper starts a transaction on a foreign server that
+ * is capable of the two-phase commit protocol, it registers the foreign
+ * transaction using FdwXactRegisterForeignTransaction() in order to
+ * participate in a group for atomic commit. Participants are identified by
+ * the OIDs of the foreign server and the user. When the foreign transaction
+ * begins to modify data, the executor marks it as modified using
+ * FdwXactMarkForeignTransactionModified().
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every foreign server. After committing or rolling back locally, we
+ * notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so
+ * that we're not trying to locally do work that might fail with an ERROR
+ * after we have already committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * waiters each time we receive a request. We have two queues: the active
+ * queue and the retry queue. A backend is first inserted into the active
+ * queue, and is moved to the retry queue by the resolver process if
+ * the resolution fails. The backends in the retry queue are processed at
+ * intervals of foreign_transaction_resolution_retry_interval.
+ *
+ * The two-phase commit protocol is required if the transaction modified two
+ * or more servers, including the local server. Otherwise, all foreign
+ * transactions are committed during pre-commit.
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in an in-doubt state (a.k.a. dangling
+ * transactions). Dangling transactions are processed by the resolver process.
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * 	* On PREPARE redo we add the foreign transaction to FdwXactCtl->fdw_xacts.
+ *	  We set fdw_xact->inredo to true for such entries.
+ *	* On Checkpoint redo, we iterate through FdwXactCtl->fdw_xacts entries that
+ *	  have set fdw_xact->inredo true and are behind the redo_horizon. We save
+ *    them to disk and then set fdw_xact->ondisk to true.
+ *	* On COMMIT and ABORT we delete the entry from FdwXactCtl->fdw_xacts.
+ *	  If fdw_xact->ondisk is true, we delete the corresponding file from
+ *	  the disk as well.
+ *  * RecoverFdwXacts loads all foreign transaction entries from disk into
+ *    memory at server startup.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Is atomic commit requested by user? */
+#define IsAtomicCommitEnabled() \
+	(max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+#define IsAtomicCommitRequested() \
+	(IsAtomicCommitEnabled() && \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED))
+
+#define FDW_XACT_ACTION_COMMIT	 		0x01
+#define FDW_XACT_ACTION_TWOPHASE_COMMIT 0x02
+
+/* Structure to bundle the foreign transaction participant */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if
+	 * this foreign transaction is registered but not inserted
+	 * yet.
+	 */
+	FdwXact		fdw_xact;
+	char		*fdw_xact_id;
+
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+	bool		modified;					/* true if modified the data on server */
+	bool		twophase_commit_enabled;	/* true if the server can execute
+											 * two-phase commit protocol */
+	void			*fdw_state;				/* fdw-private state */
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function	prepare_foreign_xact;
+	CommitForeignTransaction_function	commit_foreign_xact;
+	RollbackForeignTransaction_function	rollback_foreign_xact;
+	IsTwoPhaseCommitEnabled_function	is_twophase_commit_enabled;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit.
+ * This list has only foreign servers that support atomic commit FDW
+ * API regardless of their configuration.
+ */
+static List *FdwXactAtomicCommitParticipants = NIL;
+static bool FdwXactAtomicCommitReady = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDW_XACTS_DIR "pg_fdw_xact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * OID, xid, foreign server OID and user OID, each printed as 8 hexadecimal
+ * digits and separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction, and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure uniqueness.
+ */
+#define FDW_XACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDW_XACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+static void FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, bool modified);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static void FdwXactCommitForeignTransaction(FdwXactParticipant *fdw_part);
+static bool FdwXactResolveForeignTransaction(FdwXactState *state, FdwXact fdwxact,
+											 int elevel);
+static void FdwXactComputeRequiredXmin(void);
+static bool FdwXactAtomicCommitRequired(void);
+static void FdwXactQueueInsert(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid, bool give_warnings);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool give_warnings);
+static List *get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						   bool need_lock);
+static FdwXact get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								bool need_lock);
+static FdwXact get_all_fdw_xacts(int *length);
+static FdwXact insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							   Oid umid, char *fdw_xact_id);
+static char *generate_fdw_xact_identifier(TransactionId xid, Oid serverid, Oid userid);
+static void remove_fdw_xact(FdwXact fdw_xact);
+static FdwXactState *create_fdw_xact_state(void);
+
+/* Guc parameters */
+int	max_prepared_foreign_xacts = 0;
+int	max_foreign_xact_resolvers = 0;
+int foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Keep track of registering process exit call back. */
+static bool fdwXactExitRegistered = false;
+
+/*
+ * Remember accessed foreign server. This function is called by executor when
+ * it begins to access foreign server. If FDW of the foreign server supports
+ * atomic commit API, it is registered as a transaction participant of distributed
+ * transaction.
+ */
+void
+FdwXactMarkForeignServerAccessed(Relation rel, int flags, bool modified)
+{
+	FdwRoutine			*fdwroutine;
+	ListCell   			*lc;
+	Oid					serverid;
+	Oid					userid;
+
+	/* Quick return if atomic commit is not enabled */
+	if (!IsAtomicCommitEnabled())
+		return;
+
+	/* Do nothing in EXPLAIN (no ANALYZE) case */
+	if (flags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	serverid = GetForeignServerIdByRelId(RelationGetRelid(rel));
+	fdwroutine  = GetFdwRoutineByRelId(RelationGetRelid(rel));
+
+	/*
+	 * If the FDW of the foreign server being accessed doesn't provide the
+	 * atomic commit API, we don't manage the foreign transaction in the
+	 * distributed transaction manager.
+	 */
+	if (fdwroutine->IsTwoPhaseCommitEnabled == NULL)
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		return;
+	}
+
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	foreach(lc, FdwXactAtomicCommitParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	FdwXactRegisterForeignTransaction(serverid, userid, modified);
+}
+
+/*
+ * Register given foreign transaction identified by given arguments as
+ * a participant of the transaction.
+ *
+ * The foreign server identified by the given server id must support the atomic
+ * commit APIs. Registered foreign transactions are managed by the foreign
+ * transaction manager until the end of the transaction.
+ */
+static void
+FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant	*fdw_part;
+	ForeignServer 		*foreign_server;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	FdwRoutine			*fdw_routine;
+	MemoryContext		old_ctx;
+	char				*fdwxact_id;
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	/*
+	 * Participant information is needed at the end of a transaction, where
+	 * the system caches are not available. Save it in TopTransactionContext
+	 * beforehand so that it can live until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	foreign_server = GetForeignServer(serverid);
+	fdw = GetForeignDataWrapper(foreign_server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	/* Make sure that the FDW has transaction handlers */
+	if (!fdw_routine->PrepareForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function provided for preparing foreign transaction for FDW %s",
+						fdw->fdwname)));
+	if (!fdw_routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to commit a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+	if (!fdw_routine->RollbackForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to rollback a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+
+	/* Generate a unique identifier */
+	if (fdw_routine->GetPrepareId)
+	{
+		char *id;
+		int fdwxact_id_len = 0;
+
+		id = fdw_routine->GetPrepareId(GetTopTransactionId(),
+											   foreign_server->serverid,
+											   user_mapping->userid,
+											   &fdwxact_id_len);
+
+		if (!id)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+					 (errmsg("foreign transaction identifier is not provided"))));
+
+		/* Check length of foreign transaction identifier */
+		id[fdwxact_id_len] = '\0';
+		if (fdwxact_id_len > NAMEDATALEN)
+			ereport(ERROR,
+					(errcode(ERRCODE_NAME_TOO_LONG),
+					 errmsg("foreign transaction identifier \"%s\" is too long",
+							id),
+					 errdetail("Foreign transaction identifier must be less than %d characters.",
+							   NAMEDATALEN)));
+
+		fdwxact_id = pstrdup(id);
+	}
+	else
+		fdwxact_id = generate_fdw_xact_identifier(GetTopTransactionId(),
+												  serverid, userid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdw_xact_id = fdwxact_id;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdw_xact = NULL;
+	fdw_part->modified = modified;
+	fdw_part->twophase_commit_enabled = true; /* by default, will be changed at pre-commit phase */
+	fdw_part->fdw_state = NULL;
+	fdw_part->prepare_foreign_xact = fdw_routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact = fdw_routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact = fdw_routine->RollbackForeignTransaction;
+	fdw_part->is_twophase_commit_enabled = fdw_routine->IsTwoPhaseCommitEnabled;
+
+	/* Add this foreign transaction to the participants list */
+	FdwXactAtomicCommitParticipants = lappend(FdwXactAtomicCommitParticipants, fdw_part);
+
+	/* Revert back the context */
+	MemoryContextSwitchTo(old_ctx);
+
+	return;
+}
+
+/*
+ * FdwXactShmemSize
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdw_xacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * FdwXactShmemInit
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdw_xacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->freeFdwXacts = NULL;
+		FdwXactCtl->numFdwXacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdw_xacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdw_xacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdw_xacts[cnt].status = FDW_XACT_INITIAL;
+			fdw_xacts[cnt].fxact_free_next = FdwXactCtl->freeFdwXacts;
+			FdwXactCtl->freeFdwXacts = &fdw_xacts[cnt];
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
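+/*
+ * PreCommit_FdwXacts
+ *
+ * Called during pre-commit of the local transaction. Commits the foreign
+ * transactions that do not need two-phase commit and prepares the rest.
+ */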
+void
+PreCommit_FdwXacts(void)
+{
+	bool		need_atomic_commit;
+	ListCell	*lc;
+	ListCell	*next;
+	ListCell	*prev = NULL;
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsAtomicCommitRequested())
+		return;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactAtomicCommitParticipants == NIL)
+		return;
+
+	need_atomic_commit = FdwXactAtomicCommitRequired();
+
+	/*
+	 * When foreign_twophase_commit is FOREIGN_TWOPHASE_COMMIT_REQUIRED, every
+	 * modified server must be capable of the two-phase commit protocol.
+	 */
+	if (need_atomic_commit &&
+		foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));
+
+	/*
+	 * Commit transactions on foreign servers.
+	 *
+	 * Committed transactions are removed from FdwXactAtomicCommitParticipants
+	 * so that the later preparation step processes only the servers that must
+	 * be committed using the two-phase commit protocol.
+	 */
+	for (lc = list_head(FdwXactAtomicCommitParticipants); lc != NULL; lc = next)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		bool can_commit = false;
+
+		next = lnext(lc);
+
+		if (!need_atomic_commit || !fdw_part->modified)
+		{
+			/*
+			 * We can commit servers that were not modified, and any server
+			 * when atomic commit is not required.
+			 */
+			can_commit = true;
+		}
+		else if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER &&
+				 !fdw_part->twophase_commit_enabled)
+		{
+			/* Also in 'prefer' case, non-2pc-capable servers can be committed */
+			can_commit = true;
+		}
+
+		if (can_commit)
+		{
+			/* Commit the foreign transaction */
+			FdwXactCommitForeignTransaction(fdw_part);
+
+			/* Delete it from the participant list */
+			FdwXactAtomicCommitParticipants =
+				list_delete_cell(FdwXactAtomicCommitParticipants, lc, prev);
+		}
+		else
+			prev = lc;	/* advance prev only if the cell was not deleted */
+	}
+
+	/*
+	 * If only one participant remains and the local transaction wrote no
+	 * non-temporary relation, we can commit it directly. This avoids using
+	 * two-phase commit for a single server in the 'prefer' case where the
+	 * transaction has one 2pc-capable modified server and some
+	 * non-2pc-capable modified servers.
+	 */
+	if (list_length(FdwXactAtomicCommitParticipants) == 1 &&
+		(MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) == 0)
+	{
+		Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER);
+		FdwXactCommitForeignTransaction(linitial(FdwXactAtomicCommitParticipants));
+		list_free(FdwXactAtomicCommitParticipants);
+		return;
+	}
+
+	FdwXactPrepareForeignTransactions();
+	/* keep FdwXactAtomicCommitParticipants until the end of transaction */
+}
+
+/*
+ * FdwXactPrepareForeignTransactions
+ *
+ * Prepare all foreign transaction participants.  This function extends the
+ * prepared participants chain each time it prepares a foreign transaction. The
+ * chain is used to access all participants of the distributed transaction
+ * quickly. If any one of them fails to prepare, we raise an error and abort.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	FdwXactState *state;
+	ListCell   *lcell;
+	FdwXact		prev_fdwxact = NULL;
+
+	if (FdwXactAtomicCommitParticipants == NIL)
+		return;
+
+	state = create_fdw_xact_state();
+
+	/* Loop over the foreign connections */
+	foreach(lcell, FdwXactAtomicCommitParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+		FdwXact		fdwxact;
+
+		/*
+		 * Insert the foreign transaction entry. Registration persists this
+		 * information to the disk and logs (thereby relaying it to the standby).
+		 * Thus in case we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have prepared transaction
+		 * on the foreign server and try to resolve it when connectivity is
+		 * restored or after crash recovery.
+		 *
+		 * If we prepare the transaction on the foreign server before persisting
+		 * the information to the disk and crash in-between these two steps,
+		 * we will forget that we prepared the transaction on the foreign server
+		 * and will not be able to resolve it after the crash. Hence persist
+		 * first then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(GetTopTransactionId(), fdw_part);
+
+		state->serverid = fdw_part->server->serverid;
+		state->userid = fdw_part->usermapping->userid;
+		state->umid = fdw_part->usermapping->umid;
+		state->fdwxact_id = pstrdup(fdwxact->fdw_xact_id);
+
+		/*
+		 * Between the FdwXactInsertFdwXactEntry call and the moment this
+		 * backend receives an acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 * During abort processing, we might try to resolve a never-prepared
+		 * transaction, and get an error. This is fine as long as the FDW
+		 * provides us unique prepared transaction identifiers.
+		 */
+		if (!fdw_part->prepare_foreign_xact(state))
+		{
+			/* Failed to prepare; raise an error so the transaction aborts */
+			ereport(ERROR,
+					(errmsg("could not prepare transaction on foreign server %s",
+							fdw_part->server->servername)));
+		}
+
+		/* Keep fdw_state until end of transaction */
+		fdw_part->fdw_state = state->fdw_state;
+
+		/* Preparation succeeded, update its status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdw_part->fdw_xact->status = FDW_XACT_PREPARED;
+		fdw_part->fdw_xact = fdwxact;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Create a prepared participants chain, which links the FdwXact
+		 * entries involved in this transaction.
+		 */
+		if (prev_fdwxact)
+		{
+			/* Append others to the tail */
+			Assert(fdwxact->fxact_next == NULL);
+			prev_fdwxact->fxact_next = fdwxact;
+		}
+		prev_fdwxact = fdwxact;	/* remember the tail for the next participant */
+	}
+}
+
+/*
+ * Commit the given foreign transaction.
+ */
+static void
+FdwXactCommitForeignTransaction(FdwXactParticipant *fdw_part)
+{
+	FdwXactState *state;
+
+	state = create_fdw_xact_state();
+	state->serverid = fdw_part->server->serverid;
+	state->userid = fdw_part->usermapping->userid;
+	state->umid = fdw_part->usermapping->umid;
+	fdw_part->fdw_state = (void *) state;
+
+	if (!fdw_part->commit_foreign_xact(state))
+		ereport(ERROR,
+				(errmsg("could not commit foreign transaction on server %s",
+						fdw_part->server->servername)));
+}
+
+/*
+ * FdwXactInsertFdwXactEntry
+ *
+ * This function is used to create a new foreign transaction entry before an FDW
+ * prepares and commits or rolls back. The function writes the entry to WAL, and
+ * the entry is persisted to disk under the pg_fdw_xact directory at checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact				fxact;
+	FdwXactOnDiskData	*fxact_file_data;
+	MemoryContext		old_context;
+	int					data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact = insert_fdw_xact(MyDatabaseId, xid, fdw_part->server->serverid,
+							fdw_part->usermapping->userid,
+							fdw_part->usermapping->umid, fdw_part->fdw_xact_id);
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->held_by = MyBackendId;
+	fdw_part->fdw_xact = fxact;
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdw_xact_id);
+	data_len = data_len + strlen(fdw_part->fdw_xact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fxact_file_data->dbid = MyDatabaseId;
+	fxact_file_data->local_xid = xid;
+	fxact_file_data->serverid = fdw_part->server->serverid;
+	fxact_file_data->userid = fdw_part->usermapping->userid;
+	fxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fxact_file_data->fdw_xact_id, fdw_part->fdw_xact_id,
+		   strlen(fdw_part->fdw_xact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fxact_file_data, data_len);
+	fxact->insert_end_lsn = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_INSERT);
+	XLogFlush(fxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact->valid = true;
+	LWLockRelease(FdwXactLock);
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fxact_file_data);
+	return fxact;
+}
+
+/*
+ * insert_fdw_xact
+ *
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				Oid umid, char *fdw_xact_id)
+{
+	int i;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		fxact = FdwXactCtl->fdw_xacts[i];
+		if (fxact->dbid == dbid &&
+			fxact->local_xid == xid &&
+			fxact->serverid == serverid &&
+			fxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
+								   xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there
+	 * are none left.
+	 */
+	if (!FdwXactCtl->freeFdwXacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fxact = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fxact->fxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->numFdwXacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts++] = fxact;
+
+	fxact->held_by = InvalidBackendId;
+	fxact->dbid = dbid;
+	fxact->local_xid = xid;
+	fxact->serverid = serverid;
+	fxact->userid = userid;
+	fxact->umid = umid;
+	fxact->insert_start_lsn = InvalidXLogRecPtr;
+	fxact->insert_end_lsn = InvalidXLogRecPtr;
+	fxact->valid = false;
+	fxact->ondisk = false;
+	fxact->inredo = false;
+	memcpy(fxact->fdw_xact_id, fdw_xact_id, strlen(fdw_xact_id) + 1);
+
+	return fxact;
+}
+
+/*
+ * remove_fdw_xact
+ *
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdw_xact(FdwXact fdw_xact)
+{
+	int			cnt;
+
+	Assert(fdw_xact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		if (FdwXactCtl->fdw_xacts[cnt] == fdw_xact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (cnt >= FdwXactCtl->numFdwXacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdw_xact->local_xid, fdw_xact->serverid, fdw_xact->userid)));
+
+	/* Remove the entry from active array */
+	FdwXactCtl->numFdwXacts--;
+	FdwXactCtl->fdw_xacts[cnt] = FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts];
+
+	/* Put it back into free list */
+	fdw_xact->fxact_free_next = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fdw_xact;
+
+	/* Reset information */
+	fdw_xact->status = FDW_XACT_INITIAL;
+	fdw_xact->held_by = InvalidBackendId;
+	fdw_xact->fxact_next = NULL;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdw_xact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdw_xact->serverid;
+		record.dbid = fdw_xact->dbid;
+		record.xid = fdw_xact->local_xid;
+		record.userid = fdw_xact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdw_xact_remove));
+		recptr = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the commit critical section: a
+		 * checkpoint starting after this will certainly see the FdwXact as a
+		 * candidate for fsyncing.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true and set FdwXactAtomicCommitReady to true if we require atomic commit.
+ * It is required if the transaction modified data on two or more servers, including
+ * the local node itself. This function also checks, for each server, whether
+ * two-phase commit is enabled.
+ */
+static bool
+FdwXactAtomicCommitRequired(void)
+{
+	ListCell*	lc;
+	int			nserverswritten = 0;
+
+	if (!IsAtomicCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactAtomicCommitParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		/* Check if the foreign server is capable of two-phase commit protocol */
+		if (fdw_part->is_twophase_commit_enabled(fdw_part->server->serverid))
+			fdw_part->twophase_commit_enabled = true;
+		else if (fdw_part->modified)
+			MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+
+		if (fdw_part->modified)
+			nserverswritten++;
+	}
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	/* Atomic commit is required if we modified data on two or more participants */
+	if (nserverswritten <= 1)
+		return false;
+
+	FdwXactAtomicCommitReady = true;
+	return true;
+}
+
+bool
+FdwXactIsAtomicCommitReady(void)
+{
+	return FdwXactAtomicCommitReady;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int	i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * ForgetAllFdwXactParticipants
+ *
+ * Reset all the foreign transaction entries that this backend registered.
+ * If the foreign transaction has a corresponding FdwXact entry, resetting
+ * the held_by field leaves that entry in the unresolved state. If we
+ * leave any entries, we update the oldest xmin of unresolved transactions
+ * so that the transaction status of dangling transactions is not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell *cell;
+	int		n_lefts = 0;
+
+	if (FdwXactAtomicCommitParticipants == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	foreach(cell, FdwXactAtomicCommitParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(cell);
+
+		/* Skip if we haven't registered an FdwXact entry yet */
+		if (fdw_part->fdw_xact == NULL)
+			continue;
+
+		/*
+		 * There is a race condition: an FdwXact entry referenced from
+		 * FdwXactAtomicCommitParticipants could already belong to another
+		 * backend if the resolver process removed the entry and another
+		 * backend reused it before we got here. So we need to check that
+		 * the entry is still associated with this backend.
+		 */
+		if (fdw_part->fdw_xact->held_by == MyBackendId)
+		{
+			fdw_part->fdw_xact->held_by = InvalidBackendId;
+			n_lefts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Update the oldest local transaction of unresolved distributed
+	 * transactions if we left any FdwXact entries.
+	 */
+	if (n_lefts > 0)
+		FdwXactComputeRequiredXmin();
+
+	FdwXactAtomicCommitParticipants = NIL;
+}
+
+/*
+ * AtProcExit_FdwXact
+ *
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Wait for foreign transaction to be resolved.
+ *
+ * Initially backends start in state FDW_XACT_NOT_WAITING and then change
+ * that state to FDW_XACT_WAITING before adding themselves to the wait queue.
+ * During FdwXactResolveDistributedTransaction a fdwxact resolver changes the
+ * state to FDW_XACT_WAIT_COMPLETE once foreign transactions are resolved.
+ * This backend then resets its state to FDW_XACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue and changes the state to FDW_XACT_WAITING_RETRY.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char		*new_status = NULL;
+	const char	*old_status;
+	ListCell	*lc;
+	List		*fdwxact_participants = NIL;
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsAtomicCommitRequested())
+		return;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDW_XACT_NOT_WAITING);
+
+	if (FdwXactAtomicCommitParticipants != NIL)
+	{
+		/*
+		 * If we're waiting for foreign transactions to be resolved that
+		 * we've prepared just before, use the participants list.
+		 */
+		Assert(MyPgXact->xid == wait_xid);
+		fdwxact_participants = FdwXactAtomicCommitParticipants;
+	}
+	else
+	{
+		/*
+		 * Get participants list from the global array. This is required (1)
+		 * when we're waiting for the resolution of foreign transactions that
+		 * are part of a local transaction that was marked as prepared while
+		 * the server was running, or (2) when we resolve the PREPARE'd
+		 * distributed transaction after restart.
+		 */
+		fdwxact_participants = get_fdw_xacts(MyDatabaseId, wait_xid,
+											 InvalidOid, InvalidOid, true);
+	}
+
+	/* Exit if we found no foreign transaction to resolve */
+	if (fdwxact_participants == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	foreach(lc, fdwxact_participants)
+	{
+		FdwXact fdw_xact = (FdwXact) lfirst(lc);
+
+		/* Don't overwrite status if fate has been determined */
+		if (fdw_xact->status == FDW_XACT_PREPARED)
+			fdw_xact->status = (is_commit ?
+								FDW_XACT_COMMITTING_PREPARED :
+								FDW_XACT_ABORTING_PREPARED);
+	}
+
+	/* Set backend status and enqueue itself to the active queue */
+	MyProc->fdwXactState = FDW_XACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	FdwXactQueueInsert();
+	LWLockRelease(FdwXactLock);
+
+	/* Launch a resolver process if not yet, or wake it up */
+	fdwxact_maybe_launch_resolver(false);
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 34 + 1);	/* 24-char label + 10-digit xid */
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0';	/* truncate off " waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to FDW_XACT_WAIT_COMPLETE,
+		 * it will never update it again, so we can't be seeing a stale value
+		 * in that case.
+		 */
+		if (MyProc->fdwXactState == FDW_XACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The latter
+		 * would lead the client to believe that the distributed transaction
+		 * aborted, which is not true: it's already committed locally. The
+		 * former is no good either: the client has requested committing a
+		 * distributed transaction, and is entitled to assume that an acknowledged
+		 * commit is also committed on all foreign servers, which might not be
+		 * true. So in this case we issue a WARNING (which some clients may
+		 * be able to interpret) and shut off further output. We do NOT reset
+		 * ProcDiePending, so that the process will die after the commit is
+		 * cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign xact resolver can pick them up and try to resolve them
+		 * later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDW_XACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+
+	/*
+	 * Forget the list of locked entries. This also means that any entries
+	 * that could not be resolved remain as dangling transactions.
+	 */
+	ForgetAllFdwXactParticipants();
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Acquire FdwXactLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Insert MyProc into the tail of FdwXactActiveQueue.
+ */
+static void
+FdwXactQueueInsert(void)
+{
+	SHMQueueInsertBefore(&(FdwXactRslvCtl->FdwXactActiveQueue),
+						 &(MyProc->fdwXactLinks));
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Resolve one distributed transaction. The target distributed transaction
+ * is fetched from either the active queue or the retry queue and its participants
+ * are fetched from the global array.
+ *
+ * Release the waiter and return true if we resolved all of the foreign
+ * transaction participants. On failure, we move the FdwXactLinks entry from the
+ * active queue to the retry queue, and raise an error and exit.
+ */
+bool
+FdwXactResolveDistributedTransaction(Oid dbid, bool is_active)
+{
+	FdwXactState	*state;
+	ListCell		*lc;
+	ListCell		*next;
+	PGPROC			*waiter = NULL;
+	List			*participants;
+	SHM_QUEUE		*target_queue;
+
+	if (is_active)
+		target_queue = &(FdwXactRslvCtl->FdwXactActiveQueue);
+	else
+		target_queue = &(FdwXactRslvCtl->FdwXactRetryQueue);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/* Fetch the first waiter on the given database, walking the queue */
+	while ((waiter = (PGPROC *) SHMQueueNext(target_queue,
+											 waiter ? &(waiter->fdwXactLinks) : target_queue,
+											 offsetof(PGPROC, fdwXactLinks))) != NULL)
+	{
+		if (waiter->databaseId == dbid)
+			break;
+	}
+
+	/* If no waiter, there is no job */
+	if (!waiter)
+	{
+		LWLockRelease(FdwXactLock);
+		return false;
+	}
+
+	Assert(TransactionIdIsValid(waiter->fdwXactWaitXid));
+
+	state = create_fdw_xact_state();
+	participants = get_fdw_xacts(dbid, waiter->fdwXactWaitXid, InvalidOid,
+								 InvalidOid, false);
+	LWLockRelease(FdwXactLock);
+
+	/* Resolve all foreign transactions one by one */
+	for (lc = list_head(participants); lc != NULL; lc = next)
+	{
+		FdwXact fdwxact = (FdwXact) lfirst(lc);
+
+		CHECK_FOR_INTERRUPTS();
+
+		next = lnext(lc);
+
+		state->serverid = fdwxact->serverid;
+		state->userid = fdwxact->userid;
+		state->umid = fdwxact->umid;
+		state->fdwxact_id = pstrdup(fdwxact->fdw_xact_id);
+
+		PG_TRY();
+		{
+			FdwXactResolveForeignTransaction(state, fdwxact, ERROR);
+		}
+		PG_CATCH();
+		{
+			/* Re-insert the waiter to the retry queue */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			if (waiter->fdwXactState == FDW_XACT_WAITING)
+			{
+				SHMQueueDelete(&(waiter->fdwXactLinks));
+				pg_write_barrier();
+				SHMQueueInsertBefore(&(FdwXactRslvCtl->FdwXactRetryQueue),
+									 &(waiter->fdwXactLinks));
+				waiter->fdwXactState = FDW_XACT_WAITING_RETRY;
+			}
+			LWLockRelease(FdwXactLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		elog(DEBUG2, "resolved a foreign transaction xid %u, serverid %u, userid %u",
+			 fdwxact->local_xid, fdwxact->serverid, fdwxact->userid);
+	}
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/*
+	 * Remove waiter from shmem queue, if not detached yet. The waiter
+	 * could already be detached if the user cancelled the wait before
+	 * resolution.
+	 */
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId	wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDW_XACT_WAIT_COMPLETE;
+
+		/* Wake up the waiter only when we have set state and removed from queue */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc xid %u", wait_xid);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	return true;
+}
+
+/*
+ * Resolve all dangling foreign transactions on the given database. Get
+ * all dangling foreign transactions from shmem global array and resolve
+ * them one by one.
+ */
+void
+FdwXactResolveAllDanglingTransactions(Oid dbid)
+{
+	List		*dangling_fdwxacts = NIL;
+	ListCell	*cell;
+	int			n_resolved = 0;
+	int			i;
+
+	Assert(OidIsValid(dbid));
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/*
+	 * Walk over the global array to build the list of dangling transactions
+	 * whose corresponding local transaction is on the given database.
+	 */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fxact = FdwXactCtl->fdw_xacts[i];
+
+		/*
+		 * Append the fdwxact entry on the given database to the list if
+		 * it's handled by nobody and the corresponding local transaction
+		 * is not part of the prepared transaction.
+		 */
+		if (fxact->dbid == dbid &&
+			fxact->held_by == InvalidBackendId &&
+			!TwoPhaseExists(fxact->local_xid))
+			dangling_fdwxacts = lappend(dangling_fdwxacts, fxact);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/* Return if there is no foreign transaction we need to resolve */
+	if (dangling_fdwxacts == NIL)
+		return;
+
+	foreach(cell, dangling_fdwxacts)
+	{
+		FdwXact fdwxact = (FdwXact) lfirst(cell);
+		FdwXactState *state;
+
+		state = create_fdw_xact_state();
+		state->serverid = fdwxact->serverid;
+		state->userid = fdwxact->userid;
+		state->umid = fdwxact->umid;
+		state->fdwxact_id = pstrdup(fdwxact->fdw_xact_id);
+
+		FdwXactResolveForeignTransaction(state, fdwxact, ERROR);
+
+		n_resolved++;
+	}
+
+	list_free(dangling_fdwxacts);
+
+	elog(DEBUG2, "resolved %d dangling foreign xacts", n_resolved);
+}
+
+/*
+ * AtEOXact_FdwXacts
+ *
+ * In commit case, we have already prepared transactions on the foreign
+ * servers during pre-commit. Those prepared transactions will be
+ * resolved by the resolver process, so we don't do anything about the
+ * foreign transactions here.
+ *
+ * In abort case, the user requested rollback or we switched to rollback
+ * due to an error during commit. To close the current foreign transactions
+ * anyway, we call the rollback API for every foreign transaction. If we
+ * raised an error during preparing and came here, it's possible that some
+ * entries of FdwXactParticipants have already registered their FdwXact
+ * entries. If so, we leave them as dangling transactions and ask the
+ * resolver process to process them.
+ */
+void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		int left_fdwxacts = 0;
+		FdwXactState *state = create_fdw_xact_state();
+
+		foreach (lcell, FdwXactAtomicCommitParticipants)
+		{
+			FdwXactParticipant	*fdw_part = lfirst(lcell);
+
+			/*
+			 * Count FdwXact entries that we registered to shared memory array
+			 * in this transaction.
+			 */
+			if (fdw_part->fdw_xact)
+			{
+				/*
+				 * The status of the foreign transaction must be either preparing
+				 * or prepared. In either case, since we have registered an FdwXact
+				 * entry we leave it to the resolver process. For the preparing
+				 * state, the foreign transaction might not be closed yet, so we
+				 * fall through and call the rollback API. For the prepared state,
+				 * the foreign transaction is already closed, so we don't need to
+				 * do anything.
+				 */
+				Assert(fdw_part->fdw_xact->status == FDW_XACT_PREPARING ||
+					   fdw_part->fdw_xact->status == FDW_XACT_PREPARED);
+
+				left_fdwxacts++;
+				if (fdw_part->fdw_xact->status == FDW_XACT_PREPARED)
+					continue;
+			}
+
+			state->serverid = fdw_part->server->serverid;
+			state->userid = fdw_part->usermapping->userid;
+			state->umid = fdw_part->usermapping->umid;
+			state->fdw_state = fdw_part->fdw_state;
+
+			/*
+			 * Roll back the current foreign transaction. Since we're already
+			 * rolling back the local transaction, it's too late to raise an
+			 * error here, so we log a warning instead.
+			 */
+			if (!fdw_part->rollback_foreign_xact(state))
+				ereport(WARNING,
+						(errmsg("could not abort transaction on server \"%s\"",
+								fdw_part->server->servername)));
+		}
+
+		/* If we left some FdwXact entries, ask the resolver process */
+		if (left_fdwxacts > 0)
+		{
+			ereport(WARNING,
+					(errmsg("might have left %d foreign transactions in in-doubt status",
+							left_fdwxacts)));
+			fdwxact_maybe_launch_resolver(true);
+		}
+	}
+
+	ForgetAllFdwXactParticipants();
+	FdwXactAtomicCommitReady = false;
+}
+
+/*
+ * AtPrepare_FdwXacts
+ *
+ * If there are foreign servers involved in the transaction, this function
+ * prepares transactions on those servers.
+ *
+ * Note that the transaction can abort after we have prepared only some of
+ * the participants. Since we may still switch to abort, we cannot forget
+ * FdwXactAtomicCommitParticipants here. These are processed by the resolver
+ * process during aborting, or at AtEOXact_FdwXacts.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	if (!IsAtomicCommitEnabled())
+		return;
+
+	if (FdwXactAtomicCommitParticipants == NIL)
+		return;
+
+	/*
+	 * We cannot prepare if any foreign server of participants isn't capable
+	 * of two-phase commit.
+	 */
+	if (FdwXactAtomicCommitRequired() &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_T_R_INTEGRITY_CONSTRAINT_VIOLATION),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in the transaction cannot prepare it")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * FdwXactResolveForeignTransaction
+ *
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * handler routine. The foreign transaction can be a dangling transaction
+ * that no backend is interested in. If the fate of the foreign transaction is
+ * not determined yet, it is determined according to the status of the
+ * corresponding local transaction.
+ *
+ * If the resolution is successful, remove the foreign transaction entry from
+ * the shared memory and also remove the corresponding on-disk file.
+ */
+static bool
+FdwXactResolveForeignTransaction(FdwXactState *state, FdwXact fdwxact,
+								 int elevel)
+{
+	ForeignServer		*server;
+	ForeignDataWrapper	*fdw;
+	FdwRoutine			*fdw_routine;
+	bool		is_commit;
+	bool		ret;
+
+	Assert(fdwxact);
+
+	/*
+	 * Determine whether we commit or abort this foreign transaction.
+	 */
+	if (fdwxact->status == FDW_XACT_COMMITTING_PREPARED)
+		is_commit = true;
+	else if (fdwxact->status == FDW_XACT_ABORTING_PREPARED)
+		is_commit = false;
+
+	/*
+	 * If the local transaction is already committed, commit prepared
+	 * foreign transaction.
+	 */
+	else if (TransactionIdDidCommit(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_COMMITTING_PREPARED;
+		is_commit = true;
+	}
+
+	/*
+	 * If the local transaction is already aborted, abort prepared
+	 * foreign transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_ABORTING_PREPARED;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is not in progress but the foreign
+	 * transaction is not prepared on the foreign server. This
+	 * can happen when the transaction failed after registering this
+	 * entry but before actually preparing on the foreign server.
+	 * So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+		is_commit = false;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction
+	 * state is neither committing nor aborting. This should not
+	 * happen because we cannot decide whether to commit or abort a
+	 * foreign transaction associated with an in-progress local
+	 * transaction.
+	 */
+	else
+		ereport(ERROR,
+				(errmsg("cannot resolve the foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid)));
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Resolve the foreign transaction */
+	Assert(fdw_routine->ResolveForeignTransaction);
+
+	ret = fdw_routine->ResolveForeignTransaction(state, is_commit);
+
+	if (!ret)
+	{
+		ereport(elevel,
+				(errmsg("could not %s a prepared foreign transaction on server \"%s\"",
+						is_commit ? "commit" : "rollback", server->servername),
+				 errdetail("local transaction id is %u, connected by user id %u", fdwxact->local_xid, fdwxact->userid)));
+		return false;	/* keep the entry so that resolution can be retried */
+	}
+
+	/* Resolution was a success, remove the entry */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdw_xact(fdwxact);
+	LWLockRelease(FdwXactLock);
+
+	return ret;
+}
+
+static FdwXactState *
+create_fdw_xact_state(void)
+{
+	FdwXactState *state;
+
+	state = palloc(sizeof(FdwXactState));
+	state->serverid = InvalidOid;
+	state->userid = InvalidOid;
+	state->umid = InvalidOid;
+	state->fdwxact_id = NULL;
+	state->fdw_state = NULL;
+
+	return state;
+}
+
+/*
+ * Return the one FdwXact entry that matches the given arguments, otherwise
+ * return NULL. Since this function searches for the FdwXact entry by its
+ * unique key, all arguments must be valid.
+ */
+static FdwXact
+get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				 bool need_lock)
+{
+	List	*fdw_xact_list;
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, need_lock);
+
+	/* Could not find entry */
+	if (fdw_xact_list == NIL)
+		return NULL;
+
+	/* Must be one entry since we search it by the unique key */
+	Assert(list_length(fdw_xact_list) == 1);
+
+	return (FdwXact) linitial(fdw_xact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+fdw_xact_exists(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	*fdw_xact_list;
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, true);
+
+	return fdw_xact_list != NIL;
+}
+
+/*
+ * Returns an array of all foreign prepared transactions for the user-level
+ * function pg_prepared_fdw_xacts.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if it doesn't want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdw_xacts(int *length)
+{
+	List		*all_fdw_xacts;
+	ListCell	*lc;
+	FdwXact		fdw_xacts;
+	int			num_fdw_xacts = 0;
+
+	Assert(length != NULL);
+
+	/* Get all entries */
+	all_fdw_xacts = get_fdw_xacts(InvalidOid, InvalidTransactionId,
+								  InvalidOid, InvalidOid, true);
+
+	if (all_fdw_xacts == NIL)
+	{
+		*length = 0;
+		return NULL;
+	}
+
+	fdw_xacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdw_xacts));
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdw_xacts)
+	{
+		FdwXact fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdw_xacts + num_fdw_xacts, fx,
+			   sizeof(FdwXactData));
+		num_fdw_xacts++;
+	}
+
+	*length = num_fdw_xacts;
+	list_free(all_fdw_xacts);
+
+	return fdw_xacts;
+}
+
+/*
+ * Return a list of FdwXact entries matching the given arguments, or NIL if
+ * none match. An invalid OID or transaction id matches any value.
+ */
+static List*
+get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			  bool need_lock)
+{
+	int i;
+	List	*fdw_xact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact	fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool	matches = true;
+
+		/* xid */
+		if (xid != InvalidTransactionId && xid != fdw_xact->local_xid)
+			matches = false;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdw_xact->dbid != dbid)
+			matches = false;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdw_xact->serverid)
+			matches = false;
+
+		/* userid */
+		if (OidIsValid(userid) && fdw_xact->userid != userid)
+			matches = false;
+
+		/* Append it if matched */
+		if (matches)
+			fdw_xact_list = lappend(fdw_xact_list, fdw_xact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdw_xact_list;
+}
+
+/*
+ * fdw_xact_redo
+ * Apply the redo log for a foreign transaction.
+ */
+void
+fdw_xact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record
+		 * in FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDW_XACT_REMOVE)
+	{
+		xl_fdw_xact_remove *xlrec = (xl_fdw_xact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier in the form
+ * of "fx_<random number>_<xid>_<serverid>_<userid>" whose length is always
+ * less than NAMEDATALEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like UUID, which gives unique ids with high probability, but that may be
+ * expensive here and the UUID extension, which provides the function to
+ * generate UUIDs, is not part of the core code.
+ */
+static char *
+generate_fdw_xact_identifier(TransactionId xid, Oid serverid, Oid userid)
+{
+	char*	fdw_xact_id;
+
+	fdw_xact_id = (char *)palloc0(FDW_XACT_ID_MAX_LEN * sizeof(char));
+
+	snprintf(fdw_xact_id, FDW_XACT_ID_MAX_LEN, "%s_%ld_%u_%u_%u",
+			 "fx", Abs(random()), xid, serverid, userid);
+	fdw_xact_id[strlen(fdw_xact_id)] = '\0';
+
+	return fdw_xact_id;
+}
+
+/*
+ * CheckPointFdwXacts
+ *
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an inserted LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * In order to avoid disk I/O while holding a light weight lock, the function
+ * first collects the files which need to be synced under FdwXactLock and then
+ * syncs them after releasing the lock. This approach creates a race condition:
+ * after releasing the lock, and before syncing a file, the corresponding
+ * foreign transaction entry and hence the file might get removed. The function
+ * checks whether that's true and ignores the error if so.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdw_xacts = 0;
+
+	/* Quick get-away, before taking lock */
+	if (max_prepared_foreign_xacts <= 0)
+		return;
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/* Another quick check, before we allocate memory */
+	if (FdwXactCtl->numFdwXacts <= 0)
+	{
+		LWLockRelease(FdwXactLock);
+		return;
+	}
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with a
+	 * insert_end_lsn set prior to the last checkpoint yet is marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		FdwXact		fxact = FdwXactCtl->fdw_xacts[cnt];
+
+		if ((fxact->valid || fxact->inredo) &&
+			!fxact->ondisk &&
+			fxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fxact->dbid, fxact->local_xid,
+								fxact->serverid, fxact->userid,
+								buf, len);
+			fxact->ondisk = true;
+			fxact->insert_start_lsn = InvalidXLogRecPtr;
+			fxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdw_xacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDW_XACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdw_xacts > 0)
+		ereport(LOG,
+			  (errmsg_plural("%d foreign transaction state file was written "
+							 "for long-running prepared transactions",
+							 "%d foreign transaction state files were written "
+							 "for long-running prepared transactions",
+							 serialized_fdw_xacts,
+							 serialized_fdw_xacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, &read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+		   errdetail("Failed while allocating an XLog reading processor.")));
+
+	record = XLogReadRecord(xlogreader, lsn, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read foreign transaction state from xlog at %X/%X",
+			   (uint32) (lsn >> 32),
+			   (uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDW_XACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDW_XACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not recreate foreign transaction state file \"%s\": %m",
+			   path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not fsync foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * ProcessFdwXactBuffer
+ *
+ * Given a transaction id, userid and serverid, read the foreign transaction
+ * state either from disk or directly from WAL via the shmem xlog record
+ * pointer "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId	origNextXid = ShmemVariableCache->nextXid;
+	char	*buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(insert_start_lsn != InvalidXLogRecPtr);
+
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid, true);
+		if (buf == NULL)
+		{
+			ereport(WARNING,
+					(errmsg("removing corrupt fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+			return NULL;
+		}
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid magic number and CRC), return the contents in
+ * a structure allocated in-memory. Otherwise return NULL. The structure can
+ * be later freed by the caller.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				bool give_warnings)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			   errmsg("could not open FDW transaction state file \"%s\": %m",
+					  path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+	{
+		CloseTransientFile(fd);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not stat FDW transaction state file \"%s\": %m",
+							path)));
+		return NULL;
+	}
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdw_xact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+	{
+		CloseTransientFile(fd);
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid size of FDW transaction state file \"%s\"",
+						path)));
+		return NULL;
+	}
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+	{
+		CloseTransientFile(fd);
+		return NULL;
+	}
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_READ);
+	if (read(fd, buf, stat.st_size) != stat.st_size)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not read FDW transaction state file \"%s\": %m",
+					  path)));
+		return NULL;
+	}
+
+	pgstat_report_wait_end();
+	CloseTransientFile(fd);
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+	{
+		pfree(buf);
+		return NULL;
+	}
+
+	/* Check that the contents are the expected data */
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fxact_file_data->dbid  != dbid ||
+		fxact_file_data->serverid != serverid ||
+		fxact_file_data->userid != userid ||
+		fxact_file_data->local_xid != xid)
+	{
+		ereport(WARNING,
+			(errmsg("invalid foreign transaction state file \"%s\"",
+					path)));
+		/* fd was already closed after the read above */
+		pfree(buf);
+		return NULL;
+	}
+
+	return buf;
+}
+
+/*
+ * PrescanFdwXacts
+ *
+ * Scan the foreign transactions directory for the oldest active transaction.
+ * This is run during database startup, after we completed reading WAL.
+ * ShmemVariableCache->nextXid has been set to one more than the highest XID
+ * for which evidence exists in WAL.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	TransactionId nextXid = ShmemVariableCache->nextXid;
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+		 strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			TransactionId local_xid;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/*
+			 * Remove a foreign prepared transaction file corresponding to an
+			 * XID, which is too new.
+			 */
+			if (TransactionIdFollowsOrEquals(local_xid, nextXid))
+			{
+				ereport(WARNING,
+						(errmsg("removing future foreign prepared transaction file \"%s\"",
+								clde->d_name)));
+				RemoveFdwXactFile(dbid, local_xid, serverid, userid, true);
+				continue;
+			}
+
+			if (TransactionIdPrecedesOrEquals(local_xid, oldestActiveXid))
+				oldestActiveXid = local_xid;
+		}
+	}
+
+	FreeDir(cldir);
+	return oldestActiveXid;
+}
+
+/*
+ * restoreFdwXactData
+ *
+ * Scan pg_fdw_xact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char		*buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid, bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * FdwXactRedoAdd
+ *
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The
+	 * status of the transaction is set as preparing, since we do not
+	 * know the exact status right now. Resolver will set it later
+	 * based on the status of local transaction which prepared this
+	 * foreign transaction.
+	 */
+	fxact = insert_fdw_xact(fxact_data->dbid, fxact_data->local_xid,
+							fxact_data->serverid, fxact_data->userid,
+							fxact_data->umid, fxact_data->fdw_xact_id);
+
+	/*
+	 * Set status as preparing, since we do not know the xact status
+	 * right now. Resolver will set it later based on the status of
+	 * local transaction that prepared this fdwxact entry.
+	 */
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->insert_start_lsn = start_lsn;
+	fxact->insert_end_lsn = end_lsn;
+	fxact->inredo = true;	/* added in redo */
+	fxact->valid = false;
+	fxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * FdwXactRedoRemove
+ *
+ * Remove the corresponding fdw_xact entry from FdwXactCtl.
+ * Also remove fdw_xact file if a foreign transaction was saved
+ * via an earlier checkpoint.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact	fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdw_xact(dbid, xid, serverid, userid,
+							   false);
+
+	if (fdwxact == NULL)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdw_xact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+		char	*buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(int *newval, void **extra, GucSource source)
+{
+	ForeignTwophaseCommitLevel newForeignTwophaseCommitLevel = *newval;
+
+	/* Parameter check */
+	if (newForeignTwophaseCommitLevel > FOREIGN_TWOPHASE_COMMIT_DISABLED &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_transactions\" or \"max_foreign_transaction_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdw_xacts;
+	int			num_xacts;
+	int			cur_xact;
+}	WorkingStatus;
+
+Datum
+pg_prepared_fdw_xacts(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdw_xacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdw_xacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(6, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdw_xacts = get_all_fdw_xacts(&num_fdw_xacts);
+		status->num_xacts = num_fdw_xacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdw_xact = &status->fdw_xacts[status->cur_xact++];
+		Datum		values[6];
+		bool		nulls[6];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdw_xact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdw_xact->dbid);
+		values[1] = TransactionIdGetDatum(fdw_xact->local_xid);
+		values[2] = ObjectIdGetDatum(fdw_xact->serverid);
+		values[3] = ObjectIdGetDatum(fdw_xact->userid);
+		switch (fdw_xact->status)
+		{
+			case FDW_XACT_PREPARING:
+				xact_status = "prepared";
+				break;
+			case FDW_XACT_COMMITTING_PREPARED:
+				xact_status = "committing";
+				break;
+			case FDW_XACT_ABORTING_PREPARED:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		/* should this really be interpreted by the FDW? */
+		values[5] = PointerGetDatum(cstring_to_text_with_len(fdw_xact->fdw_xact_id,
+															 strlen(fdw_xact->fdw_xact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXactState *state;
+	UserMapping		*usermapping;
+	FdwXact			fdwxact;
+	bool			ret;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, true);
+
+	if (fdwxact == NULL)
+		PG_RETURN_BOOL(false);
+
+	usermapping = GetUserMapping(userid, serverid);
+
+	state = create_fdw_xact_state();
+	state->serverid = serverid;
+	state->userid = userid;
+	state->umid = usermapping->umid;
+
+	ret = FdwXactResolveForeignTransaction(state, fdwxact, LOG);
+
+	PG_RETURN_BOOL(ret);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. The function gives a way to forget about such a prepared
+ * transaction when the foreign server where it is prepared is no longer
+ * available, or when the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, false);
+	if (fdwxact == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("could not find foreign transaction entry"))));
+
+	remove_fdw_xact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/access/fdwxact/fdwxact_launcher.c b/src/backend/access/fdwxact/fdwxact_launcher.c
new file mode 100644
index 0000000..39f351b
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact_launcher.c
@@ -0,0 +1,640 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * There is a shared memory area where the information of resolver processes
+ * is stored. A backend process requests the start of a new resolver process
+ * via that shared memory area. Note that the launcher assumes that there is
+ * no more than one start request for a database.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact_launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launcher_sigusr2(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid, int slot);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+Datum pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS);
+
+/*
+ * Wake up the launcher process to retry launch. This is used when
+ * a resolver process is being stopped.
+ */
+void
+FdwXactLauncherWakeupToRetry(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		SetLatch(FdwXactRslvCtl->launcher_latch);
+}
+
+/*
+ * Wake up the launcher process to request resolution. This is
+ * used by the backend process.
+ */
+void
+FdwXactLauncherWakeupToRequest(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int	slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->FdwXactActiveQueue));
+		SHMQueueInit(&(FdwXactRslvCtl->FdwXactRetryQueue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz	last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == 0);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz	now;
+		long	wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int		rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit start retries to once per foreign_xact_resolution_retry_interval,
+		 * but always try to start a resolver on a backend request.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool launched;
+
+			ResetLatch(MyLatch);
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher launch",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested
+			 * but not running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * started. This usually means the resolver crashed, so we
+			 * should retry in foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver worker
+ * if not running yet. A foreign transaction resolver worker is responsible
+ * for resolution of foreign transactions that are registered on a database.
+ * So if a resolver worker is already launched, we don't need to launch a new
+ * one.
+ */
+void
+fdwxact_maybe_launch_resolver(bool ignore_error)
+{
+	FdwXactResolver *resolver;
+	bool	found = false;
+	int		i;
+
+	/*
+	 * Looking for a resolver process that is running and working on the
+	 * same database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->pid != InvalidPid &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * If we found the resolver for my database, we don't need to launch a new
+	 * one; just wake the running worker up.
+	 */
+	if (found)
+	{
+		SetLatch(resolver->latch);
+
+		elog(DEBUG1, "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		return;
+	}
+
+	/* Looking for unused resolver slot */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	/*
+	 * However if there are no more free worker slots, inform user about it before
+	 * exiting.
+	 */
+	if (!found)
+	{
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+		return;
+	}
+
+	Assert(resolver->pid == InvalidPid);
+
+	/* Claim the free slot for a new resolver process */
+	resolver->dbid = MyDatabaseId;
+	resolver->in_use = true;
+
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Wake up launcher */
+	FdwXactLauncherWakeupToRequest();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to the given
+ * 'dbid' at 'slot'. If 'slot' is negative, we find an unused slot.
+ * Note that the caller must hold FdwXactResolverLock in exclusive mode.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid, int slot)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int launch_slot = slot;
+
+	/* If slot number is invalid, we find an unused slot */
+	if (launch_slot < 0)
+	{
+		int i;
+
+		for (i = 0; i < max_foreign_xact_resolvers; i++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+			if (resolver->in_use && resolver->dbid == dbid)
+				return;
+
+			if (!resolver->in_use)
+			{
+				launch_slot = i;
+				break;
+			}
+		}
+	}
+
+	/* No unused slot found */
+	if (launch_slot < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[launch_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_main_arg = Int32GetDatum(launch_slot);
+	bgw.bgw_notify_pid = 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, clean up the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to wait
+	 * until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch all foreign transaction resolvers that are requested by backend
+ * processes but not running. Return true if we launch any resolver.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	int i, j;
+	int num_launches = 0;
+	int num_unused_slots = 0;
+	int num_dbs = 0;
+	bool launched = false;
+	Oid	*dbs_to_launch;
+	Oid	*dbs_having_worker = palloc0(sizeof(Oid) * max_foreign_xact_resolvers);
+
+	/*
+	 * Launch resolver workers on the databases that are requested
+	 * by backend processes while counting unused slots.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* Remember unused worker slots */
+		if (!resolver->in_use)
+		{
+			num_unused_slots++;
+			continue;
+		}
+
+		/* Remember databases that already have a resolver worker, fall through */
+		if (OidIsValid(resolver->dbid))
+			dbs_having_worker[num_dbs++] = resolver->dbid;
+
+		/* Launch the backend-requested worker */
+		if (resolver->in_use &&
+			OidIsValid(resolver->dbid) &&
+			resolver->pid == InvalidPid)
+		{
+			fdwxact_launch_resolver(resolver->dbid, i);
+			launched = true;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* quick exit if no unused slot */
+	if (num_unused_slots == 0)
+		return launched;
+
+	/*
+	 * Launch a resolver on each database that has unresolved foreign
+	 * transactions but doesn't have any resolver. Scanning all FdwXact
+	 * entries could take time, but it's harmless for the relaunch
+	 * case.
+	 */
+	dbs_to_launch = (Oid *) palloc(sizeof(Oid) * num_unused_slots);
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool found = false;
+
+		/* no unused slot is left */
+		if (num_launches >= num_unused_slots)
+			break;
+
+		for (j = 0; j < num_dbs; j++)
+		{
+			if (dbs_having_worker[j] == fdw_xact->dbid)
+			{
+				found = true;
+				break;
+			}
+		}
+
+		/* Register the database if no resolver is working on it */
+		if (!found)
+			dbs_to_launch[num_launches++] = fdw_xact->dbid;
+	}
+
+	/* Launch resolver process for a database at any worker slot */
+	for (i = 0; i < num_launches; i++)
+	{
+		fdwxact_launch_resolver(dbs_to_launch[i], -1);
+		launched = true;
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	return launched;
+}
+
+/*
+ * FdwXactLauncherRegister
+ *		Register a background worker running the foreign transaction
+ *      launcher.
+ */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+
+/*
+ * Returns activity of foreign transaction resolvers: the pid, the database
+ * OID and the last resolution time of each running resolver.
+ */
+Datum
+pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver	*resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t	pid;
+		Oid		dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
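
The second pass of fdwxact_relaunch_resolvers() gathers databases that have unresolved foreign transactions but no resolver into an array sized by the number of unused slots. Below is a minimal standalone sketch of that bounded collection, not the patch's code. The names collect_dbs_to_launch and oid_in_list are hypothetical, Oid is a local stand-in typedef, and the duplicate suppression is an assumption added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;	/* local stand-in for PostgreSQL's Oid */

/* Return true if 'dbid' appears in the first 'n' entries of 'dbs'. */
static bool
oid_in_list(const Oid *dbs, int n, Oid dbid)
{
	int		i;

	for (i = 0; i < n; i++)
	{
		if (dbs[i] == dbid)
			return true;
	}
	return false;
}

/*
 * Collect up to 'max_out' distinct database OIDs that appear in 'xact_dbs'
 * (one entry per unresolved foreign transaction) but not in 'busy_dbs'
 * (databases that already have a resolver). Returns the number of entries
 * written to 'out'. Hypothetical helper, not part of the patch.
 */
static int
collect_dbs_to_launch(const Oid *xact_dbs, int nxacts,
					  const Oid *busy_dbs, int nbusy,
					  Oid *out, int max_out)
{
	int		i;
	int		nout = 0;

	assert(max_out >= 0);

	for (i = 0; i < nxacts; i++)
	{
		/* Stop before writing past the end of 'out' */
		if (nout >= max_out)
			break;

		/* Skip databases that already have a resolver */
		if (oid_in_list(busy_dbs, nbusy, xact_dbs[i]))
			continue;

		/* Skip databases we have already queued (assumed refinement) */
		if (oid_in_list(out, nout, xact_dbs[i]))
			continue;

		out[nout++] = xact_dbs[i];
	}

	return nout;
}
```

Checking the bound before each store keeps the write index strictly below the array size, and skipping already-queued OIDs avoids spending a slot twice on the same database.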
diff --git a/src/backend/access/fdwxact/fdwxact_resolver.c b/src/backend/access/fdwxact/fdwxact_resolver.c
new file mode 100644
index 0000000..0b754da
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact_resolver.c
@@ -0,0 +1,331 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process continues to resolve foreign transactions on a database.
+ * It resolves two types of foreign transactions: on-line foreign transactions
+ * and dangling foreign transactions. An on-line foreign transaction is a
+ * foreign transaction whose resolution a concurrent backend process is
+ * waiting for. A dangling transaction is a foreign transaction whose
+ * distributed transaction ended up in an in-doubt state. A resolver process
+ * doesn't exit as long as there is at least one unresolved foreign transaction
+ * on the database, even if the timeout has been reached.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact_resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* GUC parameters */
+int foreign_xact_resolution_retry_interval;
+int foreign_xact_resolver_timeout = 60 * 1000;
+
+/* Shared control data for foreign transaction resolvers */
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FdwXactRslvLoop(void);
+static long FdwXactRslvComputeSleepTime(TimestampTz now);
+static void FdwXactRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+	FdwXactLauncherWakeupToRetry();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	TIMESTAMP_NOBEGIN(MyFdwXactResolver->last_resolved_time);
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sane value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FdwXactRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FdwXactRslvLoop(void)
+{
+	TimestampTz last_retry_time = 0;
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		int			rc;
+		TimestampTz	now;
+		long		sleep_time;
+		bool		resolved;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Resolve one distributed transaction */
+		StartTransactionCommand();
+		resolved = FdwXactResolveDistributedTransaction(MyDatabaseId, true);
+		CommitTransactionCommand();
+
+		now = GetCurrentTimestamp();
+
+		/* Update my state */
+		if (resolved)
+			MyFdwXactResolver->last_resolved_time = now;
+
+		if (TimestampDifferenceExceeds(last_retry_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			StartTransactionCommand();
+			resolved = FdwXactResolveDistributedTransaction(MyDatabaseId, false);
+			CommitTransactionCommand();
+
+			last_retry_time = GetCurrentTimestamp();
+
+			/* Update my state */
+			if (resolved)
+				MyFdwXactResolver->last_resolved_time = last_retry_time;
+		}
+
+		/* Check for fdwxact resolver timeout */
+		FdwXactRslvCheckTimeout(now);
+
+		/*
+		 * If we have resolved any distributed transaction, we go to the next
+		 * cycle without resolving dangling transactions or sleeping, because
+		 * there might be other on-line transactions waiting to be resolved.
+		 */
+		if (!resolved)
+		{
+			/* Resolve dangling transactions as much as possible */
+			StartTransactionCommand();
+			FdwXactResolveAllDanglingTransactions(MyDatabaseId);
+			CommitTransactionCommand();
+
+			sleep_time = FdwXactRslvComputeSleepTime(now);
+
+			MemoryContextResetAndDeleteChildren(resolver_ctx);
+			MemoryContextSwitchTo(TopMemoryContext);
+
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   sleep_time,
+						   WAIT_EVENT_FDW_XACT_RESOLVER_MAIN);
+
+			if (rc & WL_POSTMASTER_DEATH)
+				proc_exit(1);
+		}
+	}
+}
+
+/*
+ * Shut down if no foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout and none is pending on this database.
+ */
+static void
+FdwXactRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/*
+	 * Reached the timeout. We exit if there are neither pending on-line
+	 * transactions nor dangling transactions.
+	 */
+	if (!fdw_xact_exists(InvalidTransactionId, MyDatabaseId, InvalidOid,
+						 InvalidOid))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyFdwXactResolver->dbid))));
+		CommitTransactionCommand();
+
+		fdwxact_resolver_detach();
+		proc_exit(0);
+	}
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. Return the sleep
+ * time in milliseconds.
+ */
+static long
+FdwXactRslvComputeSleepTime(TimestampTz now)
+{
+	static TimestampTz	wakeuptime = 0;
+	long	sleeptime;
+	long	sec_to_timeout;
+	int		microsec_to_timeout;
+
+	if (now >= wakeuptime)
+		wakeuptime = TimestampTzPlusMilliseconds(now,
+												 foreign_xact_resolution_retry_interval);
+
+	/* Compute relative time until wakeup. */
+	TimestampDifference(now, wakeuptime,
+						&sec_to_timeout, &microsec_to_timeout);
+
+	sleeptime = sec_to_timeout * 1000 + microsec_to_timeout / 1000;
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
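
FdwXactRslvComputeSleepTime() keeps a wakeup time and pushes it forward by the retry interval once the current time has reached it. Below is a minimal standalone sketch of that arithmetic, assuming timestamps are plain 64-bit microsecond counts. TimestampUs and compute_sleep_ms are hypothetical names, and the wakeup time is passed by pointer here instead of being kept in a static variable.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t TimestampUs;	/* hypothetical: microseconds since an epoch */

/*
 * Return how many milliseconds to sleep until '*wakeup'. If 'now' has
 * already reached '*wakeup', first advance '*wakeup' to
 * now + retry_interval_ms.
 */
static long
compute_sleep_ms(TimestampUs now, TimestampUs *wakeup, int retry_interval_ms)
{
	assert(retry_interval_ms >= 0);

	/* Schedule the next wakeup once the previous one has passed */
	if (now >= *wakeup)
		*wakeup = now + (TimestampUs) retry_interval_ms * 1000;

	/* Convert the remaining microseconds to milliseconds */
	return (long) ((*wakeup - now) / 1000);
}
```

With a 5000 ms interval and a wakeup time of 0, a call at t = 1 s schedules the wakeup at t = 6 s and returns 5000. A later call at t = 3 s returns 3000 without moving the wakeup time.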
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c2db19b..fb63471 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2629,10 +2629,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		heap_freetuple(heaptup);
 	}
 
-	/* Make note that we've wrote on non-temprary relation */
-	if (RelationNeedsWAL(relation))
-		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
-
 	return HeapTupleGetOid(tup);
 }
 
@@ -3457,10 +3453,6 @@ l1:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
-	/* Make note that we've wrote on non-temprary relation */
-	if (RelationNeedsWAL(relation))
-		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
-
 	return HeapTupleMayBeUpdated;
 }
 
@@ -4411,10 +4403,6 @@ l2:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
-	/* Make note that we've wrote on non-temprary relation */
-	if (RelationNeedsWAL(relation))
-		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
-
 	bms_free(hot_attrs);
 	bms_free(proj_idx_attrs);
 	bms_free(key_attrs);
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index 5514db1..742e825 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -8,9 +8,9 @@ subdir = src/backend/access/rmgrdesc
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o genericdesc.o \
-	   gindesc.o gistdesc.o hashdesc.o heapdesc.o logicalmsgdesc.o \
-	   mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o seqdesc.o \
-	   smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
+OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o fdwxactdesc.o \
+	genericdesc.o  gindesc.o gistdesc.o hashdesc.o heapdesc.o \
+	logicalmsgdesc.o mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o \
+	seqdesc.o smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000..7061bba
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,65 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL distributed transaction manager for foreign server.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdw_xact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		FdwXactOnDiskData *fdw_insert_xlog = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_insert_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_insert_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_insert_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_insert_xlog->local_xid);
+		/* TODO: This should be really interpreted by each FDW */
+
+		/*
+		 * TODO: we also need to assess whether we want to add this
+		 * information
+		 */
+		appendStringInfo(buf, " foreign transaction info: %s",
+						 fdw_insert_xlog->fdw_xact_id);
+	}
+	else
+	{
+		xl_fdw_xact_remove *fdw_remove_xlog = (xl_fdw_xact_remove *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_remove_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_remove_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_remove_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_remove_xlog->xid);
+	}
+
+}
+
+const char *
+fdw_xact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDW_XACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDW_XACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
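
Both fdw_xact_desc() and fdw_xact_identify() strip the low info bits with ~XLR_INFO_MASK before comparing against the record type. Below is a minimal standalone sketch of that dispatch. INFO_MASK mirrors XLR_INFO_MASK (0x0F in PostgreSQL's xlogrecord.h), but REC_INSERT, REC_REMOVE and their values are hypothetical stand-ins, since the patch's definitions of XLOG_FDW_XACT_INSERT and XLOG_FDW_XACT_REMOVE are not shown in this part of the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INFO_MASK	0x0F	/* low bits reserved for the xlog machinery */
#define REC_INSERT	0x00	/* hypothetical record type value */
#define REC_REMOVE	0x10	/* hypothetical record type value */

/* Map the resource-manager bits of 'info' to a record name, or NULL. */
static const char *
identify(uint8_t info)
{
	switch (info & ~INFO_MASK)
	{
		case REC_INSERT:
			return "NEW FOREIGN TRANSACTION";
		case REC_REMOVE:
			return "REMOVE FOREIGN TRANSACTION";
	}

	/* Unknown record type */
	return NULL;
}
```

Because the low four bits are masked off first, flag bits set there do not change which name is returned.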
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7..4a9ab3d 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -112,14 +112,16 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_prepared_xacts=%d max_locks_per_xact=%d "
 						 "wal_level=%s wal_log_hints=%s "
-						 "track_commit_timestamp=%s",
+						 "track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_prepared_xacts,
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 16fbe47..f15c83a 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -12,9 +12,9 @@ subdir = src/backend/access/transam
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = clog.o commit_ts.o generic_xlog.o multixact.o parallel.o rmgr.o slru.o \
-	subtrans.o timeline.o transam.o twophase.o twophase_rmgr.o varsup.o \
-	xact.o xlog.o xlogarchive.o xlogfuncs.o \
+OBJS = clog.o commit_ts.o generic_xlog.o multixact.o \
+	parallel.o rmgr.o slru.o subtrans.o timeline.o transam.o twophase.o \
+	twophase_rmgr.o varsup.o xact.o xlog.o xlogarchive.o xlogfuncs.o \
 	xloginsert.o xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 9368b56..8b360b1 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -9,6 +9,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
 #include "access/generic_xlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 3942734..bc4e109 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -844,6 +845,35 @@ TwoPhaseGetGXact(TransactionId xid)
 }
 
 /*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
+/*
  * TwoPhaseGetDummyProc
  *		Get the dummy backend ID for prepared transaction specified by XID
  *
@@ -2316,6 +2346,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2375,6 +2411,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8c1621d..9dca0f5 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1131,6 +1132,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1139,6 +1141,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsAtomicCommitReady();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1177,12 +1180,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1340,6 +1344,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transaction to be resolved, if required.
+	 * We only want to wait if we prepared foreign transaction in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -1990,6 +2002,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2146,6 +2161,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2233,6 +2249,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2422,6 +2440,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2627,6 +2646,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7375a78..9de3bcc 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
 #include "access/subtrans.h"
@@ -5267,6 +5268,7 @@ BootStrapXLOG(void)
 	ControlFile->MaxConnections = MaxConnections;
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6354,6 +6356,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6878,14 +6883,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdw_xact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7077,7 +7083,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7583,6 +7592,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7901,6 +7911,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9217,6 +9230,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9650,7 +9664,8 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9682,6 +9697,7 @@ XLogReportParameters(void)
 		ControlFile->MaxConnections = MaxConnections;
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9887,6 +9903,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10085,6 +10102,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->MaxConnections = xlrec.MaxConnections;
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a03b005..47e9317 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -291,6 +291,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_prepared_fdw_xacts AS
+       SELECT * FROM pg_prepared_fdw_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
 	l.objoid, l.classoid, l.objsubid,
@@ -773,6 +776,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_fdwxact_resolvers AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_fdwxact_resolver() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index e5dd995..dac1e3a 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
@@ -1093,6 +1094,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1407,6 +1420,16 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
 	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srv->serverid,
+						useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
+	/*
 	 * Do the deletion
 	 */
 	object.classId = UserMappingRelationId;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 0bcb237..058bc0a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_type.h"
@@ -749,7 +750,10 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+		FdwXactMarkForeignServerAccessed(partRelInfo->ri_RelationDesc, 0, true);
+	}
 
 	MemoryContextSwitchTo(oldContext);
 
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 2ec7fcb..4578bc0 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,10 +226,16 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+
+	}
 	else
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
+	FdwXactMarkForeignServerAccessed(scanstate->ss.ss_currentRelation,
+									 eflags, node->operation != CMD_SELECT);
+
 	return scanstate;
 }
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 528f587..66c3699 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -37,6 +37,7 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "commands/trigger.h"
@@ -44,6 +45,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "storage/bufmgr.h"
@@ -485,6 +487,10 @@ ExecInsert(ModifyTableState *mtstate,
 								HEAP_INSERT_SPECULATIVE,
 								NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
 												   estate, true, &specConflict,
@@ -530,6 +536,10 @@ ExecInsert(ModifyTableState *mtstate,
 								estate->es_output_cid,
 								0, NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
@@ -722,6 +732,11 @@ ldelete:;
 							 true /* wait for commit */ ,
 							 &hufd,
 							 changingPart);
+
+		/* Make note that we've written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case HeapTupleSelfUpdated:
@@ -1210,6 +1225,11 @@ lreplace:;
 							 estate->es_crosscheck_snapshot,
 							 true /* wait for commit */ ,
 							 &hufd, &lockmode);
+
+		/* Make note that we've written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case HeapTupleSelfUpdated:
@@ -2321,6 +2341,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 fdw_private,
 															 i,
 															 eflags);
+
+			/* Mark that this transaction modified data on the foreign server */
+			FdwXactMarkForeignServerAccessed(resultRelInfo->ri_RelationDesc,
+											 eflags, true);
 		}
 
 		resultRelInfo++;
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index a0bcc04..b2097ad 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -155,6 +155,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index d2b695e..b722b9a 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -15,6 +15,8 @@
 #include <unistd.h>
 
 #include "libpq/pqsignal.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -129,6 +131,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 8de603d..3f59fdd 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3484,6 +3484,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_LAUNCHER_MAIN:
 			event_name = "LogicalLauncherMain";
 			break;
@@ -3675,6 +3681,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -3890,6 +3899,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDW_XACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 688f462..883ad85 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -896,6 +898,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -971,12 +977,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afb4972..960fd6a 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -154,6 +154,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDW_XACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a58..c5610ee 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -150,6 +152,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, BackendRandomShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +274,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	BackendRandomShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 908f62d..cc578b2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -90,6 +90,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -245,6 +247,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1323,6 +1326,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	volatile TransactionId replication_slot_xmin = InvalidTransactionId;
 	volatile TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	volatile TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1384,6 +1388,7 @@ GetOldestXmin(Relation rel, int flags)
 	/* fetch into volatile var while ProcArrayLock is held */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1434,6 +1439,15 @@ GetOldestXmin(Relation rel, int flags)
 		result = replication_slot_xmin;
 
 	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDW_XACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
+	/*
 	 * After locks have been released and defer_cleanup_age has been applied,
 	 * check whether we need to back up further to make logical decoding
 	 * possible. We need to do so if we're computing the global limit (rel =
@@ -3016,6 +3030,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits to future computations of the xmin horizon to prevent
+ * vacuum from truncating clog for transactions still needed to resolve
+ * distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6025ec..a42d06e 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,5 @@ OldSnapshotTimeMapLock				42
 BackendRandomLock					43
 LogicalRepWorkerLock				44
 CLogTruncationLock					45
+FdwXactLock					46
+FdwXactResolverLock			47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6f9aaa5..8e55dad 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -398,6 +399,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* initialize fields for fdw xact */
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -799,6 +804,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6e13d14..1302d16 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -2971,6 +2973,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2317e8b..7651352 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/transam.h"
@@ -378,6 +379,25 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 };
 
 /*
+ * Although only "required", "prefer", and "disabled" are documented,
+ * we accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
+/*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
  */
@@ -659,6 +679,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2235,6 +2259,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4056,6 +4126,16 @@ static struct config_enum ConfigureNamesEnum[] =
 	},
 
 	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+			gettext_noop("Sets the usage of the two-phase commit protocol for distributed transactions."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
+	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
 			NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4e61bc6..88cdc85 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -121,6 +121,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -287,6 +289,20 @@
 
 
 #------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#foreign_twophase_commit = disabled		# disabled, prefer, or required
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
+#------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
 
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index ad06e8e..ca3eb62 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ab5cb7f..609578c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -209,6 +209,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdw_xact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 895a51f..7df88e0 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -306,6 +306,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_worker_processes);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 6fb403a..6d867c8 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -730,6 +730,7 @@ GuessControlValues(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -957,6 +958,7 @@ RewriteControlFile(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* Contents are protected with a CRC */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca..b616cea 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000..0928f4c
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,147 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL distributed transaction manager
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDW_XACT_H
+#define FDW_XACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+#define	FDW_XACT_NOT_WAITING		0
+#define	FDW_XACT_WAITING			1
+#define	FDW_XACT_WAITING_RETRY		2
+#define	FDW_XACT_WAIT_COMPLETE		3
+
+#define FdwXactEnabled() (max_prepared_foreign_xacts > 0)
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDW_XACT_ID_MAX_LEN 200
+
+/* Enum to track the status of prepared foreign transaction */
+typedef enum
+{
+	FDW_XACT_INITIAL,
+	FDW_XACT_PREPARING,					/* foreign transaction is being prepared */
+	FDW_XACT_PREPARED,					/* foreign transaction is prepared */
+	FDW_XACT_COMMITTING_PREPARED,		/* foreign prepared transaction is to
+										 * be committed */
+	FDW_XACT_ABORTING_PREPARED, /* foreign prepared transaction is to be
+								 * aborted */
+} FdwXactStatus;
+
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_PREFER,	/* use twophase commit where available */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support twophase
+								 * commit */
+} ForeignTwophaseCommitLevel;
+
+/* Shared memory entry for a prepared or being prepared foreign transaction */
+typedef struct FdwXactData *FdwXact;
+
+typedef struct FdwXactData
+{
+	FdwXact		fxact_free_next;	/* Next free FdwXact entry */
+	FdwXact		fxact_next;			/* Pointer to the next FdwXact entry associated
+									 * with the same transaction */
+	Oid				dbid;			/* database oid where to find foreign server
+									 * and user mapping */
+	TransactionId	local_xid;		/* XID of local transaction */
+	Oid				serverid;		/* foreign server where transaction takes place */
+	Oid				userid;			/* user who initiated the foreign transaction */
+	Oid				umid;
+	FdwXactStatus 	status;			/* The state of the foreign transaction. This
+									 * doubles as the action to be taken on this entry. */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;		/* XLOG offset of inserting this entry start */
+	XLogRecPtr	insert_end_lsn;		/* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been complete and written to file? */
+	BackendId	held_by;		/* backend that is holding this entry */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+	char		fdw_xact_id[FDW_XACT_MAX_ID_LEN];		/* prepared transaction identifier */
+} FdwXactData;
+
+/* Shared memory layout for maintaining foreign prepared transaction entries. */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		freeFdwXacts;
+
+	/* Number of valid foreign transaction entries */
+	int			numFdwXacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdw_xacts[FLEXIBLE_ARRAY_MEMBER];		/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+typedef struct FdwXactState
+{
+	Oid		serverid;
+	Oid		userid;
+	Oid		umid;
+	char	*fdwxact_id;
+	void	*fdw_state;		/* foreign-data wrapper can keep state here */
+} FdwXactState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern bool fdw_xact_exists(TransactionId xid, Oid dboid, Oid serverid,
+				Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwTwoPhaseNeeded(void);
+extern void PreCommit_FdwXacts(void);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon);
+extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit);
+extern bool FdwXactResolveDistributedTransaction(Oid dbid, bool is_active);
+extern void FdwXactResolveAllDanglingTransactions(Oid dbid);
+extern bool FdwXactIsAtomicCommitReady(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void FdwXactMarkForeignServerAccessed(Relation rel, int flags, bool modified);
+extern bool check_foreign_twophase_commit(int *newval, void **extra,
+										  GucSource source);
+
+#endif   /* FDW_XACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000..4ea65b2
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,32 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _FDWXACT_LAUNCHER_H
+#define _FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherWakeupToRequest(void);
+extern void FdwXactLauncherWakeupToRetry(void);
+
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+
+extern bool IsFdwXactLauncher(void);
+
+extern void fdwxact_maybe_launch_resolver(bool ignore_error);
+
+
+#endif	/* _FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000..6b2a24f
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int foreign_xact_resolver_timeout;
+
+#endif		/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000..e92b5a1
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,52 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDW_XACT_INSERT	0x00
+#define XLOG_FDW_XACT_REMOVE	0x10
+
+/* Same as GIDSIZE */
+#define FDW_XACT_MAX_ID_LEN 200
+/*
+ * On disk file structure, also used to WAL
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdw_xact_id[FDW_XACT_MAX_ID_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdw_xact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+} xl_fdw_xact_remove;
+
+extern void fdw_xact_redo(XLogReaderState *record);
+extern void fdw_xact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdw_xact_identify(uint8 info);
+
+#endif	/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000..36391d4
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,67 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _RESOLVER_INTERNAL_H
+#define _RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLaunchLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t	pid;	/* this resolver's PID, or 0 if not active */
+	Oid		dbid;	/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool	in_use;
+
+	/* Stats */
+	TimestampTz	last_resolved_time;
+
+	/* Protect shared variables shown above */
+	slock_t	mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	*latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/*
+	 * Foreign transaction resolution queues. Protected by FdwXactLock.
+	 */
+	SHM_QUEUE	FdwXactActiveQueue;
+	SHM_QUEUE	FdwXactRetryQueue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch		*launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif	/* _RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 0bbe9879..c15dff7 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDW_XACT_ID, "Foreign Transactions", fdw_xact_redo, fdw_xact_desc, fdw_xact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 0e932da..b199c88 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 				TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 2c1b2d8..63c833d 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -105,6 +105,13 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
+ * XACT_FLAGS_FDWNOPREPARE - set when we wrote data to a foreign table
+ * whose server isn't capable of
+ * two-phase commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE				(1U << 3)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 30610b3..795e85a 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -227,6 +227,7 @@ typedef struct xl_parameter_change
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6..3d5333a 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -178,6 +178,7 @@ typedef struct ControlFileData
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cff58ed..d39ca1e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5032,6 +5032,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '6053', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_fdwxact_resolver', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,timestamptz}',
+  proargmodes => '{o,o,o}',
+  proargnames => '{pid,dbid,last_resolved_time}',
+  prosrc => 'pg_stat_get_fdwxact_resolver' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5737,6 +5744,22 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '6050', descr => 'view foreign transactions',
+  proname => 'pg_prepared_fdw_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{dbid,transaction,serverid,userid,status,identifier}',
+  prosrc => 'pg_prepared_fdw_xacts' },
+{ oid => '6051', descr => 'remove foreign transaction',
+  proname => 'pg_remove_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_remove_fdw_xact' },
+{ oid => '6052', descr => 'resolve foreign transaction',
+  proname => 'pg_resolve_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_resolve_fdw_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c14eb54..92d47bb 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/relation.h"
@@ -168,6 +169,14 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef bool (*PrepareForeignTransaction_function) (FdwXactState *state);
+typedef bool (*CommitForeignTransaction_function) (FdwXactState *state);
+typedef bool (*RollbackForeignTransaction_function) (FdwXactState *state);
+typedef bool (*ResolveForeignTransaction_function) (FdwXactState *state,
+													bool is_commit);
+typedef bool (*IsTwoPhaseCommitEnabled_function) (Oid serverid);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -235,6 +244,14 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for distributed transactions */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	ResolveForeignTransaction_function ResolveForeignTransaction;
+	IsTwoPhaseCommitEnabled_function IsTwoPhaseCommitEnabled;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
@@ -247,7 +264,6 @@ typedef struct FdwRoutine
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
 } FdwRoutine;
 
-
 /* Functions in foreign/foreign.c */
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern Oid	GetForeignServerIdByRelId(Oid relid);
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 3ca12e6..d030368 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -68,10 +68,10 @@ typedef struct ForeignTable
 	List	   *options;		/* ftoptions as DefElem list */
 } ForeignTable;
 
-
 extern ForeignServer *GetForeignServer(Oid serverid);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperByName(const char *name,
 							bool missing_ok);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d59c24a..f74d1be 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -759,6 +759,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDW_XACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -832,7 +834,8 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDW_XACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -912,6 +915,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_READ,
+	WAIT_EVENT_FDW_XACT_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index cb613c8..45880b2 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -153,6 +153,16 @@ struct PGPROC
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
 	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction
+								 * resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+
+	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
 	 * their lock.
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 75bab29..25d6a2f 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDW_XACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -124,4 +126,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 								TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 668d9ef..81560bd 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -94,6 +94,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 735dd37..fdd6ded 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1413,6 +1413,13 @@ pg_policies| SELECT n.nspname AS schemaname,
    FROM ((pg_policy pol
      JOIN pg_class c ON ((c.oid = pol.polrelid)))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
+pg_prepared_fdw_xacts| SELECT f.dbid,
+    f.transaction,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.identifier
+   FROM pg_prepared_fdw_xacts() f(dbid, transaction, serverid, userid, status, identifier);
 pg_prepared_statements| SELECT p.name,
     p.statement,
     p.prepare_time,
@@ -1821,6 +1828,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_fdwxact_resolvers| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_fdwxact_resolver() r(pid, dbid, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
-- 
2.10.5

#14Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#13)
4 attachment(s)

On Mon, Oct 29, 2018 at 10:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 24, 2018 at 9:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello.

# It took a long time to come here..

At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBf-AJup-_ARfpqR42gJQ_XjNsvv-XE0rCOCLEkT=HCg@mail.gmail.com>

On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

...

* Updated docs, added the new section "Distributed Transaction" at
Chapter 33 to explain the concept to users

* Moved atomic commit code into src/backend/access/fdwxact directory.

* Some bug fixes.

Please review them.

I have some comments. Apologies in advance if any of them duplicate
or conflict with others' comments so far.

Thank you so much for reviewing this patch!

0001:

This sets XACT_FLAGS_WROTENONTEMPREL when a RELPERSISTENCE_PERMANENT
relation is modified. Isn't it also needed when UNLOGGED tables are
modified? It might be better to have a dedicated classification
macro or function.

I think that even if we do atomic commit for a transaction modifying
both an UNLOGGED table and a remote table, the data will become
inconsistent if the local server crashes. For example, if the local
server crashes after preparing the transaction on the foreign server
but before the local commit, we will lose all the data of the local
UNLOGGED table whereas the modification of the remote table is rolled
back. In the case of persistent tables, data consistency is preserved.
So I think keeping data consistent between remote data and a local
unlogged table is difficult, and I want to leave it as a restriction
for now. Am I missing something?

The flag is handled in heapam.c. I suppose that it should be done
in the upper layer, considering the upcoming pluggable storage.
(X_F_ACCESSEDTEMPREL is set in heapam, but..)

Yeah, or we can set the flag after heap_insert in ExecInsert.

0002:

The name FdwXactParticipantsForAC doesn't sound good to me. How
about FdwXactAtomicCommitParticipants?

+1, will fix it.

Well, as the file comment of fdwxact.c says,
FdwXactRegisterTransaction is called from the FDW driver and
F_X_MarkForeignTransactionModified is called from the executor. I
think that we should clarify who is responsible for the whole
sequence. Since the state of local tables affects it, I suppose
the executor is. Couldn't we do the whole thing on the executor
side? I'm not sure, but I feel that
F_X_RegisterForeignTransaction can be a part of
F_X_MarkForeignTransactionModified. The callers of
MarkForeignTransactionModified can find out whether the table is
involved in 2pc via the IsTwoPhaseCommitEnabled interface.

Indeed. The executor can register foreign servers, so FDWs don't
need to register anything. I will remove the registration function so
that FDW developers don't need to call the register function but only
need to provide the atomic commit APIs.

if (foreign_twophase_commit == true &&
((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0) )
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));

The error is emitted when the GUC is turned off in the
transaction where MarkTransactionModify'ed. I think that the
number of the variables' possible states should be reduced for
simplicity. For example, in this case, once foreign_twophase_commit
is checked in a transaction, subsequent changes in the
transaction should be ignored during the transaction.

I might not have understood your comment correctly, but since
foreign_twophase_commit is a PGC_USERSET parameter I think we need to
check it at commit time. Also, we need to keep participant servers even
when foreign_twophase_commit is off if both max_prepared_foreign_xacts
and max_foreign_xact_resolvers are > 0.

I will post the updated patch this week.

Attached the updated version patches.

Based on the review comment from Horiguchi-san, I've changed the
atomic commit API so that FDW developers who wish to support atomic
commit don't need to call the register function. The atomic commit
APIs are the following:

* GetPrepareId
* PrepareForeignTransaction
* CommitForeignTransaction
* RollbackForeignTransaction
* ResolveForeignTransaction
* IsTwoPhaseCommitEnabled

All of these APIs except GetPrepareId are required for atomic commit.
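
As a rough illustration of that rule (this is not code from the patch),
the check "every callback except GetPrepareId must be provided" could be
sketched as below. The Sketch* types, the sketch_* helpers and the
simplified argument types are all hypothetical stand-ins; the real
typedefs live in fdwapi.h and use FdwXactState, Oid and TransactionId.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for FdwXactState; not the struct from the patch. */
typedef struct SketchXactState
{
	const char *prepare_id;
} SketchXactState;

/* Callback shapes modelled loosely on the typedefs added to fdwapi.h. */
typedef bool (*SketchXactCallback) (SketchXactState *state);
typedef bool (*SketchResolveCallback) (SketchXactState *state, bool is_commit);
typedef bool (*SketchEnabledCallback) (unsigned int serverid);
typedef char *(*SketchPrepareIdCallback) (unsigned int xid,
										  unsigned int serverid,
										  unsigned int userid,
										  int *prep_id_len);

/* Hypothetical subset of FdwRoutine holding only the atomic commit APIs. */
typedef struct SketchFdwRoutine
{
	SketchXactCallback PrepareForeignTransaction;
	SketchXactCallback CommitForeignTransaction;
	SketchXactCallback RollbackForeignTransaction;
	SketchResolveCallback ResolveForeignTransaction;
	SketchEnabledCallback IsTwoPhaseCommitEnabled;
	SketchPrepareIdCallback GetPrepareId;	/* optional */
} SketchFdwRoutine;

/* Trivial callbacks, present only so a routine table can be filled in. */
bool
sketch_noop_xact(SketchXactState *state)
{
	(void) state;
	return true;
}

bool
sketch_noop_resolve(SketchXactState *state, bool is_commit)
{
	(void) state;
	(void) is_commit;
	return true;
}

bool
sketch_always_enabled(unsigned int serverid)
{
	(void) serverid;
	return true;
}

/* True if all required callbacks are set; GetPrepareId may be NULL. */
bool
sketch_supports_atomic_commit(const SketchFdwRoutine *routine)
{
	assert(routine != NULL);

	return routine->PrepareForeignTransaction != NULL &&
		routine->CommitForeignTransaction != NULL &&
		routine->RollbackForeignTransaction != NULL &&
		routine->ResolveForeignTransaction != NULL &&
		routine->IsTwoPhaseCommitEnabled != NULL;
}
```

A routine table that fills in the five required callbacks and leaves
GetPrepareId as NULL would pass this check; what the core does when
GetPrepareId is absent is not shown here.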

Also, I've changed the foreign_twophase_commit parameter to an enum
parameter based on the suggestion from Robert[1]. Valid values are
'required', 'prefer' and 'disabled' (default). When set to either
'required' or 'prefer', atomic commit will be used. The difference
between 'required' and 'prefer' is that when set to 'required' we
require *all* modified servers to be able to use 2pc, whereas when
set to 'prefer' we use 2pc where available. So with 'required', if any
of the written participants disables 2pc or doesn't support the atomic
commit API, the transaction fails. IOW, with 'required' we can commit
only when data consistency among all participants can be preserved.
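
To make the three settings concrete, here is a minimal sketch of the
commit-time decision as described above. It is not the patch's
implementation: the Sketch* types and sketch_decide_commit() are
hypothetical names, and it assumes that only servers that were written
to count as participants for this check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the three foreign_twophase_commit settings. */
typedef enum
{
	SKETCH_2PC_DISABLED,
	SKETCH_2PC_PREFER,
	SKETCH_2PC_REQUIRED
} SketchTwophaseLevel;

/* One foreign server touched by the transaction (hypothetical struct). */
typedef struct
{
	bool		modified;			/* did we write to this server? */
	bool		twophase_capable;	/* 2pc enabled and supported by the FDW? */
} SketchParticipant;

typedef enum
{
	SKETCH_COMMIT_PLAIN,	/* ordinary commit on every server */
	SKETCH_COMMIT_2PC,		/* prepare/commit on the capable servers */
	SKETCH_COMMIT_ERROR		/* refuse to commit */
} SketchDecision;

SketchDecision
sketch_decide_commit(SketchTwophaseLevel level,
					 const SketchParticipant *parts, int nparts)
{
	bool		any_capable = false;
	bool		any_incapable = false;

	assert(nparts >= 0);
	assert(parts != NULL || nparts == 0);

	if (level == SKETCH_2PC_DISABLED)
		return SKETCH_COMMIT_PLAIN;

	/* Only servers we actually wrote to matter for this decision. */
	for (int i = 0; i < nparts; i++)
	{
		if (!parts[i].modified)
			continue;
		if (parts[i].twophase_capable)
			any_capable = true;
		else
			any_incapable = true;
	}

	/* 'required': every modified server must be able to use 2pc. */
	if (level == SKETCH_2PC_REQUIRED && any_incapable)
		return SKETCH_COMMIT_ERROR;

	/* 'prefer' (or 'required' with no obstacle): use 2pc where available. */
	return any_capable ? SKETCH_COMMIT_2PC : SKETCH_COMMIT_PLAIN;
}
```

With one 2pc-capable and one non-capable server both modified, this
sketch returns SKETCH_COMMIT_ERROR under 'required' and
SKETCH_COMMIT_2PC under 'prefer'; in the latter case the non-capable
server would be committed in the ordinary way.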

Please review the patches.

Since the previous patches conflict with current HEAD, I've attached
an updated set of patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

v21-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From 683b47de51f23bb899c92bac8bc3947d2d262c5a Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 8 Feb 2018 11:26:46 +0900
Subject: [PATCH v21 1/4] Keep track of writing on non-temporary relation.

---
 src/backend/access/heap/heapam.c | 12 ++++++++++++
 src/include/access/xact.h        |  5 +++++
 2 files changed, 17 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb63471..c2db19b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2629,6 +2629,10 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		heap_freetuple(heaptup);
 	}
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleGetOid(tup);
 }
 
@@ -3453,6 +3457,10 @@ l1:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleMayBeUpdated;
 }
 
@@ -4403,6 +4411,10 @@ l2:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	bms_free(hot_attrs);
 	bms_free(proj_idx_attrs);
 	bms_free(key_attrs);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 689c57c..2c1b2d8 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -98,6 +98,11 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
-- 
2.10.5

v21-0004-Add-regression-tests-for-atomic-commit.patch (application/octet-stream)
From d3b2da59daf037fe5db43f268ce2cd582c84e390 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:48:08 +0900
Subject: [PATCH v21 4/4] Add regression tests for atomic commit.

---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index daf79a0..71c8b9d 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000..a23f120
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port', two_phase_commit 'on');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port', two_phase_commit 'on');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check that we can commit and roll back the foreign
+# transactions after normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check that we can commit and roll back the
+# foreign transactions after crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up its
+# shared memory state and releases locks while replaying transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_prepared_fdw_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check that the standby node can process prepared foreign transactions
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 3248603..c1cd8ae 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2288,9 +2288,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2305,7 +2308,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.10.5

Attachment: v21-0003-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From 972cf3a26d6b970d7539e4309f11c4f96b5d89fd Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:46:01 +0900
Subject: [PATCH v21 3/4] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/connection.c              | 673 ++++++++++++++++---------
 contrib/postgres_fdw/expected/postgres_fdw.out | 387 +++++++++++++-
 contrib/postgres_fdw/option.c                  |   5 +-
 contrib/postgres_fdw/postgres_fdw.c            |  60 ++-
 contrib/postgres_fdw/postgres_fdw.h            |  11 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      | 151 +++++-
 doc/src/sgml/postgres-fdw.sgml                 |  37 ++
 7 files changed, 1069 insertions(+), 255 deletions(-)
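
Reviewer note on the resolve path: postgresResolveForeignTransaction() in this patch packs the five-character SQLSTATE of a failed COMMIT PREPARED / ROLLBACK PREPARED with MAKE_SQLSTATE and treats ERRCODE_UNDEFINED_OBJECT (42704) as "already resolved". The standalone sketch below mirrors that decision outside the backend. It assumes the same six-bit packing that elog.h uses; the names PACK_SQLSTATE, pack_sqlstate_string and failed_resolution_counts_as_resolved are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Six-bit packing, assumed to match PostgreSQL's MAKE_SQLSTATE in elog.h. */
#define SIXBIT(ch) (((ch) - '0') & 0x3F)
#define PACK_SQLSTATE(c1, c2, c3, c4, c5) \
	(SIXBIT(c1) + (SIXBIT(c2) << 6) + (SIXBIT(c3) << 12) + \
	 (SIXBIT(c4) << 18) + (SIXBIT(c5) << 24))

/* 42704 is undefined_object; 08006 is connection_failure. */
#define SKETCH_UNDEFINED_OBJECT   PACK_SQLSTATE('4', '2', '7', '0', '4')
#define SKETCH_CONNECTION_FAILURE PACK_SQLSTATE('0', '8', '0', '0', '6')

/* Hypothetical helper: pack a five-character SQLSTATE string into an int. */
static int
pack_sqlstate_string(const char *s)
{
	assert(s != NULL && strlen(s) == 5);
	return PACK_SQLSTATE(s[0], s[1], s[2], s[3], s[4]);
}

/*
 * Hypothetical helper: after a failed COMMIT/ROLLBACK PREPARED, decide
 * whether the foreign transaction should still be considered resolved.
 * diag_sqlstate is the SQLSTATE field of the result, or NULL if the
 * result carried none (treated as a connection failure, as in the patch).
 */
static int
failed_resolution_counts_as_resolved(const char *diag_sqlstate)
{
	int			sqlstate;

	if (diag_sqlstate != NULL && strlen(diag_sqlstate) == 5)
		sqlstate = pack_sqlstate_string(diag_sqlstate);
	else
		sqlstate = SKETCH_CONNECTION_FAILURE;

	/* A missing prepared transaction was probably resolved elsewhere. */
	return sqlstate == SKETCH_UNDEFINED_OBJECT;
}
```

With this rule, only a missing prepared transaction counts as resolved. Any other failure, including a lost connection, leaves the entry unresolved, which presumably means the resolver tries again later.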

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index fe4893a..3264300 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -14,9 +14,12 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
-#include "catalog/pg_user_mapping.h"
 #include "access/xact.h"
+#include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -45,7 +48,7 @@
  */
 typedef Oid ConnCacheKey;
 
-typedef struct ConnCacheEntry
+struct ConnCacheEntry
 {
 	ConnCacheKey key;			/* hash key (must be first) */
 	PGconn	   *conn;			/* connection to foreign server, or NULL */
@@ -56,9 +59,19 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		am_participant_of_ac;	/* true if fdwxact code controls the transaction */
+	bool		xact_got_connection;	/* true if used in the current xact */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
-} ConnCacheEntry;
+};
+
+typedef struct PgFdwXactState
+{
+	Oid		serverid;
+	Oid		userid;
+	Oid		umid;
+	ConnCacheEntry	*conn;
+} PgFdwXactState;
 
 /*
  * Connection cache (initialized on first use)
@@ -69,17 +82,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 					   SubTransactionId mySubid,
 					   SubTransactionId parentSubid,
@@ -91,24 +100,43 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 						 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 						 PGresult **result);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	ConnCacheEntry *entry;
+	ConnCacheKey	key;
+	bool			found;
 
-/*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
- */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
+
+	/*
+	 * Find or create cached entry for requested connection.
+	 */
+	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+
+	if (!found)
+	{
+		/*
+		 * We need only clear "conn" here; remaining fields will be filled
+		 * later when "conn" is set.
+		 */
+		entry->conn = NULL;
+	}
+
+	return entry;
+}
+
+
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -128,7 +156,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -136,24 +163,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
-	/*
-	 * Find or create cached entry for requested connection.
-	 */
-	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
-	if (!found)
-	{
-		/*
-		 * We need only clear "conn" here; remaining fields will be filled
-		 * later when "conn" is set.
-		 */
-		entry->conn = NULL;
-	}
+	entry = GetConnectionCacheEntry(umid);
 
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
@@ -182,6 +192,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping		*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +201,8 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->am_participant_of_ac = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +213,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,16 +229,46 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
 /*
  * Connect to remote server using specified server and user mapping properties.
+ * If the attempt to connect fails, an error is thrown; this function does
+ * not return NULL.
  */
 static PGconn *
 connect_pg_server(ForeignServer *server, UserMapping *user)
@@ -265,11 +317,22 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 
 		conn = PQconnectdbParams(keywords, values, false);
 		if (!conn || PQstatus(conn) != CONNECTION_OK)
+		{
+			char	   *connmessage;
+			int			msglen;
+
+			/* libpq typically appends a newline, strip that */
+			connmessage = pstrdup(PQerrorMessage(conn));
+			msglen = strlen(connmessage);
+			if (msglen > 0 && connmessage[msglen - 1] == '\n')
+				connmessage[msglen - 1] = '\0';
+
 			ereport(ERROR,
 					(errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
 					 errmsg("could not connect to server \"%s\"",
 							server->servername),
 					 errdetail_internal("%s", pchomp(PQerrorMessage(conn)))));
+		}
 
 		/*
 		 * Check that non-superuser has used password to establish connection;
@@ -414,15 +477,20 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
+	ForeignServer	*server = GetForeignServer(serverid);
 
 	/* Start main transaction if we haven't yet */
 	if (entry->xact_depth <= 0)
 	{
 		const char *sql;
 
+		/* Mark as an atomic-commit participant if two-phase commit is enabled */
+		if (server_uses_twophase_commit(server))
+			entry->am_participant_of_ac = true;
+
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
@@ -644,193 +712,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 }
 
 /*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow remote transactions that modified anything,
-					 * since it's not very reasonable to hold them open until
-					 * the prepared transaction is committed.  For the moment,
-					 * throw error unconditionally; later we might allow
-					 * read-only cases.  Note that the error will cause us to
-					 * come right back here with event == XACT_EVENT_ABORT, so
-					 * we'll clean up the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot prepare a transaction that modified remote tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
-/*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
 static void
@@ -846,10 +727,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -860,6 +737,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Skip entries that were not used in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1193,3 +1074,327 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare the transaction on the foreign server. This function is called
+ * only at the pre-commit phase of the local transaction. Since we should
+ * already have a connection to the server we are interested in, we don't
+ * use the serverid and userid that would otherwise be needed to look up
+ * the user mapping, which is the key of the connection cache.
+ */
+bool
+postgresPrepareForeignTransaction(FdwXactState *state)
+{
+	PgFdwXactState *rstate;
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+	PGresult	*res;
+	StringInfo	command;
+
+	entry = GetConnectionState(state->umid, false, false);
+	//entry = hash_search(ConnectionHash, &(state->umid), HASH_FIND, NULL);
+
+	if (!entry->xact_got_connection || !entry->conn)
+		return true;
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	rstate = (PgFdwXactState *) palloc0(sizeof(PgFdwXactState));
+	rstate->serverid = state->serverid;
+	rstate->userid = state->userid;
+	rstate->umid = state->umid;
+	rstate->conn = entry;
+	state->fdw_state = (void *)rstate;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	result = (PQresultStatus(res) == PGRES_COMMAND_OK);
+	PQclear(res);
+
+	if (result)
+		elog(DEBUG1, "prepared foreign transaction on server %u with ID %s",
+			 state->serverid, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+
+/*
+ * Commit the transaction on the foreign server. This function is called
+ * both at the pre-commit phase of the local transaction when committing
+ * and at the end of the local transaction when aborting. Since we should
+ * already have connections to the servers involved in the local transaction,
+ * we don't use the serverid and userid that would otherwise be needed to
+ * look up the user mapping, which is the key of the connection cache.
+ */
+bool
+postgresCommitForeignTransaction(FdwXactState *state)
+{
+	PgFdwXactState *rstate;
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+	PGresult	*res;
+
+	entry = GetConnectionState(state->umid, false, false);
+
+	if (!entry->xact_got_connection || !entry->conn)
+		return true;
+
+	rstate = (PgFdwXactState *) palloc0(sizeof(PgFdwXactState));
+	rstate->serverid = state->serverid;
+	rstate->userid = state->userid;
+	rstate->umid = state->umid;
+	rstate->conn = entry;
+	state->fdw_state = (void *)rstate;
+
+	/*
+	 * If abort cleanup previously failed for this connection,
+	 * we can't issue any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	result = (PQresultStatus(res) == PGRES_COMMAND_OK);
+	PQclear(res);
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+
+/*
+ * Roll back the transaction on the foreign server. This function is called
+ * both at the pre-commit phase of the local transaction when committing
+ * and at the end of the local transaction when aborting. Since we should
+ * already have connections to the servers involved in the local transaction,
+ * we don't use the serverid and userid that would otherwise be needed to
+ * look up the user mapping, which is the key of the connection cache.
+ */
+bool
+postgresRollbackForeignTransaction(FdwXactState *state)
+{
+	PgFdwXactState *rstate = (PgFdwXactState *) state->fdw_state;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (rstate)
+		entry = rstate->conn;
+	else
+		entry = GetConnectionCacheEntry(state->umid);
+
+	/*
+	 * When rolling back the local transaction, not having a connection means
+	 * that no remote transaction was started, so we can regard it as success.
+	 */
+	if (!entry->xact_got_connection || !entry->conn)
+		return true;
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is already unsalvageable, do only the cleanup
+	 * and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return true;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+	else
+	{
+		entry->have_prep_stmt = false;
+		entry->have_error = false;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return !abort_cleanup_failure;
+}
+
+bool
+postgresResolveForeignTransaction(FdwXactState *state, bool is_commit)
+{
+	ConnCacheEntry *entry = NULL;
+	StringInfo	command;
+	bool result = false;
+	PGresult	*res;
+
+	entry = GetConnectionState(state->umid, false, false);
+
+	if (!entry->conn)
+		return false;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 state->fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		/*
+		 * The command failed; log the reason for the failure. We may not
+		 * be in a transaction here, so raising an error doesn't help. Even
+		 * if we are in a transaction, it would be the resolver transaction,
+		 * which would be aborted by the error, thus delaying resolution of
+		 * other prepared foreign transactions.
+		 */
+		pgfdw_report_error(LOG, res, entry->conn, false, command->data);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * If we tried to COMMIT/ROLLBACK a prepared transaction and it was
+		 * missing on the foreign server, it was probably resolved by some
+		 * other means. Either way, it should be considered resolved.
+		 */
+		result = (sqlstate == ERRCODE_UNDEFINED_OBJECT);
+	}
+	else
+		result = true;
+
+	elog(DEBUG1, "%s prepared foreign transaction on server %u with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 state->serverid,
+		 state->fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->am_participant_of_ac = false;
+
+	/*
+	 * Whether we committed, prepared, or rolled back, we can now mark this
+	 * connection as out of the transaction.
+	 */
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 21a2ef5..15dadf4 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,15 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -185,16 +207,19 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                                      List of foreign tables
- Schema |   Table    |  Server   |                   FDW options                    | Description 
---------+------------+-----------+--------------------------------------------------+-------------
- public | ft1        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft2        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft4        | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
- public | ft5        | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft6        | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft_pg_type | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
-(6 rows)
+                                         List of foreign tables
+ Schema |      Table       |  Server   |                   FDW options                    | Description 
+--------+------------------+-----------+--------------------------------------------------+-------------
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
+(9 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8650,3 +8675,345 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+\det+
+                                                 List of foreign tables
+ Schema |      Table       |  Server   |                            FDW options                            | Description 
+--------+------------------+-----------+-------------------------------------------------------------------+-------------
+ public | fpagg_tab_p1     | loopback  | (table_name 'pagg_tab_p1')                                        | 
+ public | fpagg_tab_p2     | loopback  | (table_name 'pagg_tab_p2')                                        | 
+ public | fpagg_tab_p3     | loopback  | (table_name 'pagg_tab_p3')                                        | 
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')                             | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1', use_remote_estimate 'true') | 
+ public | ft3              | loopback  | (table_name 'loct3', use_remote_estimate 'true')                  | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')                             | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type')                  | 
+ public | ftprt1_p1        | loopback  | (table_name 'fprt1_p1', use_remote_estimate 'true')               | 
+ public | ftprt1_p2        | loopback  | (table_name 'fprt1_p2')                                           | 
+ public | ftprt2_p1        | loopback  | (table_name 'fprt2_p1', use_remote_estimate 'true')               | 
+ public | ftprt2_p2        | loopback  | (table_name 'fprt2_p2', use_remote_estimate 'true')               | 
+ public | rem1             | loopback  | (table_name 'loc1')                                               | 
+ public | rem2             | loopback  | (table_name 'loc2')                                               | 
+(19 rows)
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+  srvname  
+-----------
+ loopback
+ loopback2
+ loopback3
+(3 rows)
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO on;
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO ft8_twophase VALUES(3);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO "S 1"."T 6" VALUES (4);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  4
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(5);
+INSERT INTO "S 1"."T 6" VALUES (5);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  4
+(1 row)
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(8);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Fails, cannot commit the distributed transaction if 2PC-non-capable
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+ERROR:  cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+(4 rows)
+
+-- Disables atomic commit, and success the same case as above.
+SET foreign_twophase_commit TO off;
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+(5 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+(5 rows)
+
+-- Enable atomic commit, again.
+SET foreign_twophase_commit TO on;
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(10);
+INSERT INTO ft8_twophase VALUES(10);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+-- Fails, cannot prepare the transaction if non-supporeted
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(11);
+INSERT INTO ft9_not_twophase VALUES(11);
+PREPARE TRANSACTION 'gx1'; -- error
+ERROR:  cannot prepare a transaction that modified remote tables
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
+SELECT * FROM ft9_not_twophase;
+ c1 
+----
+  1
+  2
+  2
+  4
+  6
+  6
+  9
+  9
+(8 rows)
+
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 6854f1b..1f45b1c 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -108,7 +108,8 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 		 * Validate option value, when we can do so without any context.
 		 */
 		if (strcmp(def->defname, "use_remote_estimate") == 0 ||
-			strcmp(def->defname, "updatable") == 0)
+			strcmp(def->defname, "updatable") == 0 ||
+			strcmp(def->defname, "two_phase_commit") == 0)
 		{
 			/* these accept only boolean values */
 			(void) defGetBoolean(def);
@@ -177,6 +178,8 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* two phase commit support */
+		{"two_phase_commit", ForeignServerRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fd20aa9..1135046 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,8 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "catalog/pg_class.h"
@@ -359,6 +361,7 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 							 RelOptInfo *input_rel,
 							 RelOptInfo *output_rel,
 							 void *extra);
+static bool postgresIsTwoPhaseCommitEnabled(Oid serverid);
 
 /*
  * Helper functions
@@ -452,7 +455,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 				  const PgFdwRelationInfo *fpinfo_o,
 				  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -506,10 +508,29 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->ResolveForeignTransaction = postgresResolveForeignTransaction;
+	routine->IsTwoPhaseCommitEnabled = postgresIsTwoPhaseCommitEnabled;
+
 	PG_RETURN_POINTER(routine);
 }
 
 /*
+ * postgresIsTwoPhaseCommitEnabled
+ */
+static bool
+postgresIsTwoPhaseCommitEnabled(Oid serverid)
+{
+	ForeignServer	*server = GetForeignServer(serverid);
+
+
+	return server_uses_twophase_commit(server);
+}
+
+/*
  * postgresGetForeignRelSize
  *		Estimate # of rows and width of the result of the scan
  *
@@ -1356,7 +1377,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2411,7 +2432,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2704,7 +2725,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								&retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3321,7 +3342,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4108,7 +4129,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4198,7 +4219,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4421,7 +4442,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
@@ -5803,3 +5824,26 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	/* We didn't find any suitable equivalence class expression */
 	return NULL;
 }
+
+/*
+ * server_uses_twophase_commit
+ * Returns true if the foreign server is configured to support 2PC.
+ */
+bool
+server_uses_twophase_commit(ForeignServer *server)
+{
+	ListCell		*lc;
+
+	/* Check the options for two phase compliance */
+	foreach(lc, server->options)
+	{
+		DefElem    *d = (DefElem *) lfirst(lc);
+
+		if (strcmp(d->defname, "two_phase_commit") == 0)
+		{
+			return defGetBoolean(d);
+		}
+	}
+	/* By default a server is not 2PC compliant */
+	return false;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 70b538e..3526923 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "nodes/relation.h"
@@ -110,12 +111,14 @@ typedef struct PgFdwRelationInfo
 	int			relation_index;
 } PgFdwRelationInfo;
 
+typedef struct ConnCacheEntry ConnCacheEntry;
+
 /* in postgres_fdw.c */
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -123,6 +126,11 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 				   bool clear, const char *sql);
+extern bool postgresPrepareForeignTransaction(FdwXactState *state);
+extern bool postgresCommitForeignTransaction(FdwXactState *state);
+extern bool postgresRollbackForeignTransaction(FdwXactState *state);
+extern bool postgresResolveForeignTransaction(FdwXactState *state,
+											  bool is_commit);
 
 /* in option.c */
 extern int ExtractConnectionOptions(List *defelems,
@@ -181,6 +189,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 						List *remote_conds, List *pathkeys, bool is_subquery,
 						List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 88c4cb4..2554c9c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,19 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -2304,7 +2331,6 @@ SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a
 
 RESET enable_partitionwise_join;
 
-
 -- ===================================================================
 -- test partitionwise aggregates
 -- ===================================================================
@@ -2354,3 +2380,126 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+
+\det+
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO on;
+
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO ft8_twophase VALUES(3);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO "S 1"."T 6" VALUES (4);
+COMMIT;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(5);
+INSERT INTO "S 1"."T 6" VALUES (5);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(8);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Fails, cannot commit the distributed transaction if 2PC-non-capable
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Disables atomic commit, and success the same case as above.
+SET foreign_twophase_commit TO off;
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
+
+-- Enable atomic commit, again.
+SET foreign_twophase_commit TO on;
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(10);
+INSERT INTO ft8_twophase VALUES(10);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft7_twophase;
+SELECT * FROM ft8_twophase;
+
+-- Fails, cannot prepare the transaction if non-supporeted
+-- server involved in.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(11);
+INSERT INTO ft9_not_twophase VALUES(11);
+PREPARE TRANSACTION 'gx1'; -- error
+SELECT * FROM ft8_twophase;
+SELECT * FROM ft9_not_twophase;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 54b5e98..f4a9ff5 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -436,6 +436,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted independently.
+    Some of these transactions may fail to commit on their remote server while
+    others commit successfully. This may be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is allowed
+       to use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
-- 
2.10.5

v21-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From 8f9796852b6325037f4ee997fc787a27ab869ce5 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:44:42 +0900
Subject: [PATCH v21 2/4] Support atomic commit among multiple foreign servers.

---
 doc/src/sgml/catalogs.sgml                    |   97 +
 doc/src/sgml/config.sgml                      |  143 +-
 doc/src/sgml/distributed-transaction.sgml     |  157 ++
 doc/src/sgml/fdwhandler.sgml                  |  203 ++
 doc/src/sgml/filelist.sgml                    |    1 +
 doc/src/sgml/func.sgml                        |   51 +
 doc/src/sgml/monitoring.sgml                  |   60 +
 doc/src/sgml/postgres.sgml                    |    1 +
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/fdwxact.c          | 2678 +++++++++++++++++++++++++
 src/backend/access/fdwxact/fdwxact_launcher.c |  641 ++++++
 src/backend/access/fdwxact/fdwxact_resolver.c |  331 +++
 src/backend/access/heap/heapam.c              |   12 -
 src/backend/access/rmgrdesc/Makefile          |    8 +-
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   65 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/Makefile           |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   32 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/foreigncmds.c            |   23 +
 src/backend/executor/execPartition.c          |    4 +
 src/backend/executor/nodeForeignscan.c        |    8 +
 src/backend/executor/nodeModifyTable.c        |   24 +
 src/backend/foreign/foreign.c                 |   43 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   18 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    2 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   80 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  147 ++
 src/include/access/fdwxact_launcher.h         |   32 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   52 +
 src/include/access/resolver_internal.h        |   67 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   23 +
 src/include/foreign/fdwapi.h                  |   18 +-
 src/include/foreign/foreign.h                 |    2 +-
 src/include/pgstat.h                          |    8 +-
 src/include/storage/proc.h                    |   10 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    2 +
 src/test/regress/expected/rules.out           |   12 +
 62 files changed, 5287 insertions(+), 40 deletions(-)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100755 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/fdwxact_launcher.c
 create mode 100644 src/backend/access/fdwxact/fdwxact_resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9edba96..391dc7f 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9624,6 +9624,103 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-prepared-fdw-xacts">
+  <title><structname>pg_prepared_fdw_xacts</structname></title>
+
+  <indexterm zone="view-pg-prepared-fdw-xacts">
+   <primary>pg_prepared_fdw_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_prepared_fdw_xacts</structname> displays
+   information about foreign transactions that are currently prepared on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="fdw-transaction-managements"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_prepared_fdw_xacts</structname> contains one row per prepared
+   foreign transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_prepared_fdw_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>transaction</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       ID of the local transaction that this foreign transaction is associated with
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which this foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction: <literal>prepared</literal>, <literal>committing</literal>, <literal>aborting</literal> or <literal>unknown</literal>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_prepared_fdw_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7554cba..557b3f2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3611,7 +3611,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
 
      </variablelist>
     </sect2>
-
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -7827,6 +7826,148 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether transaction commit waits for all involved foreign transactions
+         to be resolved before the command returns a "success" indication to the client.
+         Valid values are <literal>required</literal>, <literal>prefer</literal> and
+         <literal>disabled</literal>. The default setting is <literal>disabled</literal>.
+         When <literal>disabled</literal>, data can become inconsistent across the
+         servers involved in a distributed transaction if a foreign server crashes while
+         the distributed transaction is being committed. When set to <literal>required</literal>,
+         every server the transaction wrote to must be able to use the two-phase commit protocol.
+         That is, the transaction fails if any of those servers returns <literal>false</literal>
+         from <function>IsTwoPhaseCommitEnabled</function> or does not support the transaction
+         management callback routines (described in
+         <xref linkend="fdw-callbacks-transaction-managements"/>).
+         When set to <literal>prefer</literal>, the distributed transaction uses the
+         two-phase commit protocol where available, but does not fail when it is not
+         available.
+        </para>
+
+        <para>
+         Both <varname>max_prepared_foreign_transactions</varname> and
+         <varname>max_foreign_transaction_resolvers</varname> must be set to a non-zero value
+         before this parameter can be set to either <literal>required</literal> or <literal>prefer</literal>.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one transaction
+         is determined by the setting in effect when it commits.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If <literal>N</literal> local transactions each
+         span <literal>K</literal> foreign servers, this value needs to be set to
+         <literal>N * K</literal>, not just <literal>N</literal>.
+         This parameter can only be set at server start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver waits after a failed resolution
+         before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism.  You should set this value to
+         zero only if <varname>max_foreign_transaction_resolvers</varname> is at least
+         as large as the number of databases you have. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000..5143499
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,157 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction Management</title>
+
+ <para>
+  This chapter explains what distributed transaction management is, and how it can be configured
+  in PostgreSQL.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Atomic commit is an operation that applies a set of changes as a single operation
+   globally. <productname>PostgreSQL</productname> provides a way to perform a transaction
+   with foreign resources using a <literal>Foreign Data Wrapper</literal>.
+   <productname>PostgreSQL</productname>'s atomic commit ensures that all changes
+   on foreign servers end in either commit or rollback using the transaction callback
+   routines (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers automatically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol, which is a
+    type of atomic commitment protocol (ACP). Using the two-phase commit protocol, the commit
+    sequence of a distributed transaction proceeds through the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    In the first step, the <productname>PostgreSQL</productname> distributed transaction manager
+    prepares all transactions on the foreign servers if two-phase commit is required.
+    Two-phase commit is required only if the transaction modifies data on two or more
+    servers, including the local server itself, and the user requests it with
+    <xref linkend="guc-foreign-twophase-commit"/>. If all preparations on the foreign servers
+    succeed, processing moves on to the next step. If any failure happens in this step,
+    <productname>PostgreSQL</productname> switches to rollback and rolls back all transactions
+    on both the local and foreign servers.
+   </para>
+
+   <para>
+    In the local commit step, <productname>PostgreSQL</productname> commits the transaction
+    locally. If any failure happens in this step, <productname>PostgreSQL</productname> switches
+    to rollback and rolls back all transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    In the final step, prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Foreign Transaction Resolution</title>
+
+   <para>
+    Foreign transaction resolution is performed by foreign transaction resolver processes.
+    A resolver commits all prepared transactions on foreign servers if the coordinator received
+    an agreement message from all foreign servers during the first step. On the other hand,
+    if any foreign server failed to prepare the transaction, the resolver rolls back all prepared
+    transactions.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution on one
+    database on the coordinator side. If resolution fails, the resolver retries
+    after <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>In-doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit or roll back
+    using the two-phase commit protocol. However, if the second phase fails for whatever reason,
+    the transaction becomes in-doubt. A transaction becomes in-doubt in the following
+    situations:
+
+   <itemizedlist>
+    <listitem>
+     <para>
+      The local <productname>PostgreSQL</productname> server crashes during an atomic commit
+      operation.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      The local <productname>PostgreSQL</productname> server receives a cancellation request from
+      the user during atomic commit.
+     </para>
+    </listitem>
+   </itemizedlist>
+
+   In-doubt transactions are handled automatically by a foreign transaction resolver process
+   when no online transaction is requesting resolution.
+   <function>pg_resolve_fdw_xact</function> provides a way to manually resolve transactions on
+   foreign servers that participated in the distributed transaction.
+   </para>
+
+   <para>
+    The atomic commit operation is crash-safe. Foreign transactions that were being processed
+    at the time of a crash are handled by a foreign transaction resolver as in-doubt transactions.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Monitoring</title>
+   <para>
+    Monitoring information about foreign transaction resolvers is visible in the
+    <link linkend="pg-stat-fdwxact-resolver-view"><literal>pg_stat_fdw_xact_resolver</literal></link>
+    view. This view contains one row for every foreign transaction resolver worker.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to a non-zero value.
+    Additionally, <varname>max_worker_processes</varname> may need to be raised to
+    accommodate the foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 4ce88dd..3da13c9 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1390,6 +1390,118 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     If an FDW wishes to support <firstterm>atomic commit</firstterm>
+     (as described in <xref linkend="fdw-transaction-managements"/>), it must call the
+     registration function <function>FdwXactRegisterForeignTransaction</function>
+     and provide the following callback functions:
+    </para>
+
+    <para>
+<programlisting>
+bool
+PrepareForeignTransaction(FdwXactResolveState *state);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if atomic commit is required.
+    Returning <literal>true</literal> means that preparing the foreign
+    transaction succeeded.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactResolveState *state);
+</programlisting>
+    Commit the not-prepared transaction on the foreign server.
+    This function is called at the pre-commit phase of the local
+    transaction if atomic commit is not required. Atomic
+    commit is not required either when data was modified on
+    only one server, including the local server, or when the user does not
+    request atomic commit with <xref linkend="guc-foreign-twophase-commit"/>.
+    Returning <literal>true</literal> means that committing the
+    foreign transaction succeeded.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactResolveState *state);
+</programlisting>
+    Roll back a not-prepared transaction on the foreign server.
+    This function is called at the end of the local transaction, after it
+    has been rolled back locally, either when the user requested rollback or when
+    an error occurred during the transaction. This function could
+    be called recursively if an error occurs while rolling back the
+    foreign transaction for whatever reason. You need to track
+    recursion and prevent this function from being called infinitely.
+    Returning <literal>true</literal> means that rolling back the
+    foreign transaction succeeded.
+    </para>
+    <para>
+<programlisting>
+bool
+ResolvePreparedForeignTransaction(FdwXactResolveState *state,
+                                  bool is_commit);
+</programlisting>
+    Commit or roll back the prepared transaction on the foreign server.
+    When <varname>is_commit</varname> is true, the foreign
+    transaction should be committed. Otherwise the foreign transaction should
+    be aborted.
+    This function is normally called by the foreign transaction resolver
+    process but can also be called by the <function>pg_resolve_fdw_xact</function>
+    function. In the resolver process, this function is called either
+    when a backend requests the resolver process to resolve a distributed
+    transaction after it has been prepared, or when a database has dangling
+    transactions. Returning <literal>true</literal> means that resolving
+    the foreign transaction succeeded.
+    In the abort case, note that the prepared transaction identified
+    by <varname>state-&gt;fdwxact_id</varname> might not exist on the foreign
+    server. If resolving the foreign transaction fails with an undefined
+    object error (<literal>ERRCODE_UNDEFINED_OBJECT</literal>), you should
+    regard it as success and return <literal>true</literal>.
+    </para>
+    <para>
+<programlisting>
+bool
+IsTwoPhaseCommitEnabled(Oid serverid);
+</programlisting>
+    Return <literal>true</literal> if the foreign server identified by
+    <literal>serverid</literal> is capable of the two-phase commit protocol.
+    This function is called once at commit time.
+    Returning <literal>false</literal> indicates that the current transaction
+    cannot use atomic commit even if atomic commit is requested by the user.
+    </para>
+
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    storing its length in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal,
+    less than <symbol>NAMEDATALEN</symbol> bytes long, and should not be the same
+    as any other concurrent prepared transaction ID. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. So please note that
+      you cannot use functions that use the system catalog cache
+      such as Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1835,4 +1947,95 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+    <title>Transaction Management for Foreign Data Wrappers</title>
+
+    <para>
+     The <productname>PostgreSQL</productname> foreign transaction manager
+     allows FDWs to read and write data on foreign servers within a transaction while
+     maintaining atomicity of the foreign data (aka atomic commit). Atomic
+     commit guarantees that a distributed transaction is either committed
+     or rolled back on all participating foreign
+     servers.  To achieve atomic commit, <productname>PostgreSQL</productname>
+     employs the two-phase commit protocol, which is a type of atomic commitment
+     protocol. Every FDW that wishes to support atomic commit
+     is required to support the transaction management callback routines:
+     <function>PrepareForeignTransaction</function>,
+     <function>CommitForeignTransaction</function>,
+     <function>RollbackForeignTransaction</function>,
+     <function>ResolveForeignTransaction</function>,
+     <function>IsTwoPhaseCommitEnabled</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+     Transactions on a foreign server whose FDW supports these callback routines are
+     managed by <productname>PostgreSQL</productname>'s distributed transaction
+     manager. Each transaction management callback is called at the appropriate time.
+    </para>
+
+    <para>
+     The information in <literal>FdwXactState</literal> can be used to identify
+     foreign servers. <literal>state-&gt;fdw_state</literal> is a <type>void</type>
+     pointer that is available for FDW transaction functions to store information
+     relevant to the particular foreign server.  It is useful for passing
+     information forward from <function>PrepareForeignTransaction</function> and/or
+     <function>CommitForeignTransaction</function> to
+     <function>RollbackForeignTransaction</function>, thereby avoiding recalculation.
+     Note that since <function>ResolveForeignTransaction</function> is called
+     independently of these callback routines, the information is not passed to
+     <function>ResolveForeignTransaction</function>.
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may use different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling <function>PrepareForeignTransaction</function>
+     if the two-phase commit protocol is required. Two-phase commit is required only if
+     the transaction modified data on more than one server, including the local
+     server itself, and the user requests atomic commit. <productname>PostgreSQL</productname>
+     can commit locally and go to the next step if and only if all foreign
+     transactions were prepared successfully. If two-phase commit is not required, the foreign
+     transaction manager commits each transaction by calling
+     <function>CommitForeignTransaction</function> and then commits locally.
+     If any failure happens or the user requests cancellation during the pre-commit phase,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function> for not-prepared foreign
+     servers, and then rolls back locally. The prepared foreign servers are rolled back
+     by a foreign transaction resolver process.
+    </para>
+
+    <para>
+     Once committed locally, the distributed transaction must be committed. The
+     prepared foreign transactions will be committed by a foreign transaction resolver
+     process.
+    </para>
+
+    <para>
+     When two-phase commit is required, after committing locally, the transaction
+     commit waits for all prepared foreign transactions to be committed before
+     completing. One foreign transaction resolver process is responsible for
+     foreign transaction resolution on a database.
+     <function>ResolveForeignTransaction</function> is called by the foreign
+     transaction resolver process during resolution.
+     <function>ResolveForeignTransaction</function> is also called
+     when the user executes the <function>pg_resolve_fdw_xact</function> function.
+    </para>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 48ac14a..38d6fcb 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 96d4541..3690df1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20825,6 +20825,57 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-fdw-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_fdw_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_fdw_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for foreign transactions
+        matching the arguments and resolves them. This function won't resolve
+        a foreign transaction which is in progress, or one that is locked by some
+        other backend.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_fdw_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index add7145..00f0030 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -332,6 +332,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_fdw_xact_resolver</structname><indexterm><primary>pg_stat_fdw_xact_resolver</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-fdwxact-resolver-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1194,6 +1202,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
+         <entry><literal>LogicalLauncherMain</literal></entry>
+         <entry>Waiting in main loop of logical launcher process.</entry>
+        </row>
+        <row>
          <entry><literal>LogicalApplyMain</literal></entry>
          <entry>Waiting in main loop of logical apply process.</entry>
         </row>
@@ -1409,6 +1429,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
+        <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
          <entry>Waiting during base backup when throttling activity.</entry>
@@ -2218,6 +2242,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-fdwxact-resolver-view" xreflabel="pg_stat_fdw_xact_resolver">
+   <title><structname>pg_stat_fdw_xact_resolver</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_fdw_xact_resolver</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 0070603..c10e21f 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -164,6 +164,7 @@
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index bd93a6a..4a1ebdc 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  tablesample transam
+			  tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000..9ddbb14
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o fdwxact_resolver.o fdwxact_launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100755
index 0000000..1f270db
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2678 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL distributed transaction manager for foreign servers.
+ *
+ * To commit atomically across all foreign servers, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * When a foreign data wrapper starts a transaction on a foreign server that
+ * is capable of the two-phase commit protocol, the foreign data wrapper registers
+ * the foreign transaction using function FdwXactRegisterForeignTransaction()
+ * in order to participate in a group for atomic commit. Participants are
+ * identified by the OIDs of the foreign server and user. When the foreign
+ * transaction begins to modify data, the executor marks it as modified using
+ * FdwXactMarkForeignTransactionModified().
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every participating foreign server. After committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so
+ * that we're not trying to locally do work that might fail with an ERROR
+ * after we have already committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * waiters each time we receive a request. We have two queues: the active
+ * queue and the retry queue. A backend is first inserted into the active
+ * queue, and is then moved to the retry queue by the resolver process if
+ * the resolution fails. The backends in the retry queue are processed at
+ * intervals of foreign_transaction_resolution_retry_interval.
+ *
+ * The two-phase commit protocol is required if the transaction modified two or
+ * more servers, including the local server. Otherwise, all foreign transactions
+ * are committed during pre-commit.
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in an in-doubt state (a.k.a. dangling
+ * transactions). Dangling transactions are processed by the resolver process.
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ *	* On PREPARE redo we add the foreign transaction to FdwXactCtl->fdw_xacts.
+ *	  We set fdw_xact->inredo to true for such entries.
+ *	* On Checkpoint redo, we iterate through FdwXactCtl->fdw_xacts entries that
+ *	  have fdw_xact->inredo set to true and are behind the redo_horizon. We save
+ *	  them to disk and then set fdw_xact->ondisk to true.
+ *	* On COMMIT and ABORT we delete the entry from FdwXactCtl->fdw_xacts.
+ *	  If fdw_xact->ondisk is true, we delete the corresponding file from
+ *	  the disk as well.
+ *	* RecoverFdwXacts loads all foreign transaction entries from disk into
+ *	  memory at server startup.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Is atomic commit requested by user? */
+#define IsAtomicCommitEnabled() \
+	(max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+#define IsAtomicCommitRequested() \
+	(IsAtomicCommitEnabled() && \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED))
+
+#define FDW_XACT_ACTION_COMMIT	 		0x01
+#define FDW_XACT_ACTION_TWOPHASE_COMMIT 0x02
+
+/* Structure to bundle the foreign transaction participant */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if
+	 * this foreign transaction is registered but not inserted
+	 * yet.
+	 */
+	FdwXact		fdw_xact;
+	char		*fdw_xact_id;
+
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+	bool		modified;					/* true if modified the data on server */
+	bool		twophase_commit_enabled;	/* true if the server can execute
+											 * two-phase commit protocol */
+	void			*fdw_state;				/* fdw-private state */
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function	prepare_foreign_xact;
+	CommitForeignTransaction_function	commit_foreign_xact;
+	RollbackForeignTransaction_function	rollback_foreign_xact;
+	IsTwoPhaseCommitEnabled_function	is_twophase_commit_enabled;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit.
+ * This list has only foreign servers that support atomic commit FDW
+ * API regardless of their configuration.
+ */
+static List *FdwXactAtomicCommitParticipants = NIL;
+static bool FdwXactAtomicCommitReady = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDW_XACTS_DIR "pg_fdw_xact"
+
+/*
+ * Name of a foreign prepared transaction file is the database oid, xid,
+ * foreign server oid and user oid, each as 8 hex digits, separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure
+ * uniqueness.
+ */
+#define FDW_XACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDW_XACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+static void FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, bool modified);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static void FdwXactCommitForeignTransaction(FdwXactParticipant *fdw_part);
+static bool FdwXactResolveForeignTransaction(FdwXactState *state, FdwXact fdwxact,
+											 int elevel);
+static void FdwXactComputeRequiredXmin(void);
+static bool FdwXactAtomicCommitRequired(void);
+static void FdwXactQueueInsert(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid, bool give_warnings);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool give_warnings);
+static List *get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						   bool need_lock);
+static FdwXact get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								bool need_lock);
+static FdwXact get_all_fdw_xacts(int *length);
+static FdwXact insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							   Oid umid, char *fdw_xact_id);
+static char *generate_fdw_xact_identifier(TransactionId xid, Oid serverid, Oid userid);
+static void remove_fdw_xact(FdwXact fdw_xact);
+static FdwXactState *create_fdw_xact_state(void);
+
+/* Guc parameters */
+int	max_prepared_foreign_xacts = 0;
+int	max_foreign_xact_resolvers = 0;
+int foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Tracks whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+/*
+ * Remember an accessed foreign server. This function is called by the executor
+ * when it begins to access a foreign server. If the FDW of the foreign server
+ * supports the atomic commit API, the server is registered as a participant of
+ * the distributed transaction.
+ */
+void
+FdwXactMarkForeignServerAccessed(Relation rel, int flags, bool modified)
+{
+	FdwRoutine			*fdwroutine;
+	ListCell   			*lc;
+	Oid					serverid;
+	Oid					userid;
+
+	/* Quick return if atomic commit is not enabled */
+	if (!IsAtomicCommitEnabled())
+		return;
+
+	/* Do nothing in EXPLAIN (no ANALYZE) case */
+	if (flags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	serverid = GetForeignServerIdByRelId(RelationGetRelid(rel));
+	fdwroutine  = GetFdwRoutineByRelId(RelationGetRelid(rel));
+
+	/*
+	 * If the FDW of the accessed foreign server doesn't provide the atomic
+	 * commit API, we don't manage its foreign transaction in the distributed
+	 * transaction manager.
+	 */
+	if (fdwroutine->IsTwoPhaseCommitEnabled == NULL)
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		return;
+	}
+
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	foreach(lc, FdwXactAtomicCommitParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	FdwXactRegisterForeignTransaction(serverid, userid, modified);
+}
+
+/*
+ * Register the foreign transaction identified by the given arguments as
+ * a participant of the transaction.
+ *
+ * The foreign server identified by the given server id must support the atomic
+ * commit APIs. Registered foreign transactions are managed by the foreign
+ * transaction manager until the end of the transaction.
+ */
+static void
+FdwXactRegisterForeignTransaction(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant	*fdw_part;
+	ForeignServer 		*foreign_server;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	FdwRoutine			*fdw_routine;
+	MemoryContext		old_ctx;
+	char				*fdwxact_id;
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	/*
+	 * Participant information is needed at the end of a transaction, when
+	 * the system caches are not available. Save it in TopTransactionContext
+	 * beforehand so that it can live until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	foreign_server = GetForeignServer(serverid);
+	fdw = GetForeignDataWrapper(foreign_server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	/* Make sure that the FDW has transaction handlers */
+	if (!fdw_routine->PrepareForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function provided for preparing foreign transaction for FDW %s",
+						fdw->fdwname)));
+	if (!fdw_routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to commit a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+	if (!fdw_routine->RollbackForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to rollback a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+
+	/* Generate a unique identifier */
+	if (fdw_routine->GetPrepareId)
+	{
+		char *id;
+		int fdwxact_id_len = 0;
+
+		id = fdw_routine->GetPrepareId(GetTopTransactionId(),
+											   foreign_server->serverid,
+											   user_mapping->userid,
+											   &fdwxact_id_len);
+
+		if (!id)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+					 (errmsg("foreign transaction identifier is not provided"))));
+
+		/* Check length of foreign transaction identifier */
+		id[fdwxact_id_len] = '\0';
+		if (fdwxact_id_len > NAMEDATALEN)
+			ereport(ERROR,
+					(errcode(ERRCODE_NAME_TOO_LONG),
+					 errmsg("foreign transaction identifier \"%s\" is too long",
+							id),
+					 errdetail("foreign transaction identifier must be less than %d characters.",
+							   NAMEDATALEN)));
+
+		fdwxact_id = pstrdup(id);
+	}
+	else
+		fdwxact_id = generate_fdw_xact_identifier(GetTopTransactionId(),
+												  serverid, userid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdw_xact_id = fdwxact_id;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdw_xact = NULL;
+	fdw_part->modified = modified;
+	fdw_part->twophase_commit_enabled = true; /* by default, will be changed at pre-commit phase */
+	fdw_part->fdw_state = NULL;
+	fdw_part->prepare_foreign_xact = fdw_routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact = fdw_routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact = fdw_routine->RollbackForeignTransaction;
+	fdw_part->is_twophase_commit_enabled = fdw_routine->IsTwoPhaseCommitEnabled;
+
+	/* Add this foreign transaction to the participants list */
+	FdwXactAtomicCommitParticipants = lappend(FdwXactAtomicCommitParticipants, fdw_part);
+
+	/* Restore the previous memory context */
+	MemoryContextSwitchTo(old_ctx);
+
+	return;
+}
+
+/*
+ * FdwXactShmemSize
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdw_xacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * FdwXactShmemInit
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdw_xacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->freeFdwXacts = NULL;
+		FdwXactCtl->numFdwXacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdw_xacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdw_xacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdw_xacts[cnt].status = FDW_XACT_INITIAL;
+			fdw_xacts[cnt].fxact_free_next = FdwXactCtl->freeFdwXacts;
+			FdwXactCtl->freeFdwXacts = &fdw_xacts[cnt];
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * PreCommit_FdwXacts
+ *
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	bool		need_atomic_commit;
+	ListCell	*lc;
+	ListCell	*next;
+	ListCell	*prev = NULL;
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsAtomicCommitRequested())
+		return;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactAtomicCommitParticipants == NIL)
+		return;
+
+	need_atomic_commit = FdwXactAtomicCommitRequired();
+
+	/*
+	 * In the 'required' case, every modified server must be capable of the
+	 * two-phase commit protocol.
+	 */
+	if (need_atomic_commit &&
+		foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));
+
+	/*
+	 * Commit transactions on foreign servers.
+	 *
+	 * Committed transactions are removed from FdwXactAtomicCommitParticipants
+	 * so that the later preparation processes only the servers that need to be
+	 * committed using the two-phase commit protocol.
+	 */
+	for (lc = list_head(FdwXactAtomicCommitParticipants); lc != NULL; lc = next)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		bool can_commit = false;
+
+		next = lnext(lc);
+
+		if (!need_atomic_commit || !fdw_part->modified)
+		{
+			/*
+			 * We can commit servers that were not modified, and any server
+			 * when atomic commit is not required.
+			 */
+			can_commit = true;
+		}
+		else if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER &&
+				 !fdw_part->twophase_commit_enabled)
+		{
+			/* Also in 'prefer' case, non-2pc-capable servers can be committed */
+			can_commit = true;
+		}
+
+		if (can_commit)
+		{
+			/* Commit the foreign transaction */
+			FdwXactCommitForeignTransaction(fdw_part);
+
+			/* Delete it from the participant list */
+			FdwXactAtomicCommitParticipants =
+				list_delete_cell(FdwXactAtomicCommitParticipants, lc, prev);
+		}
+		else
+			prev = lc;
+	}
+
+	/*
+	 * If only one modified participant remains, we can commit it directly.
+	 * This avoids using two-phase commit for a single server in the 'prefer'
+	 * case where the transaction has one 2pc-capable modified server and some
+	 * modified servers that are not 2pc-capable.
+	 */
+	if (list_length(FdwXactAtomicCommitParticipants) == 1 &&
+		(MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) == 0)
+	{
+		Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER);
+		FdwXactCommitForeignTransaction(linitial(FdwXactAtomicCommitParticipants));
+		list_free(FdwXactAtomicCommitParticipants);
+		return;
+	}
+
+	FdwXactPrepareForeignTransactions();
+	/* Keep FdwXactAtomicCommitParticipants until the end of transaction */
+}
+
+/*
+ * FdwXactPrepareForeignTransactions
+ *
+ * Prepare all foreign transaction participants.  This function extends the
+ * prepared participants chain each time we prepare a foreign transaction. The
+ * chain is used to access all participants of the distributed transaction
+ * quickly. If any one of them fails to prepare, we switch over to aborting.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	FdwXactState *state;
+	ListCell   *lcell;
+	FdwXact		prev_fdwxact = NULL;
+
+	if (FdwXactAtomicCommitParticipants == NIL)
+		return;
+
+	state = create_fdw_xact_state();
+
+	/* Loop over the foreign connections */
+	foreach(lcell, FdwXactAtomicCommitParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+		FdwXact		fdwxact;
+
+		/*
+		 * Insert the foreign transaction entry. Registration persists this
+		 * information to disk and to WAL (thereby relaying it to the standby).
+		 * Thus in case we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have a prepared transaction
+		 * on the foreign server and try to resolve it when connectivity is
+		 * restored or after crash recovery.
+		 *
+		 * If we prepare the transaction on the foreign server before persisting
+		 * the information to the disk and crash in-between these two steps,
+		 * we will forget that we prepared the transaction on the foreign server
+		 * and will not be able to resolve it after the crash. Hence persist
+		 * first then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(GetTopTransactionId(), fdw_part);
+
+		state->serverid = fdw_part->server->serverid;
+		state->userid = fdw_part->usermapping->userid;
+		state->umid = fdw_part->usermapping->umid;
+		state->fdwxact_id = pstrdup(fdwxact->fdw_xact_id);
+
+		/*
+		 * Between the FdwXactInsertFdwXactEntry call and the acknowledgement
+		 * from the foreign server, the backend may abort the local
+		 * transaction (say, because of a signal). During abort processing,
+		 * we might try to resolve a never-prepared transaction, and get an error.
+		 * This is fine as long as the FDW provides us with unique prepared
+		 * transaction identifiers.
+		 */
+		if (!fdw_part->prepare_foreign_xact(state))
+		{
+			/* Failed to prepare, switch over to abort */
+			ereport(ERROR,
+					(errmsg("could not prepare transaction on foreign server %s",
+							fdw_part->server->servername)));
+		}
+
+		/* Keep fdw_state until end of transaction */
+		fdw_part->fdw_state = state->fdw_state;
+
+		/* Preparation succeeded, update its status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdw_part->fdw_xact->status = FDW_XACT_PREPARED;
+		fdw_part->fdw_xact = fdwxact;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Link this entry into the chain of FdwXact entries of this transaction.
+		 */
+		if (prev_fdwxact)
+		{
+			/* Append this entry to the tail of the chain */
+			Assert(fdwxact->fxact_next == NULL);
+			prev_fdwxact->fxact_next = fdwxact;
+		}
+		prev_fdwxact = fdwxact;
+	}
+}
+
+/*
+ * Commit the given foreign transaction.
+ */
+void
+FdwXactCommitForeignTransaction(FdwXactParticipant *fdw_part)
+{
+	FdwXactState *state;
+
+	state = create_fdw_xact_state();
+	state->serverid = fdw_part->server->serverid;
+	state->userid = fdw_part->usermapping->userid;
+	state->umid = fdw_part->usermapping->umid;
+	fdw_part->fdw_state = (void *) state;
+
+	if (!fdw_part->commit_foreign_xact(state))
+		ereport(ERROR,
+				(errmsg("could not commit foreign transaction on server %s",
+						fdw_part->server->servername)));
+}
+
+/*
+ * FdwXactInsertFdwXactEntry
+ *
+ * This function is used to create a new foreign transaction entry before an FDW
+ * prepares, commits or rolls back. The function writes the entry to WAL; it is
+ * persisted to disk under the pg_fdw_xact directory at checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact				fxact;
+	FdwXactOnDiskData	*fxact_file_data;
+	MemoryContext		old_context;
+	int					data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact = insert_fdw_xact(MyDatabaseId, xid, fdw_part->server->serverid,
+							fdw_part->usermapping->userid,
+							fdw_part->usermapping->umid, fdw_part->fdw_xact_id);
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->held_by = MyBackendId;
+	fdw_part->fdw_xact = fxact;
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdw_xact_id);
+	data_len = data_len + strlen(fdw_part->fdw_xact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fxact_file_data->dbid = MyDatabaseId;
+	fxact_file_data->local_xid = xid;
+	fxact_file_data->serverid = fdw_part->server->serverid;
+	fxact_file_data->userid = fdw_part->usermapping->userid;
+	fxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fxact_file_data->fdw_xact_id, fdw_part->fdw_xact_id,
+		   strlen(fdw_part->fdw_xact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fxact_file_data, data_len);
+	fxact->insert_end_lsn = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_INSERT);
+	XLogFlush(fxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact->valid = true;
+	LWLockRelease(FdwXactLock);
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fxact_file_data);
+	return fxact;
+}
+
+/*
+ * insert_fdw_xact
+ *
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				Oid umid, char *fdw_xact_id)
+{
+	int i;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		fxact = FdwXactCtl->fdw_xacts[i];
+		if (fxact->dbid == dbid &&
+			fxact->local_xid == xid &&
+			fxact->serverid == serverid &&
+			fxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
+								   xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->freeFdwXacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fxact = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fxact->fxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->numFdwXacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts++] = fxact;
+
+	fxact->held_by = InvalidBackendId;
+	fxact->dbid = dbid;
+	fxact->local_xid = xid;
+	fxact->serverid = serverid;
+	fxact->userid = userid;
+	fxact->umid = umid;
+	fxact->insert_start_lsn = InvalidXLogRecPtr;
+	fxact->insert_end_lsn = InvalidXLogRecPtr;
+	fxact->valid = false;
+	fxact->ondisk = false;
+	fxact->inredo = false;
+	memcpy(fxact->fdw_xact_id, fdw_xact_id, strlen(fdw_xact_id) + 1);
+
+	return fxact;
+}
+
+/*
+ * remove_fdw_xact
+ *
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdw_xact(FdwXact fdw_xact)
+{
+	int			cnt;
+
+	Assert(fdw_xact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		if (FdwXactCtl->fdw_xacts[cnt] == fdw_xact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (cnt >= FdwXactCtl->numFdwXacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdw_xact->local_xid, fdw_xact->serverid, fdw_xact->userid)));
+
+	/* Remove the entry from active array */
+	FdwXactCtl->numFdwXacts--;
+	FdwXactCtl->fdw_xacts[cnt] = FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts];
+
+	/* Put it back into free list */
+	fdw_xact->fxact_free_next = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fdw_xact;
+
+	/* Reset information */
+	fdw_xact->status = FDW_XACT_INITIAL;
+	fdw_xact->held_by = InvalidBackendId;
+	fdw_xact->fxact_next = NULL;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdw_xact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdw_xact->serverid;
+		record.dbid = fdw_xact->dbid;
+		record.xid = fdw_xact->local_xid;
+		record.userid = fdw_xact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdw_xact_remove));
+		recptr = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the critical section: the
+		 * removal record has been flushed, so a checkpoint starting after
+		 * this point is guaranteed to follow it.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true and set FdwXactAtomicCommitReady to true if we require atomic
+ * commit. It is required if the transaction modified data on two or more
+ * servers, counting the local node itself. This function also checks, for
+ * each participating server, whether two-phase commit is enabled.
+ */
+static bool
+FdwXactAtomicCommitRequired(void)
+{
+	ListCell*	lc;
+	int			nserverswritten = 0;
+
+	if (!IsAtomicCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactAtomicCommitParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		/* Check if the foreign server is capable of two-phase commit protocol */
+		if (fdw_part->is_twophase_commit_enabled(fdw_part->server->serverid))
+			fdw_part->twophase_commit_enabled = true;
+		else if (fdw_part->modified)
+			MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+
+		if (fdw_part->modified)
+			nserverswritten++;
+	}
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	/* Atomic commit is required if we modified data on two or more participants */
+	if (nserverswritten <= 1)
+		return false;
+
+	FdwXactAtomicCommitReady = true;
+	return true;
+}
+
+bool
+FdwXactIsAtomicCommitReady(void)
+{
+	return FdwXactAtomicCommitReady;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int	i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * ForgetAllFdwXactParticipants
+ *
+ * Reset all the foreign transaction entries that this backend registered.
+ * If a foreign transaction has a corresponding FdwXact entry, resetting
+ * the held_by field leaves that entry in the unresolved state. If we
+ * leave any such entries, we update the oldest xmin of unresolved
+ * transactions so that the transaction status of dangling transactions
+ * is not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell *cell;
+	int		n_lefts = 0;
+
+	if (FdwXactAtomicCommitParticipants == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	foreach(cell, FdwXactAtomicCommitParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(cell);
+
+		/* Skip if we haven't registered an FdwXact entry yet */
+		if (fdw_part->fdw_xact == NULL)
+			continue;
+
+		/*
+		 * There is a race condition: the resolver process could remove an
+		 * FdwXact entry listed in FdwXactAtomicCommitParticipants, and
+		 * another backend could reuse it, before we get here. So we need
+		 * to check that each entry is still held by this backend.
+		 */
+		if (fdw_part->fdw_xact->held_by == MyBackendId)
+		{
+			fdw_part->fdw_xact->held_by = InvalidBackendId;
+			n_lefts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Update the oldest local transaction of unresolved distributed
+	 * transactions if we left any FdwXact entries.
+	 */
+	if (n_lefts > 0)
+		FdwXactComputeRequiredXmin();
+
+	FdwXactAtomicCommitParticipants = NIL;
+}
+
+/*
+ * AtProcExit_FdwXact
+ *
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Wait for foreign transaction to be resolved.
+ *
+ * Initially backends start in state FDW_XACT_NOT_WAITING and then change
+ * that state to FDW_XACT_WAITING before adding ourselves to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDW_XACT_WAIT_COMPLETE once foreign transactions are resolved.
+ * This backend then resets its state to FDW_XACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue and changes the state to FDW_XACT_WAITING_RETRY.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char		*new_status = NULL;
+	const char	*old_status;
+	ListCell	*lc;
+	List		*fdwxact_participants = NIL;
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsAtomicCommitRequested())
+		return;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDW_XACT_NOT_WAITING);
+
+	if (FdwXactAtomicCommitParticipants != NIL)
+	{
+		/*
+		 * If we're waiting for foreign transactions to be resolved that
+		 * we've prepared just before, use the participants list.
+		 */
+		Assert(MyPgXact->xid == wait_xid);
+		fdwxact_participants = FdwXactAtomicCommitParticipants;
+	}
+	else
+	{
+		/*
+		 * Get the participants list from the global array. This is required
+		 * (1) when we're waiting for foreign transactions that are part of
+		 * a local transaction that was prepared while the server was
+		 * running, or (2) when we resolve a PREPARE'd distributed
+		 * transaction after a restart.
+		 */
+		fdwxact_participants = get_fdw_xacts(MyDatabaseId, wait_xid,
+											 InvalidOid, InvalidOid, true);
+	}
+
+	/* Exit if we found no foreign transaction to resolve */
+	if (fdwxact_participants == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	foreach(lc, fdwxact_participants)
+	{
+		FdwXact fdw_xact = (FdwXact) lfirst(lc);
+
+		/* Don't overwrite status if fate has been determined */
+		if (fdw_xact->status == FDW_XACT_PREPARED)
+			fdw_xact->status = (is_commit ?
+								FDW_XACT_COMMITTING_PREPARED :
+								FDW_XACT_ABORTING_PREPARED);
+	}
+
+	/* Set backend status and enqueue ourselves on the active queue */
+	MyProc->fdwXactState = FDW_XACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	FdwXactQueueInsert();
+	LWLockRelease(FdwXactLock);
+
+	/* Launch a resolver process if none is running yet, or wake it up */
+	fdwxact_maybe_launch_resolver(false);
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int len;
+
+		old_status = get_ps_display(&len);
+		/* " waiting for resolution " is 24 bytes; a uint32 xid needs up to 10 */
+		new_status = (char *) palloc(len + 24 + 10 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0';	/* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDW_XACT_WAIT_COMPLETE, it will never update it again, so we
+		 * can't be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDW_XACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The latter
+		 * would lead the client to believe that the distributed transaction
+		 * aborted, which is not true: it's already committed locally. The
+		 * former is no good either: the client has requested committing a
+		 * distributed transaction, and is entitled to assume that an
+		 * acknowledged commit is also committed on all foreign servers, which
+		 * might not be true. So in this case we issue a WARNING (which some
+		 * clients may be able to interpret) and shut off further output. We do
+		 * NOT reset ProcDiePending, so that the process will die after the
+		 * commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions may be left orphaned,
+		 * but the foreign transaction resolver can pick them up and try to
+		 * resolve them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDW_XACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+
+	/*
+	 * Forget the list of locked entries. This also means that any entries
+	 * that could not be resolved remain as dangling transactions.
+	 */
+	ForgetAllFdwXactParticipants();
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Acquire FdwXactLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Insert MyProc into the tail of FdwXactActiveQueue.
+ */
+static void
+FdwXactQueueInsert(void)
+{
+	SHMQueueInsertBefore(&(FdwXactRslvCtl->FdwXactActiveQueue),
+						 &(MyProc->fdwXactLinks));
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Resolve one distributed transaction. The target distributed transaction
+ * is fetched from either the active queue or the retry queue, and its
+ * participants are fetched from the global array.
+ *
+ * Release the waiter and return true if we resolved all of the foreign
+ * transaction participants. On failure, we move the FdwXactLinks entry from
+ * the active queue to the retry queue and re-throw the error.
+ */
+bool
+FdwXactResolveDistributedTransaction(Oid dbid, bool is_active)
+{
+	FdwXactState	*state;
+	ListCell		*lc;
+	ListCell		*next;
+	PGPROC			*waiter = NULL;
+	List			*participants;
+	SHM_QUEUE		*target_queue;
+
+	if (is_active)
+		target_queue = &(FdwXactRslvCtl->FdwXactActiveQueue);
+	else
+		target_queue = &(FdwXactRslvCtl->FdwXactRetryQueue);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
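+	/*
+	 * Walk the queue from its head and pick the first waiter that belongs
+	 * to the given database. Passing the current element to SHMQueueNext
+	 * advances the scan; always passing the head would loop forever.
+	 */
+	waiter = (PGPROC *) SHMQueueNext(target_queue, target_queue,
+									 offsetof(PGPROC, fdwXactLinks));
+	while (waiter != NULL && waiter->databaseId != dbid)
+		waiter = (PGPROC *) SHMQueueNext(target_queue,
+										 &(waiter->fdwXactLinks),
+										 offsetof(PGPROC, fdwXactLinks));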
+
+	/* If no waiter, there is no job */
+	if (!waiter)
+	{
+		LWLockRelease(FdwXactLock);
+		return false;
+	}
+
+	Assert(TransactionIdIsValid(waiter->fdwXactWaitXid));
+
+	state = create_fdw_xact_state();
+	participants = get_fdw_xacts(dbid, waiter->fdwXactWaitXid, InvalidOid,
+								 InvalidOid, false);
+	LWLockRelease(FdwXactLock);
+
+	/* Resolve all foreign transactions one by one */
+	for (lc = list_head(participants); lc != NULL; lc = next)
+	{
+		FdwXact fdwxact = (FdwXact) lfirst(lc);
+
+		CHECK_FOR_INTERRUPTS();
+
+		next = lnext(lc);
+
+		state->serverid = fdwxact->serverid;
+		state->userid = fdwxact->userid;
+		state->umid = fdwxact->umid;
+		state->fdwxact_id = pstrdup(fdwxact->fdw_xact_id);
+
+		PG_TRY();
+		{
+			FdwXactResolveForeignTransaction(state, fdwxact, ERROR);
+		}
+		PG_CATCH();
+		{
+			/* Re-insert the waiter to the retry queue */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			if (waiter->fdwXactState == FDW_XACT_WAITING)
+			{
+				SHMQueueDelete(&(waiter->fdwXactLinks));
+				pg_write_barrier();
+				SHMQueueInsertBefore(&(FdwXactRslvCtl->FdwXactRetryQueue),
+									 &(waiter->fdwXactLinks));
+				waiter->fdwXactState = FDW_XACT_WAITING_RETRY;
+			}
+			LWLockRelease(FdwXactLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		elog(DEBUG2, "resolved a foreign transaction xid %u, serverid %u, userid %u",
+			 fdwxact->local_xid, fdwxact->serverid, fdwxact->userid);
+	}
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/*
+	 * Remove waiter from shmem queue, if not detached yet. The waiter
+	 * could already be detached if the user cancelled the wait before
+	 * resolution completed.
+	 */
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId	wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDW_XACT_WAIT_COMPLETE;
+
+		/* Wake up the waiter only when we have set state and removed from queue */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc xid %u", wait_xid);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	return true;
+}
+
+/*
+ * Resolve all dangling foreign transactions on the given database. Get
+ * all dangling foreign transactions from shmem global array and resolve
+ * them one by one.
+ */
+void
+FdwXactResolveAllDanglingTransactions(Oid dbid)
+{
+	List		*dangling_fdwxacts = NIL;
+	ListCell	*cell;
+	int			n_resolved = 0;
+	int			i;
+
+	Assert(OidIsValid(dbid));
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/*
+	 * Walk over the global array to build the list of dangling transactions
+	 * whose corresponding local transaction is on the given database.
+	 */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fxact = FdwXactCtl->fdw_xacts[i];
+
+		/*
+		 * Append the fdwxact entry on the given database to the list if
+		 * no backend holds it and the corresponding local transaction
+		 * is not a prepared transaction.
+		 */
+		if (fxact->dbid == dbid &&
+			fxact->held_by == InvalidBackendId &&
+			!TwoPhaseExists(fxact->local_xid))
+			dangling_fdwxacts = lappend(dangling_fdwxacts, fxact);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/* Return if there is no foreign transaction we need to resolve */
+	if (dangling_fdwxacts == NIL)
+		return;
+
+	foreach(cell, dangling_fdwxacts)
+	{
+		FdwXact fdwxact = (FdwXact) lfirst(cell);
+		FdwXactState *state;
+
+		state = create_fdw_xact_state();
+		state->serverid = fdwxact->serverid;
+		state->userid = fdwxact->userid;
+		state->umid = fdwxact->umid;
+		state->fdwxact_id = pstrdup(fdwxact->fdw_xact_id);
+
+		FdwXactResolveForeignTransaction(state, fdwxact, ERROR);
+
+		n_resolved++;
+	}
+
+	list_free(dangling_fdwxacts);
+
+	elog(DEBUG2, "resolved %d dangling foreign xacts", n_resolved);
+}
+
+/*
+ * AtEOXact_FdwXacts
+ *
+ * In the commit case, we have already prepared transactions on the foreign
+ * servers during pre-commit, and those prepared transactions will be
+ * resolved by the resolver process. So we don't do anything about the
+ * foreign transactions here.
+ *
+ * In the abort case, either the user requested a rollback or we switched to
+ * rollback due to an error during commit. To close the current foreign
+ * transactions anyway, we call the rollback API for every foreign
+ * transaction. If we raised an error during preparing and came here, it's
+ * possible that some FdwXactParticipant entries have already registered
+ * their FdwXact entry. If so, we leave them as dangling transactions and
+ * ask the resolver process to process them.
+ */
+void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		int left_fdwxacts = 0;
+		FdwXactState *state = create_fdw_xact_state();
+
+		foreach (lcell, FdwXactAtomicCommitParticipants)
+		{
+			FdwXactParticipant	*fdw_part = lfirst(lcell);
+
+			/*
+			 * Count FdwXact entries that we registered to shared memory array
+			 * in this transaction.
+			 */
+			if (fdw_part->fdw_xact)
+			{
+				/*
+				 * The status of the foreign transaction must be either
+				 * preparing or prepared. In either case, since we have
+				 * registered an FdwXact entry, we leave it to the resolver
+				 * process. In the preparing state the foreign transaction
+				 * might not be closed yet, so we fall through and call the
+				 * rollback API. In the prepared state the foreign transaction
+				 * has been closed, so we don't need to do anything.
+				 */
+				Assert(fdw_part->fdw_xact->status == FDW_XACT_PREPARING ||
+					   fdw_part->fdw_xact->status == FDW_XACT_PREPARED);
+
+				left_fdwxacts++;
+				if (fdw_part->fdw_xact->status == FDW_XACT_PREPARED)
+					continue;
+			}
+
+			state->serverid = fdw_part->server->serverid;
+			state->userid = fdw_part->usermapping->userid;
+			state->umid = fdw_part->usermapping->umid;
+			state->fdw_state = fdw_part->fdw_state;
+
+			/*
+			 * Roll back the current foreign transaction. Since we're already
+			 * rolling back the local transaction, it's too late to raise an
+			 * error here, so we log a warning instead.
+			 */
+			if (!fdw_part->rollback_foreign_xact(state))
+				ereport(WARNING,
+						(errmsg("could not abort transaction on server \"%s\"",
+								fdw_part->server->servername)));
+		}
+
+		/* If we left some FdwXact entries, ask the resolver process */
+		if (left_fdwxacts > 0)
+		{
+			ereport(WARNING,
+					(errmsg("might have left %d foreign transactions in in-doubt status",
+							left_fdwxacts)));
+			fdwxact_maybe_launch_resolver(true);
+		}
+	}
+
+	ForgetAllFdwXactParticipants();
+	FdwXactAtomicCommitReady = false;
+}
+
+/*
+ * AtPrepare_FdwXacts
+ *
+ * If there are foreign servers involved in the transaction, this function
+ * prepares transactions on those servers.
+ *
+ * Note that the transaction can abort after we have prepared only some of
+ * the participants. Since we can still switch to abort, we cannot forget
+ * FdwXactAtomicCommitParticipants here. Those entries are processed by the
+ * resolver process during abort, or by AtEOXact_FdwXacts.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	if (!IsAtomicCommitEnabled())
+		return;
+
+	if (FdwXactAtomicCommitParticipants == NIL)
+		return;
+
+	/*
+	 * We cannot prepare if any participating foreign server isn't capable
+	 * of two-phase commit.
+	 */
+	if (FdwXactAtomicCommitRequired() &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_T_R_INTEGRITY_CONSTRAINT_VIOLATION),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in the transaction cannot prepare it")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * FdwXactResolveForeignTransaction
+ *
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * handler routine. The foreign transaction can be a dangling transaction
+ * that no backend is interested in. If the fate of the foreign transaction
+ * is not determined yet, it is determined according to the status of the
+ * corresponding local transaction.
+ *
+ * If the resolution is successful, remove the foreign transaction entry from
+ * the shared memory and also remove the corresponding on-disk file.
+ */
+static bool
+FdwXactResolveForeignTransaction(FdwXactState *state, FdwXact fdwxact,
+								 int elevel)
+{
+	ForeignServer		*server;
+	ForeignDataWrapper	*fdw;
+	FdwRoutine			*fdw_routine;
+	bool		is_commit;
+	bool		ret;
+
+	Assert(fdwxact);
+
+	/*
+	 * Determine whether we commit or abort this foreign transaction.
+	 */
+	if (fdwxact->status == FDW_XACT_COMMITTING_PREPARED)
+		is_commit = true;
+	else if (fdwxact->status == FDW_XACT_ABORTING_PREPARED)
+		is_commit = false;
+
+	/*
+	 * If the local transaction is already committed, commit prepared
+	 * foreign transaction.
+	 */
+	else if (TransactionIdDidCommit(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_COMMITTING_PREPARED;
+		is_commit = true;
+	}
+
+	/*
+	 * If the local transaction is already aborted, abort prepared
+	 * foreign transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_ABORTING_PREPARED;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is no longer in progress but the foreign
+	 * transaction is not prepared on the foreign server. This
+	 * can happen when the transaction failed after registering this
+	 * entry but before actually preparing on the foreign server.
+	 * So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+		is_commit = false;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction
+	 * state is neither committing nor aborting. This should not
+	 * happen because we cannot decide whether to commit or abort a
+	 * foreign transaction associated with an in-progress local
+	 * transaction.
+	 */
+	else
+		ereport(ERROR,
+				(errmsg("cannot resolve the foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid)));
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Resolve the foreign transaction */
+	Assert(fdw_routine->ResolveForeignTransaction);
+
+	ret = fdw_routine->ResolveForeignTransaction(state, is_commit);
+
+	if (!ret)
+	{
+		ereport(elevel,
+				(errmsg("could not %s a prepared foreign transaction on server \"%s\"",
+						is_commit ? "commit" : "roll back", server->servername),
+				 errdetail("local transaction id is %u, connected by user id %u",
+						   fdwxact->local_xid, fdwxact->userid)));
+
+		/* Reached only if elevel < ERROR: keep the entry for a later retry */
+		return false;
+	}
+
+	/* Resolution was a success, remove the entry */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdw_xact(fdwxact);
+	LWLockRelease(FdwXactLock);
+
+	return ret;
+}
+
+static FdwXactState *
+create_fdw_xact_state(void)
+{
+	FdwXactState *state;
+
+	state = palloc(sizeof(FdwXactState));
+	state->serverid = InvalidOid;
+	state->userid = InvalidOid;
+	state->umid = InvalidOid;
+	state->fdwxact_id = NULL;
+	state->fdw_state = NULL;
+
+	return state;
+}
+
+/*
+ * Return the one FdwXact entry that matches the given arguments, or NULL
+ * if there is none. Since this function searches for the FdwXact entry by
+ * its unique key, all arguments must be valid.
+ */
+static FdwXact
+get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				 bool need_lock)
+{
+	List	*fdw_xact_list;
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, need_lock);
+
+	/* Could not find entry */
+	if (fdw_xact_list == NIL)
+		return NULL;
+
+	/* Must be one entry since we search it by the unique key */
+	Assert(list_length(fdw_xact_list) == 1);
+
+	return (FdwXact) linitial(fdw_xact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+fdw_xact_exists(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	*fdw_xact_list;
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, true);
+
+	return fdw_xact_list != NIL;
+}
+
+/*
+ * Returns an array of all foreign prepared transactions for the user-level
+ * function pg_prepared_fdw_xacts.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if it doesn't
+ * want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdw_xacts(int *length)
+{
+	List		*all_fdw_xacts;
+	ListCell	*lc;
+	FdwXact		fdw_xacts;
+	int			num_fdw_xacts = 0;
+
+	Assert(length != NULL);
+
+	/* Get all entries */
+	all_fdw_xacts = get_fdw_xacts(InvalidOid, InvalidTransactionId,
+								  InvalidOid, InvalidOid, true);
+
+	if (all_fdw_xacts == NIL)
+	{
+		*length = 0;
+		return NULL;
+	}
+
+	fdw_xacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdw_xacts));
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdw_xacts)
+	{
+		FdwXact fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdw_xacts + num_fdw_xacts, fx,
+			   sizeof(FdwXactData));
+		num_fdw_xacts++;
+	}
+
+	*length = num_fdw_xacts;
+	list_free(all_fdw_xacts);
+
+	return fdw_xacts;
+}
+
+/*
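+ * Return a list of the FdwXact entries that match the given arguments, or
+ * NIL if there are none. An invalid value (InvalidOid or
+ * InvalidTransactionId) for an argument matches any entry on that field.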
+ */
+static List*
+get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			  bool need_lock)
+{
+	int i;
+	List	*fdw_xact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact	fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool	matches = true;
+
+		/* xid */
+		if (xid != InvalidTransactionId && xid != fdw_xact->local_xid)
+			matches = false;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdw_xact->dbid != dbid)
+			matches = false;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdw_xact->serverid)
+			matches = false;
+
+		/* userid */
+		if (OidIsValid(userid) && fdw_xact->userid != userid)
+			matches = false;
+
+		/* Append it if matched */
+		if (matches)
+			fdw_xact_list = lappend(fdw_xact_list, fdw_xact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdw_xact_list;
+}
+
+/*
+ * fdw_xact_redo
+ * Apply the redo log for a foreign transaction.
+ */
+void
+fdw_xact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record
+		 * in FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDW_XACT_REMOVE)
+	{
+		/* Named xlrec so that it does not shadow the "record" parameter */
+		xl_fdw_xact_remove *xlrec = (xl_fdw_xact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier in the form
+ * "fx_<random number>_<xid>_<serverid>_<userid>", whose length is always
+ * less than NAMEDATALEN.
+ *
+ * The returned string is used to identify the foreign transaction. The
+ * identifier should not be the same as any other concurrent prepared
+ * transaction identifier.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like a UUID, which gives unique ids with high probability, but that may be
+ * expensive here, and the UUID extension that provides the function to
+ * generate UUIDs is not part of the core code.
+ */
+static char *
+generate_fdw_xact_identifier(TransactionId xid, Oid serverid, Oid userid)
+{
+	char*	fdw_xact_id;
+
+	fdw_xact_id = (char *) palloc0(FDW_XACT_ID_MAX_LEN * sizeof(char));
+
+	/* snprintf always null-terminates, truncating the output if necessary */
+	snprintf(fdw_xact_id, FDW_XACT_ID_MAX_LEN, "%s_%ld_%u_%u_%u",
+			 "fx", Abs(random()), xid, serverid, userid);
+
+	return fdw_xact_id;
+}
+
+/*
+ * CheckPointFdwXacts
+ *
+ * We must fsync the foreign transaction state files that are valid or were
+ * generated during redo and have an insert LSN <= the checkpoint's redo
+ * horizon. The foreign transaction entries, and hence the corresponding
+ * files, are expected to be very short-lived. By executing this function at
+ * the end, we might have fewer files to fsync, thus reducing some I/O. This
+ * is similar to CheckPointTwoPhase().
+ *
+ * Note that the function currently performs the file I/O while holding
+ * FdwXactLock, for simplicity; see the comment in the function body. Moving
+ * the I/O out of the lock would create a race condition: after releasing
+ * the lock, and before syncing a file, the corresponding foreign transaction
+ * entry, and hence the file, might get removed, so such an error would have
+ * to be detected and ignored.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdw_xacts = 0;
+
+	/* Quick get-away, before taking lock */
+	if (max_prepared_foreign_xacts <= 0)
+		return;
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/* Another quick check, before we allocate memory */
+	if (FdwXactCtl->numFdwXacts <= 0)
+	{
+		LWLockRelease(FdwXactLock);
+		return;
+	}
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, if somebody
+	 * spots that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with a
+	 * insert_end_lsn set prior to the last checkpoint yet is marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		FdwXact		fxact = FdwXactCtl->fdw_xacts[cnt];
+
+		if ((fxact->valid || fxact->inredo) &&
+			!fxact->ondisk &&
+			fxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fxact->dbid, fxact->local_xid,
+								fxact->serverid, fxact->userid,
+								buf, len);
+			fxact->ondisk = true;
+			fxact->insert_start_lsn = InvalidXLogRecPtr;
+			fxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdw_xacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDW_XACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdw_xacts > 0)
+		ereport(LOG,
+			  (errmsg_plural("%u foreign transaction state file was written "
+							 "for long-running prepared transactions",
+							 "%u foreign transaction state files were written "
+							 "for long-running prepared transactions",
+							 serialized_fdw_xacts,
+							 serialized_fdw_xacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, &read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+		   errdetail("Failed while allocating an XLog reading processor.")));
+
+	record = XLogReadRecord(xlogreader, lsn, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read foreign transaction state from xlog at %X/%X",
+			   (uint32) (lsn >> 32),
+			   (uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDW_XACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDW_XACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not recreate foreign transaction state file \"%s\": %m",
+			   path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not fsync foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * ProcessFdwXactBuffer
+ *
+ * Given a transaction id, userid and serverid, read the foreign transaction
+ * state either from disk or directly from WAL via the shared-memory record
+ * pointer "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId	origNextXid = ShmemVariableCache->nextXid;
+	char	*buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(insert_start_lsn != InvalidXLogRecPtr);
+
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid, true);
+		if (buf == NULL)
+		{
+			ereport(WARNING,
+					(errmsg("removing corrupt fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+			return NULL;
+		}
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (valid size, CRC and identifying fields), return the contents
+ * in a palloc'd buffer. Otherwise return NULL. The buffer can be freed later
+ * by the caller.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				bool give_warnings)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			   errmsg("could not open FDW transaction state file \"%s\": %m",
+					  path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+	{
+		CloseTransientFile(fd);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not stat FDW transaction state file \"%s\": %m",
+							path)));
+		return NULL;
+	}
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdw_xact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+	{
+		CloseTransientFile(fd);
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+		return NULL;
+	}
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+	{
+		CloseTransientFile(fd);
+		return NULL;
+	}
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_READ);
+	if (read(fd, buf, stat.st_size) != stat.st_size)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not read FDW transaction state file \"%s\": %m",
+					  path)));
+		return NULL;
+	}
+
+	pgstat_report_wait_end();
+	CloseTransientFile(fd);
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+	{
+		pfree(buf);
+		return NULL;
+	}
+
+	/* Check that the contents match the expected identifiers */
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fxact_file_data->dbid  != dbid ||
+		fxact_file_data->serverid != serverid ||
+		fxact_file_data->userid != userid ||
+		fxact_file_data->local_xid != xid)
+	{
+		ereport(WARNING,
+			(errmsg("invalid foreign transaction state file \"%s\"",
+					path)));
+		/* fd was already closed above, so it must not be closed again */
+		pfree(buf);
+		return NULL;
+	}
+
+	return buf;
+}
+
+/*
+ * PrescanFdwXacts
+ *
+ * Scan the foreign transactions directory for the oldest active transaction.
+ * This is run during database startup, after we completed reading WAL.
+ * ShmemVariableCache->nextXid has been set to one more than the highest XID
+ * for which evidence exists in WAL.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	TransactionId nextXid = ShmemVariableCache->nextXid;
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+		 strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			TransactionId local_xid;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/*
+			 * Remove a foreign prepared transaction file corresponding to an
+			 * XID, which is too new.
+			 */
+			if (TransactionIdFollowsOrEquals(local_xid, nextXid))
+			{
+				ereport(WARNING,
+						(errmsg("removing future foreign prepared transaction file \"%s\"",
+								clde->d_name)));
+				RemoveFdwXactFile(dbid, local_xid, serverid, userid, true);
+				continue;
+			}
+
+			if (TransactionIdPrecedesOrEquals(local_xid, oldestActiveXid))
+				oldestActiveXid = local_xid;
+		}
+	}
+
+	FreeDir(cldir);
+	return oldestActiveXid;
+}
+
+/*
+ * restoreFdwXactData
+ *
+ * Scan pg_fdw_xact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char		*buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid, bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * FdwXactRedoAdd
+ *
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The
+	 * status of the transaction is set as preparing, since we do not
+	 * know the exact status right now. Resolver will set it later
+	 * based on the status of local transaction which prepared this
+	 * foreign transaction.
+	 */
+	fxact = insert_fdw_xact(fxact_data->dbid, fxact_data->local_xid,
+							fxact_data->serverid, fxact_data->userid,
+							fxact_data->umid, fxact_data->fdw_xact_id);
+
+	/*
+	 * Remember where the WAL record for this entry lives and mark the
+	 * entry as added during redo. An invalid start_lsn means the data
+	 * came from a state file, so the entry is already on disk.
+	 */
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->insert_start_lsn = start_lsn;
+	fxact->insert_end_lsn = end_lsn;
+	fxact->inredo = true;	/* added in redo */
+	fxact->valid = false;
+	fxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * FdwXactRedoRemove
+ *
+ * Remove the corresponding fdw_xact entry from FdwXactCtl.
+ * Also remove fdw_xact file if a foreign transaction was saved
+ * via an earlier checkpoint.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact	fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdw_xact(dbid, xid, serverid, userid,
+							   false);
+
+	if (fdwxact == NULL)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdw_xact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+		char	*buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(int *newval, void **extra, GucSource source)
+{
+	ForeignTwophaseCommitLevel newForeignTwophaseCommitLevel = *newval;
+
+	/* Parameter check */
+	if (newForeignTwophaseCommitLevel > FOREIGN_TWOPHASE_COMMIT_DISABLED &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_transactions\" or \"max_foreign_transaction_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdw_xacts;
+	int			num_xacts;
+	int			cur_xact;
+}	WorkingStatus;
+
+Datum
+pg_prepared_fdw_xacts(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdw_xacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdw_xacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(6, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdw_xacts = get_all_fdw_xacts(&num_fdw_xacts);
+		status->num_xacts = num_fdw_xacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdw_xact = &status->fdw_xacts[status->cur_xact++];
+		Datum		values[6];
+		bool		nulls[6];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdw_xact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdw_xact->dbid);
+		values[1] = TransactionIdGetDatum(fdw_xact->local_xid);
+		values[2] = ObjectIdGetDatum(fdw_xact->serverid);
+		values[3] = ObjectIdGetDatum(fdw_xact->userid);
+		switch (fdw_xact->status)
+		{
+			case FDW_XACT_PREPARING:
+				xact_status = "prepared";
+				break;
+			case FDW_XACT_COMMITTING_PREPARED:
+				xact_status = "committing";
+				break;
+			case FDW_XACT_ABORTING_PREPARED:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		/* should this really be interpreted by the FDW? */
+		values[5] = PointerGetDatum(cstring_to_text_with_len(fdw_xact->fdw_xact_id,
+															 strlen(fdw_xact->fdw_xact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXactState *state;
+	UserMapping		*usermapping;
+	FdwXact			fdwxact;
+	bool			ret;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, true);
+
+	if (fdwxact == NULL)
+		PG_RETURN_BOOL(false);
+
+	usermapping = GetUserMapping(userid, serverid);
+
+	state = create_fdw_xact_state();
+	state->serverid = serverid;
+	state->userid = userid;
+	state->umid = usermapping->umid;
+
+	ret = FdwXactResolveForeignTransaction(state, fdwxact, LOG);
+
+	PG_RETURN_BOOL(ret);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. The function gives a way to forget about such a prepared
+ * transaction when the foreign server where it was prepared is no longer
+ * available, or when the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, false);
+	if (fdwxact == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("could not find foreign transaction entry"))));
+
+	remove_fdw_xact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_VOID();
+}
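
A side note for reviewers on the state file names that PrescanFdwXacts()
and restoreFdwXactData() scan for: both accept a name only if it has
exactly FDW_XACT_FILE_NAME_LEN characters drawn from "0123456789ABCDEF_",
and then parse four hex fields (dbid, local xid, serverid, userid). The
standalone sketch below illustrates that round trip. It is not part of the
patch: the sketch_* names are hypothetical, and the "%08X_%08X_%08X_%08X"
output format and the length of 35 are assumptions inferred from the
sscanf() pattern, since FdwXactFilePath() is not shown in this hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed layout: four 8-digit hex fields joined by three underscores. */
#define SKETCH_FILE_NAME_LEN (8 * 4 + 3)

/* Hypothetical stand-in for the (dbid, xid, serverid, userid) key. */
typedef struct SketchFdwXactKey
{
	uint32_t	dbid;
	uint32_t	local_xid;
	uint32_t	serverid;
	uint32_t	userid;
} SketchFdwXactKey;

/* Format the key as a file name; false if dst is too small. */
static bool
sketch_format_name(char *dst, size_t dstlen, const SketchFdwXactKey *key)
{
	int			n;

	n = snprintf(dst, dstlen, "%08X_%08X_%08X_%08X",
				 (unsigned int) key->dbid, (unsigned int) key->local_xid,
				 (unsigned int) key->serverid, (unsigned int) key->userid);

	return n == SKETCH_FILE_NAME_LEN && (size_t) n < dstlen;
}

/* Validate and parse a directory entry name, as the scan loops do. */
static bool
sketch_parse_name(const char *name, SketchFdwXactKey *key)
{
	unsigned int dbid;
	unsigned int xid;
	unsigned int serverid;
	unsigned int userid;

	/* Same filter as the patch: exact length, uppercase hex and '_' only */
	if (strlen(name) != SKETCH_FILE_NAME_LEN ||
		strspn(name, "0123456789ABCDEF_") != SKETCH_FILE_NAME_LEN)
		return false;

	/* Unlike the patch, also insist that all four fields were parsed */
	if (sscanf(name, "%8x_%8x_%8x_%8x", &dbid, &xid, &serverid, &userid) != 4)
		return false;

	key->dbid = dbid;
	key->local_xid = xid;
	key->serverid = serverid;
	key->userid = userid;
	return true;
}
```

The scan loops in the patch ignore the sscanf() return value, so a name
that passes the character filter but has misplaced underscores would
leave some of the four variables unset. Checking for a result of 4, as in
the sketch, might be worth adding there.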
diff --git a/src/backend/access/fdwxact/fdwxact_launcher.c b/src/backend/access/fdwxact/fdwxact_launcher.c
new file mode 100644
index 0000000..39f351b
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact_launcher.c
@@ -0,0 +1,641 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * There is a shared memory area where the information of resolver processes
+ * is stored. A backend process requests the start of a new resolver process
+ * via that shared memory area. Note that the launcher assumes that there is
+ * no more than one start request per database.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact_launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "pgstat.h"
+#include "funcapi.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launcher_sigusr2(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid, int slot);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+Datum pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS);
+
+/*
+ * Wake up the launcher process to retry launch. This is used when
+ * a resolver process is being stopped.
+ */
+void
+FdwXactLauncherWakeupToRetry(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		SetLatch(FdwXactRslvCtl->launcher_latch);
+}
+
+/*
+ * Wake up the launcher process to request resolution. This is
+ * used by the backend process.
+ */
+void
+FdwXactLauncherWakeupToRequest(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int	slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->FdwXactActiveQueue));
+		SHMQueueInit(&(FdwXactRslvCtl->FdwXactRetryQueue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz	last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == 0);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz	now;
+		long	wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int		rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit start retries to once per foreign_xact_resolution_retry_interval,
+		 * but always try to start a resolver when a backend requests one.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool launched;
+
+			ResetLatch(MyLatch);
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher launch",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested
+			 * but not running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * started. This usually means a crash of the resolver, so we
+			 * should retry in foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver worker
+ * if not running yet. A foreign transaction resolver worker is responsible
+ * for resolution of foreign transactions that are registered on a database.
+ * So if a resolver worker is already launched, we don't need to launch a new
+ * one.
+ */
+void
+fdwxact_maybe_launch_resolver(bool ignore_error)
+{
+	FdwXactResolver *resolver;
+	bool	found = false;
+	int		i;
+
+	/*
+	 * Look for a resolver process that is running and working on the
+	 * same database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->pid != InvalidPid &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * If we found the resolver for my database, we don't need to launch a new
+	 * one; just wake the running worker up.
+	 */
+	if (found)
+	{
+		SetLatch(resolver->latch);
+
+		elog(DEBUG1, "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		return;
+	}
+
+	/* Look for an unused resolver slot */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	/*
+	 * However if there are no more free worker slots, inform user about it before
+	 * exiting.
+	 */
+	if (!found)
+	{
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+		return;
+	}
+
+	Assert(resolver->pid == InvalidPid);
+
+	/* Found a free slot; reserve it for a new resolver process */
+	resolver->dbid = MyDatabaseId;
+	resolver->in_use = true;
+
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Wake up launcher */
+	FdwXactLauncherWakeupToRequest();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to the given
+ * 'dbid', using 'slot' if given. If slot is negative, we find an unused slot.
+ * Note that caller must hold FdwXactResolverLock in exclusive mode.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid, int slot)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int launch_slot = slot;
+
+	/* If slot number is invalid, we find an unused slot */
+	if (launch_slot < 0)
+	{
+		int i;
+
+		for (i = 0; i < max_foreign_xact_resolvers; i++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+			if (resolver->in_use && resolver->dbid == dbid)
+				return;
+
+			if (!resolver->in_use)
+			{
+				launch_slot = i;
+				break;
+			}
+		}
+	}
+
+	/* No unused slot found */
+	if (launch_slot < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[launch_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_main_arg = Int32GetDatum(launch_slot);
+	bgw.bgw_notify_pid = 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, clean up the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to wait
+	 * until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch all foreign transaction resolvers that are required by backend process
+ * but not running. Return true if we launch any resolver.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	int i, j;
+	int num_launches = 0;
+	int num_unused_slots = 0;
+	int num_dbs = 0;
+	bool launched = false;
+	Oid	*dbs_to_launch;
+	Oid	*dbs_having_worker = palloc0(sizeof(Oid) * max_foreign_xact_resolvers);
+
+	/*
+	 * Launch resolver workers on the databases that are requested
+	 * by backend processes, while counting unused slots.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* Remember unused worker slots */
+		if (!resolver->in_use)
+		{
+			num_unused_slots++;
+			continue;
+		}
+
+		/* Remember databases that have a resolver worker, fall through */
+		if (OidIsValid(resolver->dbid))
+			dbs_having_worker[num_dbs++] = resolver->dbid;
+
+		/* Launch the backend-requested worker */
+		if (resolver->in_use &&
+			OidIsValid(resolver->dbid) &&
+			resolver->pid == InvalidPid)
+		{
+			fdwxact_launch_resolver(resolver->dbid, i);
+			launched = true;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* quick exit if no unused slot */
+	if (num_unused_slots == 0)
+		return launched;
+
+	/*
+	 * Launch a resolver on each database that has unresolved foreign
+	 * transactions but doesn't have any resolver. Scanning all FdwXact
+	 * entries could take time, but that's harmless for the relaunch
+	 * case.
+	 */
+	dbs_to_launch = (Oid *) palloc(sizeof(Oid) * num_unused_slots);
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool found = false;
+
+		/* All unused slots are already spoken for */
+		if (num_launches >= num_unused_slots)
+			break;
+
+		for (j = 0; j < num_dbs; j++)
+		{
+			if (dbs_having_worker[j] == fdw_xact->dbid)
+			{
+				found = true;
+				break;
+			}
+		}
+
+		/* Register the database if no resolver is working on it */
+		if (!found)
+			dbs_to_launch[num_launches++] = fdw_xact->dbid;
+	}
+
+	/* Launch resolver process for a database at any worker slot */
+	for (i = 0; i < num_launches; i++)
+	{
+		fdwxact_launch_resolver(dbs_to_launch[i], -1);
+		launched = true;
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	return launched;
+}
+
+/*
+ * FdwXactLauncherRegister
+ *		Register a background worker running the foreign transaction
+ *		launcher.
+ */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+
+/*
+ * Returns activity of foreign transaction resolvers, including pids, database
+ * OIDs and the last resolution time.
+ */
+Datum
+pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver	*resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t	pid;
+		Oid		dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
diff --git a/src/backend/access/fdwxact/fdwxact_resolver.c b/src/backend/access/fdwxact/fdwxact_resolver.c
new file mode 100644
index 0000000..0b754da
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact_resolver.c
@@ -0,0 +1,331 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving foreign transactions on one database.
+ * It resolves two types of foreign transactions: on-line foreign transactions
+ * and dangling foreign transactions. An on-line foreign transaction is a
+ * foreign transaction whose resolution a concurrent backend process is waiting
+ * for. A dangling transaction is a foreign transaction whose corresponding
+ * distributed transaction ended up in an in-doubt state. A resolver process
+ * doesn't exit as long as there is at least one unresolved foreign transaction
+ * on the database, even if the timeout has been reached.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact_resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* GUC parameters */
+int foreign_xact_resolution_retry_interval;
+int foreign_xact_resolver_timeout = 60 * 1000;
+
+/* Control data shared by the launcher and the resolvers */
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FdwXactRslvLoop(void);
+static long FdwXactRslvComputeSleepTime(TimestampTz now);
+static void FdwXactRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and clean up the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+	FdwXactLauncherWakeupToRetry();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	TIMESTAMP_NOBEGIN(MyFdwXactResolver->last_resolved_time);
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sanish value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FdwXactRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FdwXactRslvLoop(void)
+{
+	TimestampTz last_retry_time = 0;
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		int			rc;
+		TimestampTz	now;
+		long		sleep_time;
+		bool		resolved;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Resolve one distributed transaction */
+		StartTransactionCommand();
+		resolved = FdwXactResolveDistributedTransaction(MyDatabaseId, true);
+		CommitTransactionCommand();
+
+		now = GetCurrentTimestamp();
+
+		/* Update my state */
+		if (resolved)
+			MyFdwXactResolver->last_resolved_time = now;
+
+		if (TimestampDifferenceExceeds(last_retry_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			StartTransactionCommand();
+			resolved = FdwXactResolveDistributedTransaction(MyDatabaseId, false);
+			CommitTransactionCommand();
+
+			last_retry_time = GetCurrentTimestamp();
+
+			/* Update my state */
+			if (resolved)
+				MyFdwXactResolver->last_resolved_time = last_retry_time;
+		}
+
+		/* Check for fdwxact resolver timeout */
+		FdwXactRslvCheckTimeout(now);
+
+		/*
+		 * If we have resolved any distributed transaction, go to the next
+		 * cycle without resolving dangling transactions or sleeping, because
+		 * there might be other on-line transactions waiting to be resolved.
+		 */
+		if (!resolved)
+		{
+			/* Resolve dangling transactions as much as possible */
+			StartTransactionCommand();
+			FdwXactResolveAllDanglingTransactions(MyDatabaseId);
+			CommitTransactionCommand();
+
+			sleep_time = FdwXactRslvComputeSleepTime(now);
+
+			MemoryContextResetAndDeleteChildren(resolver_ctx);
+			MemoryContextSwitchTo(TopMemoryContext);
+
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   sleep_time,
+						   WAIT_EVENT_FDW_XACT_RESOLVER_MAIN);
+
+			if (rc & WL_POSTMASTER_DEATH)
+				proc_exit(1);
+		}
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FdwXactRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/*
+	 * Reached the timeout. We exit only if there are neither pending on-line
+	 * transactions nor dangling transactions left.
+	 */
+	if (!fdw_xact_exists(InvalidTransactionId, MyDatabaseId, InvalidOid,
+						 InvalidOid))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyFdwXactResolver->dbid))));
+		CommitTransactionCommand();
+
+		fdwxact_resolver_detach();
+		proc_exit(0);
+	}
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. Return the sleep
+ * time in milliseconds.
+ */
+static long
+FdwXactRslvComputeSleepTime(TimestampTz now)
+{
+	static TimestampTz	wakeuptime = 0;
+	long	sleeptime;
+	long	sec_to_timeout;
+	int		microsec_to_timeout;
+
+	if (now >= wakeuptime)
+		wakeuptime = TimestampTzPlusMilliseconds(now,
+												 foreign_xact_resolution_retry_interval);
+
+	/* Compute relative time until wakeup. */
+	TimestampDifference(now, wakeuptime,
+						&sec_to_timeout, &microsec_to_timeout);
+
+	sleeptime = sec_to_timeout * 1000 + microsec_to_timeout / 1000;
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c2db19b..fb63471 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2629,10 +2629,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		heap_freetuple(heaptup);
 	}
 
-	/* Make note that we've wrote on non-temprary relation */
-	if (RelationNeedsWAL(relation))
-		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
-
 	return HeapTupleGetOid(tup);
 }
 
@@ -3457,10 +3453,6 @@ l1:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
-	/* Make note that we've wrote on non-temprary relation */
-	if (RelationNeedsWAL(relation))
-		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
-
 	return HeapTupleMayBeUpdated;
 }
 
@@ -4411,10 +4403,6 @@ l2:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
-	/* Make note that we've wrote on non-temprary relation */
-	if (RelationNeedsWAL(relation))
-		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
-
 	bms_free(hot_attrs);
 	bms_free(proj_idx_attrs);
 	bms_free(key_attrs);
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index 5514db1..742e825 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -8,9 +8,9 @@ subdir = src/backend/access/rmgrdesc
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o genericdesc.o \
-	   gindesc.o gistdesc.o hashdesc.o heapdesc.o logicalmsgdesc.o \
-	   mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o seqdesc.o \
-	   smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
+OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o fdwxactdesc.o \
+	genericdesc.o gindesc.o gistdesc.o hashdesc.o heapdesc.o \
+	logicalmsgdesc.o mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o \
+	seqdesc.o smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000..7061bba
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,65 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL distributed transaction manager for foreign server.
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdw_xact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		FdwXactOnDiskData *fdw_insert_xlog = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_insert_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_insert_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_insert_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_insert_xlog->local_xid);
+		/* TODO: This should be really interpreted by each FDW */
+
+		/*
+		 * TODO: we also need to assess whether we want to add this
+		 * information
+		 */
+		appendStringInfo(buf, " foreign transaction info: %s",
+						 fdw_insert_xlog->fdw_xact_id);
+	}
+	else
+	{
+		xl_fdw_xact_remove *fdw_remove_xlog = (xl_fdw_xact_remove *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_remove_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_remove_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_remove_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_remove_xlog->xid);
+	}
+
+}
+
+const char *
+fdw_xact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDW_XACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDW_XACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7..4a9ab3d 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -112,14 +112,16 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_prepared_xacts=%d max_locks_per_xact=%d "
 						 "wal_level=%s wal_log_hints=%s "
-						 "track_commit_timestamp=%s",
+						 "track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_prepared_xacts,
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 16fbe47..f15c83a 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -12,9 +12,9 @@ subdir = src/backend/access/transam
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = clog.o commit_ts.o generic_xlog.o multixact.o parallel.o rmgr.o slru.o \
-	subtrans.o timeline.o transam.o twophase.o twophase_rmgr.o varsup.o \
-	xact.o xlog.o xlogarchive.o xlogfuncs.o \
+OBJS = clog.o commit_ts.o generic_xlog.o multixact.o \
+	parallel.o rmgr.o slru.o subtrans.o timeline.o transam.o twophase.o \
+	twophase_rmgr.o varsup.o xact.o xlog.o xlogarchive.o xlogfuncs.o \
 	xloginsert.o xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 9368b56..8b360b1 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -9,6 +9,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
 #include "access/generic_xlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 3942734..bc4e109 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -844,6 +845,35 @@ TwoPhaseGetGXact(TransactionId xid)
 }
 
 /*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
+/*
  * TwoPhaseGetDummyProc
  *		Get the dummy backend ID for prepared transaction specified by XID
  *
@@ -2316,6 +2346,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2375,6 +2411,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8c1621d..9dca0f5 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1131,6 +1132,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1139,6 +1141,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsAtomicCommitReady();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1177,12 +1180,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1340,6 +1344,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transactions to be resolved, if required.
+	 * We only want to wait if we prepared foreign transactions in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -1990,6 +2002,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2146,6 +2161,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2233,6 +2249,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2422,6 +2440,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2627,6 +2646,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 62fc418..8e18eea 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
 #include "access/subtrans.h"
@@ -5261,6 +5262,7 @@ BootStrapXLOG(void)
 	ControlFile->MaxConnections = MaxConnections;
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6348,6 +6350,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6872,14 +6877,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdw_xact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7071,7 +7077,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7577,6 +7586,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7895,6 +7905,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9211,6 +9224,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9644,7 +9658,8 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9676,6 +9691,7 @@ XLogReportParameters(void)
 		ControlFile->MaxConnections = MaxConnections;
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9881,6 +9897,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10079,6 +10096,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->MaxConnections = xlrec.MaxConnections;
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 53ddc59..c27cd5f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -291,6 +291,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_prepared_fdw_xacts AS
+       SELECT * FROM pg_prepared_fdw_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
 	l.objoid, l.classoid, l.objsubid,
@@ -773,6 +776,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_fdwxact_resolvers AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_fdwxact_resolver() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index e5dd995..dac1e3a 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
@@ -1093,6 +1094,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1407,6 +1420,16 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
 	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srv->serverid,
+						useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
+	/*
 	 * Do the deletion
 	 */
 	object.classId = UserMappingRelationId;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 0bcb237..058bc0a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_type.h"
@@ -749,7 +750,10 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+		FdwXactMarkForeignServerAccessed(partRelInfo->ri_RelationDesc, 0, true);
+	}
 
 	MemoryContextSwitchTo(oldContext);
 
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 2ec7fcb..4578bc0 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,10 +226,16 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+
+	}
 	else
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
+	FdwXactMarkForeignServerAccessed(scanstate->ss.ss_currentRelation,
+									 eflags, node->operation != CMD_SELECT);
+
 	return scanstate;
 }
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 528f587..66c3699 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -37,6 +37,7 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "commands/trigger.h"
@@ -44,6 +45,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "storage/bufmgr.h"
@@ -485,6 +487,10 @@ ExecInsert(ModifyTableState *mtstate,
 								HEAP_INSERT_SPECULATIVE,
 								NULL);
 
+			/* Note that we wrote to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
 												   estate, true, &specConflict,
@@ -530,6 +536,10 @@ ExecInsert(ModifyTableState *mtstate,
 								estate->es_output_cid,
 								0, NULL);
 
+			/* Note that we wrote to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
@@ -722,6 +732,11 @@ ldelete:;
 							 true /* wait for commit */ ,
 							 &hufd,
 							 changingPart);
+
+		/* Note that we wrote to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case HeapTupleSelfUpdated:
@@ -1210,6 +1225,11 @@ lreplace:;
 							 estate->es_crosscheck_snapshot,
 							 true /* wait for commit */ ,
 							 &hufd, &lockmode);
+
+		/* Note that we wrote to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case HeapTupleSelfUpdated:
@@ -2321,6 +2341,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 fdw_private,
 															 i,
 															 eflags);
+
+			/* Mark that this transaction modified data on the foreign server */
+			FdwXactMarkForeignServerAccessed(resultRelInfo->ri_RelationDesc,
+											 eflags, true);
 		}
 
 		resultRelInfo++;
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index a0bcc04..b2097ad 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -155,6 +155,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping OID.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index d2b695e..b722b9a 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -15,6 +15,8 @@
 #include <unistd.h>
 
 #include "libpq/pqsignal.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -129,6 +131,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 42bccce..5116369 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3484,6 +3484,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3678,6 +3684,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -3893,6 +3902,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDW_XACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 688f462..883ad85 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -896,6 +898,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -971,12 +977,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules have had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afb4972..960fd6a 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -154,6 +154,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDW_XACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a58..c5610ee 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -150,6 +152,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, BackendRandomShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +274,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	BackendRandomShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 908f62d..cc578b2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -90,6 +90,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -245,6 +247,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1323,6 +1326,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	volatile TransactionId replication_slot_xmin = InvalidTransactionId;
 	volatile TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	volatile TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1384,6 +1388,7 @@ GetOldestXmin(Relation rel, int flags)
 	/* fetch into volatile var while ProcArrayLock is held */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1434,6 +1439,15 @@ GetOldestXmin(Relation rel, int flags)
 		result = replication_slot_xmin;
 
 	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDW_XACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
+	/*
 	 * After locks have been released and defer_cleanup_age has been applied,
 	 * check whether we need to back up further to make logical decoding
 	 * possible. We need to do so if we're computing the global limit (rel =
@@ -3016,6 +3030,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits to future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions that are still
+ * needed to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6025ec..a42d06e 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,5 @@ OldSnapshotTimeMapLock				42
 BackendRandomLock					43
 LogicalRepWorkerLock				44
 CLogTruncationLock					45
+FdwXactLock					46
+FdwXactResolverLock			47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6f9aaa5..8e55dad 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -398,6 +399,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* initialize fields for fdw xact */
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -799,6 +804,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a3b9757..48f3c59 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -2994,6 +2996,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2317e8b..7651352 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/transam.h"
@@ -378,6 +379,25 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 };
 
 /*
+ * Although only "required", "prefer", and "disabled" are documented,
+ * we accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
+/*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
  */
@@ -659,6 +679,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2235,6 +2259,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+		 gettext_noop("Sets the time to wait before retrying to resolve a foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4056,6 +4126,16 @@ static struct config_enum ConfigureNamesEnum[] =
 	},
 
 	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+			gettext_noop("Sets the usage of the two-phase commit protocol for distributed transactions."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
+	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
 			NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4e61bc6..88cdc85 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -121,6 +121,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -287,6 +289,20 @@
 
 
 #------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#foreign_twophase_commit = disabled	# required, prefer, or disabled
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
+#------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
 
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index ad06e8e..ca3eb62 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ab5cb7f..609578c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -209,6 +209,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdw_xact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 895a51f..7df88e0 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -306,6 +306,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_worker_processes);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 6fb403a..6d867c8 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -730,6 +730,7 @@ GuessControlValues(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -957,6 +958,7 @@ RewriteControlFile(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* Contents are protected with a CRC */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca..b616cea 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000..0928f4c
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,147 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL distributed transaction manager
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDW_XACT_H
+#define FDW_XACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+#define	FDW_XACT_NOT_WAITING		0
+#define	FDW_XACT_WAITING			1
+#define	FDW_XACT_WAITING_RETRY		2
+#define	FDW_XACT_WAIT_COMPLETE		3
+
+#define FdwXactEnabled() (max_prepared_foreign_xacts > 0)
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDW_XACT_ID_MAX_LEN 200
+
+/* Enum to track the status of prepared foreign transaction */
+typedef enum
+{
+	FDW_XACT_INITIAL,
+	FDW_XACT_PREPARING,					/* foreign transaction is being prepared */
+	FDW_XACT_PREPARED,					/* foreign transaction is prepared */
+	FDW_XACT_COMMITTING_PREPARED,		/* foreign prepared transaction is to
+										 * be committed */
+	FDW_XACT_ABORTING_PREPARED, /* foreign prepared transaction is to be
+								 * aborted */
+} FdwXactStatus;
+
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_PREFER,	/* use twophase commit where available */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support twophase
+								 * commit */
+} ForeignTwophaseCommitLevel;
+
+/* Shared memory entry for a prepared or being prepared foreign transaction */
+typedef struct FdwXactData *FdwXact;
+
+typedef struct FdwXactData
+{
+	FdwXact		fxact_free_next;	/* Next free FdwXact entry */
+	FdwXact		fxact_next;			/* Pointer to the next FdwXact entry associated
+									 * with the same transaction */
+	Oid				dbid;			/* database oid where to find foreign server
+									 * and user mapping */
+	TransactionId	local_xid;		/* XID of local transaction */
+	Oid				serverid;		/* foreign server where transaction takes place */
+	Oid				userid;			/* user who initiated the foreign transaction */
+	Oid				umid;
+	FdwXactStatus 	status;			/* The state of the foreign transaction. This
+									 * doubles as the action to be taken on this entry. */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;		/* XLOG offset of inserting this entry start */
+	XLogRecPtr	insert_end_lsn;		/* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been complete and written to file? */
+	BackendId	held_by;		/* backend that is holding this entry */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+	char		fdw_xact_id[FDW_XACT_MAX_ID_LEN];		/* prepared transaction identifier */
+} FdwXactData;
+
+/* Shared memory layout for maintaining foreign prepared transaction entries. */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		freeFdwXacts;
+
+	/* Number of valid foreign transaction entries */
+	int			numFdwXacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdw_xacts[FLEXIBLE_ARRAY_MEMBER];		/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+typedef struct FdwXactState
+{
+	Oid		serverid;
+	Oid		userid;
+	Oid		umid;
+	char	*fdwxact_id;
+	void	*fdw_state;		/* foreign-data wrapper can keep state here */
+} FdwXactState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern bool fdw_xact_exists(TransactionId xid, Oid dboid, Oid serverid,
+				Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwTwoPhaseNeeded(void);
+extern void PreCommit_FdwXacts(void);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon);
+extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit);
+extern bool FdwXactResolveDistributedTransaction(Oid dbid, bool is_active);
+extern void FdwXactResolveAllDanglingTransactions(Oid dbid);
+extern bool FdwXactIsAtomicCommitReady(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void FdwXactMarkForeignServerAccessed(Relation rel, int flags, bool modified);
+extern bool check_foreign_twophase_commit(int *newval, void **extra,
+										  GucSource source);
+
+#endif   /* FDW_XACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000..4ea65b2
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,32 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _FDWXACT_LAUNCHER_H
+#define _FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherWakeupToRequest(void);
+extern void FdwXactLauncherWakeupToRetry(void);
+
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+
+extern bool IsFdwXactLauncher(void);
+
+extern void fdwxact_maybe_launch_resolver(bool ignore_error);
+
+
+#endif	/* _FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000..6b2a24f
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int foreign_xact_resolver_timeout;
+
+#endif		/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000..e92b5a1
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,52 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDW_XACT_INSERT	0x00
+#define XLOG_FDW_XACT_REMOVE	0x10
+
+/* Same as GIDSIZE */
+#define FDW_XACT_MAX_ID_LEN 200
+/*
+ * On disk file structure, also used for WAL
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdw_xact_id[FDW_XACT_MAX_ID_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdw_xact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+} xl_fdw_xact_remove;
+
+extern void fdw_xact_redo(XLogReaderState *record);
+extern void fdw_xact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdw_xact_identify(uint8 info);
+
+#endif	/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000..36391d4
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,67 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _RESOLVER_INTERNAL_H
+#define _RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLaunchLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t	pid;	/* this resolver's PID, or 0 if not active */
+	Oid		dbid;	/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool	in_use;
+
+	/* Stats */
+	TimestampTz	last_resolved_time;
+
+	/* Protect shared variables shown above */
+	slock_t	mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	*latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/*
+	 * Foreign transaction resolution queues. Protected by FdwXactLock.
+	 */
+	SHM_QUEUE	FdwXactActiveQueue;
+	SHM_QUEUE	FdwXactRetryQueue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch		*launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif	/* _RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 0bbe9879..c15dff7 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDW_XACT_ID, "Foreign Transactions", fdw_xact_redo, fdw_xact_desc, fdw_xact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 0e932da..b199c88 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 				TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 2c1b2d8..63c833d 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -105,6 +105,13 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
+ * XACT_FLAGS_FDWNOPREPARE - set when we wrote data on a foreign
+ * table whose foreign server isn't capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE				(1U << 3)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 30610b3..795e85a 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -227,6 +227,7 @@ typedef struct xl_parameter_change
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6..3d5333a 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -178,6 +178,7 @@ typedef struct ControlFileData
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4d7fe1b..599ce8c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5032,6 +5032,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '6053', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_fdwxact_resolver', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,timestamptz}',
+  proargmodes => '{o,o,o}',
+  proargnames => '{pid,dbid,last_resolved_time}',
+  prosrc => 'pg_stat_get_fdwxact_resolver' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5737,6 +5744,22 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '6050', descr => 'view foreign transactions',
+  proname => 'pg_prepared_fdw_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{dbid,transaction,serverid,userid,status,identifier}',
+  prosrc => 'pg_prepared_fdw_xacts' },
+{ oid => '6051', descr => 'remove foreign transaction',
+  proname => 'pg_remove_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_remove_fdw_xact' },
+{ oid => '6052', descr => 'resolve foreign transaction',
+  proname => 'pg_resolve_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_resolve_fdw_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c14eb54..92d47bb 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/relation.h"
@@ -168,6 +169,14 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef bool (*PrepareForeignTransaction_function) (FdwXactState *state);
+typedef bool (*CommitForeignTransaction_function) (FdwXactState *state);
+typedef bool (*RollbackForeignTransaction_function) (FdwXactState *state);
+typedef bool (*ResolveForeignTransaction_function) (FdwXactState *state,
+													bool is_commit);
+typedef bool (*IsTwoPhaseCommitEnabled_function) (Oid serverid);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -235,6 +244,14 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for distributed transactions */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	ResolveForeignTransaction_function ResolveForeignTransaction;
+	IsTwoPhaseCommitEnabled_function IsTwoPhaseCommitEnabled;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
@@ -247,7 +264,6 @@ typedef struct FdwRoutine
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
 } FdwRoutine;
 
-
 /* Functions in foreign/foreign.c */
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern Oid	GetForeignServerIdByRelId(Oid relid);
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 3ca12e6..d030368 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -68,10 +68,10 @@ typedef struct ForeignTable
 	List	   *options;		/* ftoptions as DefElem list */
 } ForeignTable;
 
-
 extern ForeignServer *GetForeignServer(Oid serverid);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperByName(const char *name,
 							bool missing_ok);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f1c10d1..05feb0a 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -759,6 +759,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDW_XACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -833,7 +835,8 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDW_XACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -913,6 +916,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_READ,
+	WAIT_EVENT_FDW_XACT_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index cb613c8..45880b2 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -153,6 +153,16 @@ struct PGPROC
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
 	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction
+								 * resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+
+	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
 	 * their lock.
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 75bab29..25d6a2f 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDW_XACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -124,4 +126,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 								TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 668d9ef..81560bd 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -94,6 +94,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 735dd37..fdd6ded 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1413,6 +1413,13 @@ pg_policies| SELECT n.nspname AS schemaname,
    FROM ((pg_policy pol
      JOIN pg_class c ON ((c.oid = pol.polrelid)))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
+pg_prepared_fdw_xacts| SELECT f.dbid,
+    f.transaction,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.identifier
+   FROM pg_prepared_fdw_xacts() f(dbid, transaction, serverid, userid, status, identifier);
 pg_prepared_statements| SELECT p.name,
     p.statement,
     p.prepare_time,
@@ -1821,6 +1828,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_fdwxact_resolvers| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_fdwxact_resolver() r(pid, dbid, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
-- 
2.10.5

#15Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#14)
4 attachment(s)

On Mon, Oct 29, 2018 at 6:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Oct 29, 2018 at 10:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 24, 2018 at 9:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello.

# It took a long time to come here..

At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBf-AJup-_ARfpqR42gJQ_XjNsvv-XE0rCOCLEkT=HCg@mail.gmail.com>

On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

...

* Updated docs, added the new section "Distributed Transaction" at
Chapter 33 to explain the concept to users

* Moved atomic commit codes into src/backend/access/fdwxact directory.

* Some bug fixes.

Please review them.

I have some comments, with apologies in advance for any duplication of
or conflict with others' comments so far.

Thank you so much for reviewing this patch!

0001:

This sets XACT_FLAGS_WROTENONTEMPREL when a RELPERSISTENCE_PERMANENT
relation is modified. Isn't it needed when UNLOGGED tables are
modified? It may be better to have a dedicated classification
macro or function.

I think that even if we do atomic commit for a transaction modifying
an UNLOGGED table and a remote table, the data will become
inconsistent if the local server crashes. For example, if the local
server crashes after preparing the transaction on the foreign server
but before the local commit, we will lose all the data of the local
UNLOGGED table whereas the modification of the remote table is rolled
back. In the case of persistent tables, data consistency is preserved.
So I think keeping data consistent between remote data and a local
unlogged table is difficult, and I want to leave it as a restriction
for now. Am I missing something?

The flag is handled in heapam.c. I suppose that it should be done
in the upper layer, considering the upcoming pluggable storage.
(X_F_ACCESSEDTEMPREL is set in heapam, but..)

Yeah, or we can set the flag after heap_insert in ExecInsert.

0002:

The name FdwXactParticipantsForAC doesn't sound good to me. How
about FdwXactAtomicCommitParticipants?

+1, will fix it.

Well, as the file comment of fdwxact.c says,
FdwXactRegisterTransaction is called from the FDW driver and
F_X_MarkForeignTransactionModified is called from the executor. I
think that we should clarify who is responsible for the whole
sequence. Since the state of local tables affects it, I suppose
that is the executor. Couldn't we do the whole thing on the executor
side? I'm not sure, but I feel that
F_X_RegisterForeignTransaction can be a part of
F_X_MarkForeignTransactionModified. The callers of
MarkForeignTransactionModified can find out whether the table is
involved in 2pc through the IsTwoPhaseCommitEnabled interface.

Indeed. The executor can register foreign servers, so FDWs don't
need to register anything. I will remove the registration function so
that FDW developers don't need to call the register function but only
need to provide atomic commit APIs.

    if (foreign_twophase_commit == true &&
        ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0) )
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));

The error is emitted when the GUC is turned off in the
transaction where MarkTransactionModify'ed. I think that the
number of the variables' possible states should be reduced for
simplicity. For example, in this case, once foreign_twophase_commit
is checked in a transaction, subsequent changes in the
transaction should be ignored during the transaction.

I might not have understood your comment correctly, but since
foreign_twophase_commit is a PGC_USERSET parameter, I think we need to
check it at commit time. Also, we need to keep participant servers even
when foreign_twophase_commit is off if both max_prepared_foreign_xacts
and max_foreign_xact_resolvers are > 0.

I will post the updated patch this week.

Attached are the updated patches.

Based on the review comment from Horiguchi-san, I've changed the
atomic commit API so that FDW developers who wish to support atomic
commit don't need to call the register function. The atomic commit
APIs are the following:

* GetPrepareId
* PrepareForeignTransaction
* CommitForeignTransaction
* RollbackForeignTransaction
* ResolveForeignTransaction
* IsTwoPhaseCommitEnabled

All APIs except GetPrepareId are required for atomic commit.
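
For readers following along, here is a minimal, self-contained sketch of
the "all callbacks except GetPrepareId are required" rule. It does not use
the real PostgreSQL headers: every type and function prefixed with Sketch
or sketch_ is a hypothetical stand-in made up for illustration, and only
the callback names are taken from the fdwapi.h hunk of the patch. How the
patch itself validates an FdwRoutine may well differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's FdwXactState; not the real struct. */
typedef struct SketchFdwXactState
{
	unsigned int serverid;
	unsigned int userid;
	const char *fdwxact_id;
} SketchFdwXactState;

/* Callback shapes modeled on the typedefs the patch adds to fdwapi.h. */
typedef bool (*SketchXactCallback) (SketchFdwXactState *state);
typedef bool (*SketchResolveCallback) (SketchFdwXactState *state,
									   bool is_commit);
typedef bool (*SketchIsEnabledCallback) (unsigned int serverid);
typedef char *(*SketchGetPrepareIdCallback) (unsigned int xid,
											 unsigned int serverid,
											 unsigned int userid,
											 int *prep_id_len);

/* Hypothetical miniature of the transaction part of FdwRoutine. */
typedef struct SketchFdwRoutine
{
	SketchXactCallback PrepareForeignTransaction;
	SketchXactCallback CommitForeignTransaction;
	SketchXactCallback RollbackForeignTransaction;
	SketchResolveCallback ResolveForeignTransaction;
	SketchIsEnabledCallback IsTwoPhaseCommitEnabled;
	SketchGetPrepareIdCallback GetPrepareId;	/* optional, may be NULL */
} SketchFdwRoutine;

/* Trivial callbacks so that the sketch can be exercised. */
bool
sketch_xact_ok(SketchFdwXactState *state)
{
	(void) state;
	return true;
}

bool
sketch_resolve_ok(SketchFdwXactState *state, bool is_commit)
{
	(void) state;
	(void) is_commit;
	return true;
}

bool
sketch_2pc_enabled(unsigned int serverid)
{
	(void) serverid;
	return true;
}

/* True if every callback the mail lists as required is present. */
bool
sketch_supports_atomic_commit(const SketchFdwRoutine *routine)
{
	assert(routine != NULL);

	return routine->PrepareForeignTransaction != NULL &&
		routine->CommitForeignTransaction != NULL &&
		routine->RollbackForeignTransaction != NULL &&
		routine->ResolveForeignTransaction != NULL &&
		routine->IsTwoPhaseCommitEnabled != NULL;
}
```

An FDW that fills in the five required members and leaves GetPrepareId as
NULL passes the check; dropping any one of the five makes it fail.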

Also, I've changed the foreign_twophase_commit parameter to an enum
parameter based on the suggestion from Robert[1]. Valid values are
'required', 'prefer' and 'disabled' (default). When set to either
'required' or 'prefer', atomic commit will be used. The difference
between 'required' and 'prefer' is that when set to 'required' we
require *all* modified servers to be able to use 2pc, whereas when set
to 'prefer' we use 2pc only where it is available. So with 'required',
if any of the written participants disables 2pc or doesn't support the
atomic commit API, the transaction fails. IOW, with 'required' we can
commit only when data consistency among all participants can be
preserved.
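
The three settings described above can be sketched as a small decision
function. This is only one reading of the description in this mail, not
the logic in the patch: the enum values are copied from fdwxact.h, but
SketchCommitDecision and sketch_decide_commit are hypothetical names, and
the sketch deliberately ignores details such as whether the local node
also wrote data or whether only a single server was modified.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as the enum the patch adds for foreign_twophase_commit. */
typedef enum
{
	FOREIGN_TWOPHASE_COMMIT_DISABLED,
	FOREIGN_TWOPHASE_COMMIT_PREFER,
	FOREIGN_TWOPHASE_COMMIT_REQUIRED
} ForeignTwophaseCommitLevel;

/* Hypothetical outcome type, used only by this sketch. */
typedef enum
{
	SKETCH_COMMIT_ONE_PHASE,	/* plain COMMIT on every participant */
	SKETCH_COMMIT_TWO_PHASE,	/* PREPARE, then COMMIT PREPARED, where possible */
	SKETCH_COMMIT_ERROR			/* refuse to commit */
} SketchCommitDecision;

/*
 * n_modified is the number of foreign servers written to in the
 * transaction; n_modified_2pc_capable is how many of those both support
 * the atomic commit API and have 2pc enabled.
 */
SketchCommitDecision
sketch_decide_commit(ForeignTwophaseCommitLevel level,
					 int n_modified, int n_modified_2pc_capable)
{
	assert(n_modified >= 0);
	assert(n_modified_2pc_capable >= 0);
	assert(n_modified_2pc_capable <= n_modified);

	/* Nothing written remotely, or feature off: nothing to coordinate. */
	if (level == FOREIGN_TWOPHASE_COMMIT_DISABLED || n_modified == 0)
		return SKETCH_COMMIT_ONE_PHASE;

	if (level == FOREIGN_TWOPHASE_COMMIT_REQUIRED)
	{
		/* Every written participant must be able to prepare. */
		if (n_modified_2pc_capable < n_modified)
			return SKETCH_COMMIT_ERROR;
		return SKETCH_COMMIT_TWO_PHASE;
	}

	/* 'prefer': use 2pc on the servers that can, plain commit elsewhere. */
	if (n_modified_2pc_capable > 0)
		return SKETCH_COMMIT_TWO_PHASE;
	return SKETCH_COMMIT_ONE_PHASE;
}
```

With two modified servers of which only one can prepare, 'required'
yields an error while 'prefer' still commits, using 2pc only on the
capable server.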

Please review the patches.

Since the previous patch conflicts with current HEAD, attached is an
updated set of patches.

Rebased and fixed a few bugs.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

v22-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From 67010b0b7965045df805cb3e96bce110d5d88ddf Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 8 Feb 2018 11:26:46 +0900
Subject: [PATCH v22 1/4] Keep track of writing on non-temporary relation.

---
 src/backend/access/heap/heapam.c | 12 ++++++++++++
 src/include/access/xact.h        |  5 +++++
 2 files changed, 17 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fb63471..c2db19b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2629,6 +2629,10 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		heap_freetuple(heaptup);
 	}
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleGetOid(tup);
 }
 
@@ -3453,6 +3457,10 @@ l1:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	return HeapTupleMayBeUpdated;
 }
 
@@ -4403,6 +4411,10 @@ l2:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(relation))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	bms_free(hot_attrs);
 	bms_free(proj_idx_attrs);
 	bms_free(key_attrs);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 689c57c..2c1b2d8 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -98,6 +98,11 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
-- 
2.10.5

v22-0003-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From f86094413a1195f5a7c24d3ba834e1b832592d76 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:46:01 +0900
Subject: [PATCH v22 3/4] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/Makefile                  |   7 +-
 contrib/postgres_fdw/connection.c              | 609 ++++++++++++++++---------
 contrib/postgres_fdw/expected/postgres_fdw.out | 344 +++++++++++++-
 contrib/postgres_fdw/fdwxact.conf              |   3 +
 contrib/postgres_fdw/option.c                  |   5 +-
 contrib/postgres_fdw/postgres_fdw.c            |  58 ++-
 contrib/postgres_fdw/postgres_fdw.h            |  11 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      | 159 +++++++
 doc/src/sgml/postgres-fdw.sgml                 |  37 ++
 9 files changed, 989 insertions(+), 244 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index 85394b4..5198f40 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -10,7 +10,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -23,3 +23,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index fe4893a..494491c 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -14,9 +14,12 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
-#include "catalog/pg_user_mapping.h"
 #include "access/xact.h"
+#include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -45,7 +48,7 @@
  */
 typedef Oid ConnCacheKey;
 
-typedef struct ConnCacheEntry
+struct ConnCacheEntry
 {
 	ConnCacheKey key;			/* hash key (must be first) */
 	PGconn	   *conn;			/* connection to foreign server, or NULL */
@@ -56,9 +59,21 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
-} ConnCacheEntry;
+};
+
+/*
+ * Foreign transaction state using postgres_fdw.
+ */
+typedef struct PgFdwXactState
+{
+	Oid		serverid;
+	Oid		userid;
+	Oid		umid;
+	ConnCacheEntry	*conn;
+} PgFdwXactState;
 
 /*
  * Connection cache (initialized on first use)
@@ -69,17 +84,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 					   SubTransactionId mySubid,
 					   SubTransactionId parentSubid,
@@ -91,24 +102,20 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 						 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 						 PGresult **result);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
-
-/*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
- */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -128,7 +135,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -136,16 +142,11 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
 	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+
 	if (!found)
 	{
 		/*
@@ -155,6 +156,17 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -182,6 +194,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping	*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +203,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +214,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,11 +230,39 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
@@ -414,7 +465,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -644,193 +695,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 }
 
 /*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow remote transactions that modified anything,
-					 * since it's not very reasonable to hold them open until
-					 * the prepared transaction is committed.  For the moment,
-					 * throw error unconditionally; later we might allow
-					 * read-only cases.  Note that the error will cause us to
-					 * come right back here with event == XACT_EVENT_ABORT, so
-					 * we'll clean up the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot prepare a transaction that modified remote tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
-/*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
 static void
@@ -846,10 +710,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -860,6 +720,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Skip entries whose connection was not used in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1193,3 +1057,302 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ *
+ * This function is called only at the pre-commit phase of the local transaction.
+ */
+bool
+postgresPrepareForeignTransaction(FdwXactState *state)
+{
+	PgFdwXactState *rstate;
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+	PGresult	*res;
+	StringInfo	command;
+
+	entry = GetConnectionCacheEntry(state->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	rstate = (PgFdwXactState *) palloc0(sizeof(PgFdwXactState));
+	rstate->serverid = state->serverid;
+	rstate->userid = state->userid;
+	rstate->umid = state->umid;
+	rstate->conn = entry;
+	state->fdw_state = (void *)rstate;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+	{
+		result = true;
+		elog(DEBUG1, "prepared foreign transaction on server %u with ID %s",
+			 state->serverid, state->fdwxact_id);
+	}
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+
+/*
+ * Commit a transaction on foreign server.
+ *
+ * This function is called at the pre-commit phase of the local transaction.
+ */
+bool
+postgresCommitForeignTransaction(FdwXactState *state)
+{
+	PgFdwXactState *rstate;
+	ConnCacheEntry *entry = NULL;
+	bool		result = false;
+	PGresult	*res;
+
+	entry = GetConnectionCacheEntry(state->umid);
+
+	if (!entry || !entry->conn || !entry->xact_got_connection)
+		return true;
+
+	rstate = (PgFdwXactState *) palloc0(sizeof(PgFdwXactState));
+	rstate->serverid = state->serverid;
+	rstate->userid = state->userid;
+	rstate->umid = state->umid;
+	rstate->conn = entry;
+	state->fdw_state = (void *)rstate;
+
+	/*
+	 * If abort cleanup previously failed for this connection,
+	 * we can't issue any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		result = true;
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+
+/*
+ * Rollback a transaction on foreign server.
+ *
+ * This function is called whenever the transaction aborts.
+ */
+bool
+postgresRollbackForeignTransaction(FdwXactState *state)
+{
+	PgFdwXactState *rstate = (PgFdwXactState *) state->fdw_state;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (rstate)
+		entry = rstate->conn;
+	else
+		entry = GetConnectionCacheEntry(state->umid);
+
+	if (!entry || !entry->conn)
+		return true;
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction or is already
+	 * unsalvageable, only do the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return true;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return !abort_cleanup_failure;
+}
+
+/*
+ * Resolve a prepared transaction on foreign server.
+ *
+ * This function is called after the local commit, by either a foreign
+ * transaction resolver or pg_resolve_fdw_xact.
+ */
+bool
+postgresResolveForeignTransaction(FdwXactState *state, bool is_commit)
+{
+	ConnCacheEntry *entry = NULL;
+	StringInfo	command;
+	bool result = true;
+	PGresult	*res;
+
+	entry = GetConnectionState(state->umid, false, false);
+
+	if (!entry->conn)
+		return false;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 state->fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * If we tried to COMMIT/ABORT a prepared transaction and the prepared
+		 * transaction was missing on the foreign server, it was probably
+		 * resolved by some other means. Anyway, it should be considered as resolved.
+		 */
+		result = (sqlstate == ERRCODE_UNDEFINED_OBJECT);
+
+		/*
+		 * The command failed, raise a warning to log the reason of failure.
+		 * We may not be in a transaction here, so raising error doesn't
+		 * help.
+		 */
+		pgfdw_report_error(WARNING, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction on server %u with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 state->serverid,
+		 state->fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return result;
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 21a2ef5..c63885d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,15 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -185,16 +207,19 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                                      List of foreign tables
- Schema |   Table    |  Server   |                   FDW options                    | Description 
---------+------------+-----------+--------------------------------------------------+-------------
- public | ft1        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft2        | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
- public | ft4        | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
- public | ft5        | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft6        | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
- public | ft_pg_type | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
-(6 rows)
+                                         List of foreign tables
+ Schema |      Table       |  Server   |                   FDW options                    | Description 
+--------+------------------+-----------+--------------------------------------------------+-------------
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1')            | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')            | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')            | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')            | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type') | 
+(9 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8650,3 +8675,302 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+\det+
+                                                 List of foreign tables
+ Schema |      Table       |  Server   |                            FDW options                            | Description 
+--------+------------------+-----------+-------------------------------------------------------------------+-------------
+ public | fpagg_tab_p1     | loopback  | (table_name 'pagg_tab_p1')                                        | 
+ public | fpagg_tab_p2     | loopback  | (table_name 'pagg_tab_p2')                                        | 
+ public | fpagg_tab_p3     | loopback  | (table_name 'pagg_tab_p3')                                        | 
+ public | ft1              | loopback  | (schema_name 'S 1', table_name 'T 1')                             | 
+ public | ft2              | loopback  | (schema_name 'S 1', table_name 'T 1', use_remote_estimate 'true') | 
+ public | ft3              | loopback  | (table_name 'loct3', use_remote_estimate 'true')                  | 
+ public | ft4              | loopback  | (schema_name 'S 1', table_name 'T 3')                             | 
+ public | ft5              | loopback  | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft6              | loopback2 | (schema_name 'S 1', table_name 'T 4')                             | 
+ public | ft7_twophase     | loopback  | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft8_twophase     | loopback2 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft9_not_twophase | loopback3 | (schema_name 'S 1', table_name 'T 5')                             | 
+ public | ft_pg_type       | loopback  | (schema_name 'pg_catalog', table_name 'pg_type')                  | 
+ public | ftprt1_p1        | loopback  | (table_name 'fprt1_p1', use_remote_estimate 'true')               | 
+ public | ftprt1_p2        | loopback  | (table_name 'fprt1_p2')                                           | 
+ public | ftprt2_p1        | loopback  | (table_name 'fprt2_p1', use_remote_estimate 'true')               | 
+ public | ftprt2_p2        | loopback  | (table_name 'fprt2_p2', use_remote_estimate 'true')               | 
+ public | rem1             | loopback  | (table_name 'loc1')                                               | 
+ public | rem2             | loopback  | (table_name 'loc2')                                               | 
+(19 rows)
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+  srvname  
+-----------
+ loopback
+ loopback2
+ loopback3
+(3 rows)
+
+-- Enable atomic commit
+SET distributed_atomic_commit TO 'required';
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(5);
+INSERT INTO ft9_not_twophase VALUES(5);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+-- Check differences between configuration when a transaction mofieid
+-- data on both 2pc-capable and non-2pc-capable servers.
+-- When set to 'required' it fails.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+ERROR:  cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+-- When set to 'prefer', we can commit it
+SET distributed_atomic_commit TO 'prefer';
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  3
+  7
+  7
+(6 rows)
+
+-- But cannot prepare the local transaction
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+PREPARE TRANSACTION 'gx1'; -- error
+ERROR:  can not prepare the transaction because some foreign servers involved in transaction can not prepare the transaction
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  3
+  7
+  7
+(6 rows)
+
+-- When set to 'disabled', we can commit it
+SET distributed_atomic_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft8_twophase VALUES(8);
+INSERT INTO ft9_not_twophase VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  3
+  7
+  7
+  8
+  8
+(8 rows)
+
+-- Similary, but cannot prepare the local transaction
+BEGIN;
+INSERT INTO ft8_twophase VALUES(8);
+INSERT INTO ft9_not_twophase VALUES(8);
+PREPARE TRANSACTION 'gx1'; -- error
+ERROR:  cannot PREPARE a distributed transaction when distributed_atomic_commit is 'disabled'
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  3
+  7
+  7
+  8
+  8
+(8 rows)
+
+SET distributed_atomic_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  3
+  7
+  7
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_twophase;
+ c1 
+----
+  1
+  2
+  2
+  3
+  7
+  7
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_prepared_fdw_xacts;
+ count 
+-------
+     0
+(1 row)
+
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000..3fdbf93
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 6854f1b..404b318 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -108,7 +108,8 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 		 * Validate option value, when we can do so without any context.
 		 */
 		if (strcmp(def->defname, "use_remote_estimate") == 0 ||
-			strcmp(def->defname, "updatable") == 0)
+			strcmp(def->defname, "updatable") == 0 ||
+			strcmp(def->defname, "two_phase_commit") == 0)
 		{
 			/* these accept only boolean values */
 			(void) defGetBoolean(def);
@@ -177,6 +178,8 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* two-phase commit support */
+		{"two_phase_commit", ForeignServerRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fd20aa9..5214627 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "catalog/pg_class.h"
@@ -359,6 +360,7 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 							 RelOptInfo *input_rel,
 							 RelOptInfo *output_rel,
 							 void *extra);
+static bool postgresIsTwoPhaseCommitEnabled(Oid serverid);
 
 /*
  * Helper functions
@@ -452,7 +454,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 				  const PgFdwRelationInfo *fpinfo_o,
 				  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -506,10 +507,28 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->ResolveForeignTransaction = postgresResolveForeignTransaction;
+	routine->IsTwoPhaseCommitEnabled = postgresIsTwoPhaseCommitEnabled;
+
 	PG_RETURN_POINTER(routine);
 }
 
 /*
+ * postgresIsTwoPhaseCommitEnabled
+ */
+static bool
+postgresIsTwoPhaseCommitEnabled(Oid serverid)
+{
+	ForeignServer	*server = GetForeignServer(serverid);
+
+	return server_uses_twophase_commit(server);
+}
+
+/*
  * postgresGetForeignRelSize
  *		Estimate # of rows and width of the result of the scan
  *
@@ -1356,7 +1375,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2411,7 +2430,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2704,7 +2723,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								&retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3321,7 +3340,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4108,7 +4127,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4198,7 +4217,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4421,7 +4440,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
@@ -5803,3 +5822,26 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	/* We didn't find any suitable equivalence class expression */
 	return NULL;
 }
+
+/*
+ * server_uses_twophase_commit
+ * Returns true if the foreign server is configured to support 2PC.
+ */
+bool
+server_uses_twophase_commit(ForeignServer *server)
+{
+	ListCell		*lc;
+
+	/* Check the options for two phase compliance */
+	foreach(lc, server->options)
+	{
+		DefElem    *d = (DefElem *) lfirst(lc);
+
+		if (strcmp(d->defname, "two_phase_commit") == 0)
+		{
+			return defGetBoolean(d);
+		}
+	}
+	/* By default a server is not 2PC compliant */
+	return false;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 70b538e..3526923 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "nodes/relation.h"
@@ -110,12 +111,14 @@ typedef struct PgFdwRelationInfo
 	int			relation_index;
 } PgFdwRelationInfo;
 
+typedef struct ConnCacheEntry ConnCacheEntry;
+
 /* in postgres_fdw.c */
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -123,6 +126,11 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 				   bool clear, const char *sql);
+extern bool postgresPrepareForeignTransaction(FdwXactState *state);
+extern bool postgresCommitForeignTransaction(FdwXactState *state);
+extern bool postgresRollbackForeignTransaction(FdwXactState *state);
+extern bool postgresResolveForeignTransaction(FdwXactState *state,
+											  bool is_commit);
 
 /* in option.c */
 extern int ExtractConnectionOptions(List *defelems,
@@ -181,6 +189,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 						List *remote_conds, List *pathkeys, bool is_subquery,
 						List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 88c4cb4..49b4f4c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,19 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_twophase (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_twophase (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft9_not_twophase (
+       c1 int NOT NULL
+) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- A table with oids. CREATE FOREIGN TABLE doesn't support the
 -- WITH OIDS option, but ALTER does.
 CREATE FOREIGN TABLE ft_pg_type (
@@ -2354,3 +2381,135 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+
+-- ===================================================================
+-- test atomic commit across foreign servers
+-- ===================================================================
+
+ALTER SERVER loopback OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback2 OPTIONS(ADD two_phase_commit 'on');
+ALTER SERVER loopback3 OPTIONS(ADD two_phase_commit 'off');
+
+\det+
+
+-- Check two_phase_commit setting
+SELECT srvname FROM pg_foreign_server WHERE 'two_phase_commit=on' = ANY(srvoptions) or 'two_phase_commit=off' = ANY(srvoptions);
+
+-- Enable atomic commit
+SET distributed_atomic_commit TO 'required';
+
+-- Modify one 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+COMMIT;
+SELECT * FROM ft7_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+
+-- Modify two 2PC-capable servers then commit and rollback.
+-- This requires to use 2PC when commit.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+COMMIT;
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(2);
+INSERT INTO ft8_twophase VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_twophase;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error
+BEGIN;
+INSERT INTO ft7_twophase VALUES(4);
+INSERT INTO ft8_twophase VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+
+-- Rollback foreign transaction that involves both 2PC-capable
+-- and 2PC-non-capable foreign servers.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(5);
+INSERT INTO ft9_not_twophase VALUES(5);
+ROLLBACK;
+SELECT * FROM ft8_twophase;
+
+-- Check differences between configuration when a transaction mofieid
+-- data on both 2pc-capable and non-2pc-capable servers.
+
+-- When set to 'required' it fails.
+BEGIN;
+INSERT INTO ft8_twophase VALUES(6);
+INSERT INTO ft9_not_twophase VALUES(6);
+COMMIT; -- error
+SELECT * FROM ft8_twophase;
+
+-- When set to 'prefer', we can commit it
+SET distributed_atomic_commit TO 'prefer';
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+
+-- But cannot prepare the local transaction
+BEGIN;
+INSERT INTO ft8_twophase VALUES(7);
+INSERT INTO ft9_not_twophase VALUES(7);
+PREPARE TRANSACTION 'gx1'; -- error
+SELECT * FROM ft8_twophase;
+
+-- When set to 'disabled', we can commit it
+SET distributed_atomic_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft8_twophase VALUES(8);
+INSERT INTO ft9_not_twophase VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft8_twophase;
+
+-- Similary, but cannot prepare the local transaction
+BEGIN;
+INSERT INTO ft8_twophase VALUES(8);
+INSERT INTO ft9_not_twophase VALUES(8);
+PREPARE TRANSACTION 'gx1'; -- error
+SELECT * FROM ft8_twophase;
+
+SET distributed_atomic_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_twophase;
+
+BEGIN;
+INSERT INTO ft7_twophase VALUES(9);
+INSERT INTO ft8_twophase VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_twophase;
+
+-- No entry remained
+SELECT count(*) FROM pg_prepared_fdw_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 54b5e98..f4a9ff5 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -436,6 +436,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted independently.
+    Some transactions may fail to commit on a remote server while other
+    transactions commit successfully. This may be overridden using
+    the following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is allowed
+       to use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
-- 
2.10.5
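
The expected regression output in the patch above shows how COMMIT and PREPARE TRANSACTION behave under each distributed_atomic_commit setting when a transaction has written to 2PC-capable and 2PC-non-capable foreign servers. As a reading aid, here is a minimal C sketch that restates only those observed outcomes. It is not code from the patch: the type and function names (AtomicCommitMode, commit_succeeds, prepare_succeeds) are hypothetical, and the cases the tests do not exercise (for example 'required' with only non-capable servers, or 'prefer' with only capable servers) are assumptions marked in the comments.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the outcomes shown in the regression output.
 * These names mirror the GUC values used in the tests; they are not
 * identifiers taken from the patch.
 */
typedef enum
{
	DAC_DISABLED,				/* distributed_atomic_commit = 'disabled' */
	DAC_PREFER,					/* distributed_atomic_commit = 'prefer' */
	DAC_REQUIRED				/* distributed_atomic_commit = 'required' */
} AtomicCommitMode;

/*
 * Does COMMIT succeed for a transaction that modified n_capable
 * 2PC-capable servers and n_not_capable 2PC-non-capable servers?
 *
 * Observed in the tests: only 'required' combined with a non-capable
 * participant fails.  Assumption: 'required' also fails when *only*
 * non-capable servers were modified (the tests cover the mixed case).
 */
bool
commit_succeeds(AtomicCommitMode mode, int n_capable, int n_not_capable)
{
	/* The model only describes transactions that touched a foreign server. */
	assert(n_capable >= 0 && n_not_capable >= 0);
	assert(n_capable + n_not_capable > 0);

	if (mode == DAC_REQUIRED && n_not_capable > 0)
		return false;
	return true;
}

/*
 * Does PREPARE TRANSACTION succeed for the same kind of transaction?
 *
 * Observed in the tests: it fails under 'disabled', and it fails under
 * 'prefer' when a non-capable server is involved; it succeeds under
 * 'required' when every participant is 2PC-capable.  Assumption: any
 * non-capable participant blocks PREPARE in every mode, and 'prefer'
 * with only capable servers succeeds.
 */
bool
prepare_succeeds(AtomicCommitMode mode, int n_capable, int n_not_capable)
{
	assert(n_capable >= 0 && n_not_capable >= 0);
	assert(n_capable + n_not_capable > 0);

	if (mode == DAC_DISABLED)
		return false;
	if (n_not_capable > 0)
		return false;
	return true;
}
```

For instance, commit_succeeds(DAC_REQUIRED, 1, 1) corresponds to the "COMMIT; -- error" case with ft8_twophase and ft9_not_twophase, while prepare_succeeds(DAC_REQUIRED, 2, 0) corresponds to the final PREPARE TRANSACTION 'gx1' test on ft7_twophase and ft8_twophase.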

Attachment: v22-0004-Add-regression-tests-for-atomic-commit.patch (application/octet-stream)
From 11d87cacb90678b68a57363611060081aebbc0ae Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:48:08 +0900
Subject: [PATCH v22 4/4] Add regression tests for atomic commit.

---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index daf79a0..71c8b9d 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000..640b206
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+distributed_atomic_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port', two_phase_commit 'on');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port', two_phase_commit 'on');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately before a checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_prepared_fdw_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 3248603..c1cd8ae 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2288,9 +2288,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2305,7 +2308,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.10.5

Attachment: v22-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From d4eb4b8bdb36ec928bb14b081f9067f223afa791 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:44:42 +0900
Subject: [PATCH v22 2/4] Support atomic commit among multiple foreign servers.

---
 doc/src/sgml/catalogs.sgml                    |   97 +
 doc/src/sgml/config.sgml                      |  143 +-
 doc/src/sgml/distributed-transaction.sgml     |  157 ++
 doc/src/sgml/fdwhandler.sgml                  |  203 ++
 doc/src/sgml/filelist.sgml                    |    1 +
 doc/src/sgml/func.sgml                        |   51 +
 doc/src/sgml/monitoring.sgml                  |   60 +
 doc/src/sgml/postgres.sgml                    |    1 +
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/fdwxact.c          | 2678 +++++++++++++++++++++++++
 src/backend/access/fdwxact/fdwxact_launcher.c |  641 ++++++
 src/backend/access/fdwxact/fdwxact_resolver.c |  331 +++
 src/backend/access/heap/heapam.c              |   12 -
 src/backend/access/rmgrdesc/Makefile          |    8 +-
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   65 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/Makefile           |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   32 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/copy.c                   |    7 +
 src/backend/commands/foreigncmds.c            |   30 +
 src/backend/executor/execPartition.c          |    8 +
 src/backend/executor/nodeForeignscan.c        |   25 +
 src/backend/executor/nodeModifyTable.c        |   24 +
 src/backend/foreign/foreign.c                 |   43 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   18 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    2 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   80 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  149 ++
 src/include/access/fdwxact_launcher.h         |   32 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   52 +
 src/include/access/resolver_internal.h        |   67 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   23 +
 src/include/foreign/fdwapi.h                  |   18 +-
 src/include/foreign/foreign.h                 |    2 +-
 src/include/pgstat.h                          |    8 +-
 src/include/storage/proc.h                    |   10 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    2 +
 src/test/regress/expected/rules.out           |   12 +
 63 files changed, 5324 insertions(+), 40 deletions(-)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100755 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/fdwxact_launcher.c
 create mode 100644 src/backend/access/fdwxact/fdwxact_resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 8b7f169..f2f0571 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9597,6 +9597,103 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-prepared-fdw-xacts">
+  <title><structname>pg_prepared_fdw_xacts</structname></title>
+
+  <indexterm zone="view-pg-prepared-fdw-xacts">
+   <primary>pg_prepared_fdw_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_prepared_fdw_xacts</structname> displays
+   information about foreign transactions that are currently prepared on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="fdw-transaction-managements"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_prepared_fdw_xacts</structname> contains one row per prepared
+   foreign transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_prepared_fdw_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>transaction</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       ID of the local transaction that this foreign transaction is associated with
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which this foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction: <literal>prepared</literal>, <literal>committing</literal>, <literal>aborting</literal> or <literal>unknown</literal>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_prepared_fdw_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless,
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0f8f2ef..4fffb76 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3611,7 +3611,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
 
      </variablelist>
     </sect2>
-
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -7827,6 +7826,148 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-distributed-atomic-commit" xreflabel="distributed_atomic_commit">
+       <term><varname>distributed_atomic_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>distributed_atomic_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether transaction commit will wait for all involved foreign transactions
+         to be resolved before the command returns a "success" indication to the client.
+         Valid values are <literal>required</literal>, <literal>prefer</literal> and
+         <literal>disabled</literal>. The default setting is <literal>disabled</literal>.
+         When <literal>disabled</literal>, the participating servers can become inconsistent
+         with each other if a foreign server crashes while the distributed transaction
+         is being committed. When set to <literal>required</literal>, the distributed
+         transaction requires that all modified servers can use the two-phase commit protocol.
+         That is, the transaction fails if any server returns <literal>false</literal>
+         from <function>IsTwoPhaseCommitEnabled</function> or does not support the transaction
+         management callback routines (described in
+         <xref linkend="fdw-callbacks-transaction-managements"/>).
+         When set to <literal>prefer</literal>, the distributed transaction uses the
+         two-phase commit protocol where it is available, but does not fail where it is
+         not available.
+        </para>
+
+        <para>
+         Both <varname>max_prepared_foreign_transactions</varname> and
+         <varname>max_foreign_transaction_resolvers</varname> must be set to non-zero values
+         before this parameter can be set to <literal>required</literal> or <literal>prefer</literal>.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one transaction
+         is determined by the setting in effect when it commits.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If <literal>N</literal> local transactions each
+         span <literal>K</literal> foreign servers, this value needs to be set to
+         <literal>N * K</literal>, not just <literal>N</literal>.
+         This parameter can only be set at server start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed
+         resolution attempt before retrying. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism.  You should set this value to
+         zero only if <varname>max_foreign_transaction_resolvers</varname> is at least
+         the number of databases you have. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
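As a reading aid for the distributed_atomic_commit text above, here is a
minimal standalone C sketch of how the three settings could map onto a
per-server decision, plus the N * K sizing rule for
max_prepared_foreign_transactions. It is not code from the patch: the enum
names and both functions are hypothetical, and it only models what the
documentation says (required fails on a server that cannot do two-phase
commit, prefer falls back to a plain commit there, disabled never prepares).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three documented settings. */
typedef enum
{
    ATOMIC_COMMIT_DISABLED,
    ATOMIC_COMMIT_PREFER,
    ATOMIC_COMMIT_REQUIRED
} AtomicCommitMode;

/* What this toy coordinator does with one modified foreign server. */
typedef enum
{
    ACTION_PLAIN_COMMIT,        /* commit without preparing */
    ACTION_PREPARE,             /* take part in two-phase commit */
    ACTION_ERROR                /* fail the whole transaction */
} ServerAction;

/*
 * two_phase_capable stands for what IsTwoPhaseCommitEnabled would report
 * for the server (and for the FDW providing the callbacks at all).
 */
ServerAction
action_for_server(AtomicCommitMode mode, bool two_phase_capable)
{
    if (mode == ATOMIC_COMMIT_DISABLED)
        return ACTION_PLAIN_COMMIT;
    if (two_phase_capable)
        return ACTION_PREPARE;
    /* Not capable: "required" fails, "prefer" falls back. */
    return (mode == ATOMIC_COMMIT_REQUIRED) ? ACTION_ERROR
                                            : ACTION_PLAIN_COMMIT;
}

/* N local transactions, each spanning K foreign servers, need N * K slots. */
int
prepared_slots_needed(int n_local_xacts, int k_servers)
{
    assert(n_local_xacts >= 0 && k_servers >= 0);
    return n_local_xacts * k_servers;
}
```

For example, 10 concurrent local transactions that each write to 3 foreign
servers would need max_prepared_foreign_transactions of at least 30 under
this rule.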
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000..deb8a60
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,157 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction Management</title>
+
+ <para>
+  This chapter explains what distributed transaction management is, and how it can be configured
+  in PostgreSQL.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Atomic commit is an operation that applies a set of changes as a single operation
+   globally. <productname>PostgreSQL</productname> provides a way to perform a transaction
+   involving foreign resources using a <literal>Foreign Data Wrapper</literal>. Using
+   <productname>PostgreSQL</productname>'s atomic commit ensures that all changes
+   on foreign servers end in either commit or rollback, using the transaction callback
+   routines (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To commit on all foreign servers atomically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol, which is a
+    type of atomic commitment protocol (ACP). Using the two-phase commit protocol, the commit
+    sequence of a distributed transaction proceeds through the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    In the first step, the <productname>PostgreSQL</productname> distributed transaction manager
+    prepares all transactions on the foreign servers if two-phase commit is required.
+    Two-phase commit is required only if the transaction modifies data on two or more
+    servers, including the local server itself, and the user requests it with
+    <xref linkend="guc-distributed-atomic-commit"/>. If all preparations on the foreign servers
+    succeed, processing continues with the next step. If any failure happens in this step,
+    <productname>PostgreSQL</productname> switches to rollback and rolls back all transactions
+    on both the local and foreign servers.
+   </para>
+
+   <para>
+    In the local commit step, <productname>PostgreSQL</productname> commits the transaction
+    locally. If any failure happens in this step, <productname>PostgreSQL</productname> switches
+    to rollback and rolls back all transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    In the final step, the prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Foreign Transaction Resolution</title>
+
+   <para>
+    Foreign transactions are resolved by foreign transaction resolver processes.
+    A resolver commits all prepared transactions on the foreign servers if the coordinator
+    received an agreement message from every foreign server during the first step. On the
+    other hand, if any foreign server failed to prepare the transaction, the resolver rolls
+    back all prepared transactions.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution on one
+    database on the coordinator side. If resolution fails, the resolver retries
+    after <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>In-doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit or roll back
+    using the two-phase commit protocol. However, if the second phase fails for whatever reason,
+    the transaction becomes in-doubt. A transaction becomes in-doubt in the following
+    situations:
+
+   <itemizedlist>
+    <listitem>
+     <para>
+      The local <productname>PostgreSQL</productname> server crashes during an atomic commit
+      operation.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      The local <productname>PostgreSQL</productname> server receives a cancellation request
+      from the user during atomic commit.
+     </para>
+    </listitem>
+   </itemizedlist>
+
+   In-doubt transactions are handled automatically by a foreign transaction resolver process
+   when no online transaction is requesting resolution.
+   <function>pg_resolve_fdw_xact</function> provides a way to manually resolve transactions
+   on the foreign servers that participated in the distributed transaction.
+   </para>
+
+   <para>
+    The atomic commit operation is crash-safe. Foreign transactions that were being processed
+    at the time of a crash are handled by a foreign transaction resolver as in-doubt transactions.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Monitoring</title>
+   <para>
+    Monitoring information about foreign transaction resolvers is visible in the
+    <link linkend="pg-stat-fdwxact-resolver-view"><literal>pg_stat_fdw_xact_resolver</literal></link>
+    view. This view contains one row for every foreign transaction resolver worker.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be adjusted to
+    accommodate the foreign transaction resolver workers: at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
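The three-step commit sequence documented in distributed-transaction.sgml
above (prepare on all foreign servers, commit locally, resolve the prepared
transactions) can be pictured with a small standalone C model. This is only
a sketch under stated assumptions, not the patch's implementation: the
Participant struct, the state names and atomic_commit() are hypothetical,
failures are simulated with boolean flags, and resolution is done inline
rather than by a separate resolver process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states of one foreign transaction in this toy model. */
typedef enum
{
    FX_ACTIVE,                  /* transaction open on the foreign server */
    FX_PREPARED,
    FX_COMMITTED,
    FX_ABORTED
} ForeignXactState;

typedef struct
{
    bool             prepare_succeeds;  /* simulated outcome of PREPARE */
    ForeignXactState state;
} Participant;

/* Roll back every participant, whether it was prepared or not. */
static void
rollback_all(Participant *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i].state = FX_ABORTED;
}

/* Returns true if the distributed transaction committed everywhere. */
bool
atomic_commit(Participant *p, size_t n, bool local_commit_succeeds)
{
    /* Step 1: prepare all transactions on the foreign servers. */
    for (size_t i = 0; i < n; i++)
    {
        if (!p[i].prepare_succeeds)
        {
            rollback_all(p, n);
            return false;
        }
        p[i].state = FX_PREPARED;
    }

    /* Step 2: commit locally; a failure here also rolls everything back. */
    if (!local_commit_succeeds)
    {
        rollback_all(p, n);
        return false;
    }

    /* Step 3: resolve (here: commit) every prepared foreign transaction. */
    for (size_t i = 0; i < n; i++)
    {
        assert(p[i].state == FX_PREPARED);
        p[i].state = FX_COMMITTED;
    }
    return true;
}
```

The point of the model is the ordering: no participant reaches the committed
state until every participant has prepared and the local commit has
succeeded, so a single failed prepare leaves all participants aborted.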
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 4ce88dd..90cc415 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1390,6 +1390,118 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     If an FDW wishes to support <firstterm>atomic commit</firstterm>
+     (as described in <xref linkend="fdw-transaction-managements"/>), it must call the
+     registration function <function>FdwXactRegisterForeignTransaction</function>
+     and provide the following callback functions:
+    </para>
+
+    <para>
+<programlisting>
+bool
+PrepareForeignTransaction(FdwXactResolveState *state);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if atomic commit is required.
+    Returning <literal>true</literal> means that the foreign transaction was
+    prepared successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactResolveState *state);
+</programlisting>
+    Commit a not-prepared transaction on the foreign server.
+    This function is called at the pre-commit phase of the local
+    transaction if atomic commit is not required. Atomic
+    commit is not required either when the transaction modified data on
+    only one server, including the local server, or when the user does not
+    request atomic commit with <xref linkend="guc-distributed-atomic-commit"/>.
+    Returning <literal>true</literal> means that the
+    foreign transaction was committed successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactResolveState *state);
+</programlisting>
+    Roll back a not-prepared transaction on the foreign server.
+    This function is called at the end of the local transaction, after it
+    has been rolled back locally, either when the user requested rollback or
+    when an error occurred during the transaction. This function could
+    be called recursively if an error occurs while rolling back the
+    foreign transaction for whatever reason. You need to track
+    recursion and prevent this function from being called infinitely.
+    Returning <literal>true</literal> means that the
+    foreign transaction was rolled back successfully.
+    </para>
+    <para>
+<programlisting>
+bool
+ResolvePreparedForeignTransaction(FdwXactResolveState *state,
+                                  bool is_commit);
+</programlisting>
+    Commit or roll back the prepared transaction on the foreign server.
+    When <varname>is_commit</varname> is true, the foreign
+    transaction should be committed. Otherwise the foreign transaction should
+    be aborted.
+    This function is normally called by the foreign transaction resolver
+    process but can also be called by the <function>pg_resolve_fdw_xact</function>
+    function. In the resolver process, this function is called either
+    when a backend requests the resolver process to resolve a distributed
+    transaction after it has been prepared, or when a database has dangling
+    transactions. Returning <literal>true</literal> means that
+    the foreign transaction was resolved successfully.
+    In the abort case, note that the prepared transaction identified
+    by <varname>state-&gt;fdwxact_id</varname> might not exist on the foreign
+    server. If resolution fails with an undefined object error
+    (<literal>ERRCODE_UNDEFINED_OBJECT</literal>), you should
+    regard it as success and return <literal>true</literal>.
+    </para>
+    <para>
+<programlisting>
+bool
+IsTwoPhaseCommitEnabled(Oid serverid);
+</programlisting>
+    Return <literal>true</literal> if the foreign server identified by
+    <literal>serverid</literal> is capable of the two-phase commit protocol.
+    This function is called once at commit time.
+    Returning <literal>false</literal> indicates that the current transaction
+    cannot use atomic commit even if atomic commit is requested by the user.
+    </para>
+
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    and set <varname>*prep_id_len</varname> to its length.
+    This optional function is called during executor startup, once per
+    foreign server. Note that the transaction identifier must be a string
+    less than <symbol>NAMEDATALEN</symbol> bytes long and must not be the same
+    as any other concurrent prepared transaction ID. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <note>
+     <para>
+      The functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Note therefore that
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1835,4 +1947,95 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+    <title>Transaction Management for Foreign Data Wrappers</title>
+
+    <para>
+     The <productname>PostgreSQL</productname> foreign transaction manager
+     allows FDWs to read and write data on foreign servers within a transaction while
+     maintaining atomicity of the foreign data (also known as atomic commit). Atomic
+     commit guarantees that a distributed transaction is either committed
+     or rolled back on all participating foreign
+     servers.  To achieve atomic commit, <productname>PostgreSQL</productname>
+     employs the two-phase commit protocol, which is a type of atomic commitment
+     protocol. Every FDW that wishes to support atomic commit
+     is required to support the transaction management callback routines:
+     <function>PrepareForeignTransaction</function>,
+     <function>CommitForeignTransaction</function>,
+     <function>RollbackForeignTransaction</function>,
+     <function>ResolveForeignTransaction</function>,
+     <function>IsTwoPhaseCommitEnabled</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+     Transactions on a foreign server that supports these callback routines are
+     managed by <productname>PostgreSQL</productname>'s distributed transaction
+     manager. Each transaction management callback is called at the appropriate time.
+    </para>
+
+    <para>
+     The information in <literal>FdwXactState</literal> can be used to identify
+     foreign servers. <literal>state-&gt;fdw_state</literal> is a <type>void</type>
+     pointer that is available for FDW transaction functions to store information
+     relevant to the particular foreign server.  It is useful for passing
+     information forward from <function>PrepareForeignTransaction</function> and/or
+     <function>CommitForeignTransaction</function> to
+     <function>RollbackForeignTransaction</function>, thereby avoiding recalculation.
+     Note that since <function>ResolveForeignTransaction</function> is called
+     independently of these callback routines, the information is not passed to
+     <function>ResolveForeignTransaction</function>.
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be using
+    different Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling <function>PrepareForeignTransaction</function>
+     if the two-phase commit protocol is required. Two-phase commit is required only if
+     the transaction modified data on more than one server, including the local
+     server itself, and the user requests atomic commit. <productname>PostgreSQL</productname>
+     can commit locally and go to the next step if and only if all foreign
+     transactions were prepared successfully. If two-phase commit is not required, the foreign
+     transaction manager commits each transaction by calling
+     <function>CommitForeignTransaction</function> and then commits locally.
+     If any failure happens or the user requests cancellation during the pre-commit phase,
+     the distributed transaction manager switches to rollback, calls
+     <function>RollbackForeignTransaction</function> for the not-prepared foreign
+     servers, and then rolls back locally. The prepared foreign transactions are rolled back
+     by a foreign transaction resolver process.
+    </para>
+
+    <para>
+     Once committed locally, the distributed transaction must be committed. The
+     prepared foreign transactions will be committed by a foreign transaction resolver
+     process.
+    </para>
+
+    <para>
+     When two-phase commit is required, after the local commit the transaction
+     commit waits for all prepared foreign transactions to be committed before
+     completing. One foreign transaction resolver process is responsible for
+     foreign transaction resolution on a database.
+     <function>ResolveForeignTransaction</function> is called by the foreign
+     transaction resolver process during resolution.
+     <function>ResolveForeignTransaction</function> is also called
+     when the user executes the <function>pg_resolve_fdw_xact</function> function.
+    </para>
+  </sect1>
  </chapter>
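Two details from the fdwhandler.sgml text above lend themselves to a short
illustration: the default identifier format
fx_<random value>_<server oid>_<user oid>, which must stay under NAMEDATALEN
bytes, and the rule that in the abort case an undefined object error counts
as success. The standalone C sketch below is hypothetical and does not come
from the patch: format_prepare_id() and resolution_succeeded() are made-up
names, NAMEDATALEN is redefined locally with PostgreSQL's default of 64, the
SQLSTATE "42704" is the code for undefined_object, and treating that error as
a failure in the commit case is this sketch's own assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NAMEDATALEN 64          /* PostgreSQL's default; assumed here */
#define SQLSTATE_UNDEFINED_OBJECT "42704"

/*
 * Write "fx_<random>_<serverid>_<userid>" into buf, which must have room
 * for NAMEDATALEN bytes.  Returns the identifier length, or -1 if the
 * identifier would not be shorter than NAMEDATALEN bytes.
 */
int
format_prepare_id(char *buf, unsigned long random_value,
                  unsigned int serverid, unsigned int userid)
{
    int len = snprintf(buf, NAMEDATALEN, "fx_%lu_%u_%u",
                       random_value, serverid, userid);

    if (len < 0 || len >= NAMEDATALEN)
        return -1;
    return len;
}

/*
 * Decide what a resolve callback should report.  remote_ok is whether the
 * remote COMMIT PREPARED / ROLLBACK PREPARED succeeded; sqlstate is the
 * error code on failure (may be NULL).
 */
bool
resolution_succeeded(bool is_commit, bool remote_ok, const char *sqlstate)
{
    if (remote_ok)
        return true;

    /* Abort case: the prepared transaction may already be gone. */
    if (!is_commit && sqlstate != NULL &&
        strcmp(sqlstate, SQLSTATE_UNDEFINED_OBJECT) == 0)
        return true;

    return false;
}
```

Treating the missing prepared transaction as success makes the rollback path
safe to retry: a resolver that repeats an abort after a crash does not get
stuck on a transaction that was already rolled back.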
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 48ac14a..38d6fcb 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index edeb3fd..d609324 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20905,6 +20905,57 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-fdw-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_fdw_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_fdw_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for foreign transactions
+        matching the arguments and resolves them. This function won't resolve
+        a foreign transaction which is in progress, or one that is locked by some
+        other backend.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_fdw_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_fdw_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 7aada14..53f9e72 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -332,6 +332,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_fdw_xact_resolver</structname><indexterm><primary>pg_stat_fdw_xact_resolver</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-fdwxact-resolver-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1198,6 +1206,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
+         <entry><literal>LogicalLauncherMain</literal></entry>
+         <entry>Waiting in main loop of logical launcher process.</entry>
+        </row>
+        <row>
          <entry><literal>LogicalApplyMain</literal></entry>
          <entry>Waiting in main loop of logical apply process.</entry>
         </row>
@@ -1413,6 +1433,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
+        <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
          <entry>Waiting during base backup when throttling activity.</entry>
@@ -2222,6 +2246,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-fdwxact-resolver-view" xreflabel="pg_stat_fdw_xact_resolver">
+   <title><structname>pg_stat_fdw_xact_resolver</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_fdw_xact_resolver</structname> view will contain one
+   row per foreign transaction resolver process, showing state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 0070603..c10e21f 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -164,6 +164,7 @@
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index bd93a6a..4a1ebdc 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  tablesample transam
+			  tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000..9ddbb14
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o fdwxact_resolver.o fdwxact_launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100755
index 0000000..109d6a7
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2678 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL distributed transaction manager for foreign servers.
+ *
+ * To commit atomically across all foreign servers, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * When a foreign data wrapper starts a transaction on a foreign server that
+ * is capable of the two-phase commit protocol, the foreign data wrapper
+ * registers the foreign transaction using FdwXactRegisterForeignTransaction()
+ * in order to participate in a group for atomic commit. Participants are
+ * identified by the OIDs of the foreign server and user. When the foreign
+ * transaction begins to modify data, the executor marks it as modified using
+ * FdwXactMarkForeignTransactionModified().
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every foreign server. After committing or rolling back locally, we
+ * notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so
+ * that we're not trying to locally do work that might fail with an ERROR
+ * after having already committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * waiters each time we receive a request. We have two queues: the active
+ * queue and the retry queue. A backend is first inserted into the active
+ * queue, and is then moved to the retry queue by the resolver process if
+ * the resolution fails. The backends in the retry queue are processed at
+ * intervals of foreign_transaction_resolution_retry_interval.
+ *
+ * The two-phase commit protocol is required if the transaction modified two or
+ * more servers, including the local server itself. Otherwise, all foreign
+ * transactions are committed during pre-commit.
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in an in-doubt state (a.k.a. dangling
+ * transactions). Dangling transactions are processed by the resolver process.
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ *	* On PREPARE redo we add the foreign transaction to FdwXactCtl->fdw_xacts.
+ *	  We set fdw_xact->inredo to true for such entries.
+ *	* On Checkpoint redo, we iterate through FdwXactCtl->fdw_xacts entries that
+ *	  have fdw_xact->inredo set to true and are behind the redo_horizon. We
+ *	  save them to disk and then set fdw_xact->ondisk to true.
+ *	* On COMMIT and ABORT we delete the entry from FdwXactCtl->fdw_xacts.
+ *	  If fdw_xact->ondisk is true, we delete the corresponding file from
+ *	  the disk as well.
+ *	* RecoverFdwXacts loads all foreign transaction entries from disk into
+ *	  memory at server startup.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Is atomic commit requested by user? */
+#define IsAtomicCommitEnabled() \
+	(max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+#define IsAtomicCommitRequested() \
+	(IsAtomicCommitEnabled() && \
+	 (distributed_atomic_commit > DISTRIBUTED_ATOMIC_COMMIT_DISABLED))
+
+/* Structure to bundle the foreign transaction participant */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global (shared memory) array.
+	 * NULL if this foreign transaction is registered but not inserted
+	 * yet.
+	 */
+	FdwXact		fdw_xact;
+	char		*fdw_xact_id;
+
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+	bool		modified;					/* true if modified the data on server */
+	bool		twophase_commit_enabled;	/* true if the server can execute
+											 * two-phase commit protocol */
+	void			*fdw_state;				/* fdw-private state */
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function	prepare_foreign_xact;
+	CommitForeignTransaction_function	commit_foreign_xact;
+	RollbackForeignTransaction_function	rollback_foreign_xact;
+	GetPrepareId_function				get_prepareid;
+	IsTwoPhaseCommitEnabled_function	is_twophase_commit_enabled;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit.
+ * This list contains only foreign servers that support the atomic commit
+ * FDW API, regardless of their configuration.
+ */
+static List *FdwXactAtomicCommitParticipants = NIL;
+static bool FdwXactAtomicCommitReady = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDW_XACTS_DIR "pg_fdw_xact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * OID, xid, foreign server OID and user OID, each printed as 8 hexadecimal
+ * digits and separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction, and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure uniqueness.
+ */
+#define FDW_XACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDW_XACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static void FdwXactCommitForeignTransaction(FdwXactParticipant *fdw_part);
+static bool FdwXactResolveForeignTransaction(FdwXactState *state, FdwXact fdwxact,
+											 int elevel);
+static void FdwXactComputeRequiredXmin(void);
+static bool FdwXactAtomicCommitRequired(void);
+static void FdwXactQueueInsert(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid, bool give_warnings);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool give_warnings);
+static void register_fdw_xact(Oid serverid, Oid userid, bool modified);
+static List *get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						   bool need_lock);
+static FdwXact get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+								bool need_lock);
+static FdwXact get_all_fdw_xacts(int *length);
+static FdwXact insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							   Oid umid, char *fdw_xact_id);
+static char *generate_fdw_xact_identifier(TransactionId xid, Oid serverid, Oid userid);
+static void remove_fdw_xact(FdwXact fdw_xact);
+static FdwXactState *create_fdw_xact_state(void);
+
+/* Guc parameters */
+int	max_prepared_foreign_xacts = 0;
+int	max_foreign_xact_resolvers = 0;
+int distributed_atomic_commit = DISTRIBUTED_ATOMIC_COMMIT_DISABLED;
+
+/* Keep track of whether the process exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+/*
+ * Remember the accessed foreign server. Both RegisterFdwXactByRelId and
+ * RegisterFdwXactByServerId are called by the executor during initialization.
+ */
+void
+RegisterFdwXactByRelId(Oid relid, bool modified)
+{
+	Relation		rel;
+	Oid				serverid;
+	Oid				userid;
+
+	rel = relation_open(relid, NoLock);
+	serverid = GetForeignServerIdByRelId(relid);
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	relation_close(rel, NoLock);
+
+	register_fdw_xact(serverid, userid, modified);
+}
+
+void
+RegisterFdwXactByServerId(Oid serverid, bool modified)
+{
+	register_fdw_xact(serverid, GetUserId(), modified);
+}
+
+/*
+ * Register given foreign transaction identified by given arguments as
+ * a participant of the transaction.
+ *
+ * The foreign server identified by given server id must support atomic
+ * commit APIs. Registered foreign transactions are managed by the foreign
+ * transaction manager until the end of the transaction.
+ */
+static void
+register_fdw_xact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant	*fdw_part;
+	ForeignServer 		*foreign_server;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	MemoryContext		old_ctx;
+	FdwRoutine			*routine;
+	ListCell	   		*lc;
+
+	/*
+	 * Participant information is needed at the end of a transaction, where
+	 * the system caches are not available. Save it in TopTransactionContext
+	 * beforehand so that it can live until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * If the foreign server being modified doesn't provide the atomic commit
+	 * API, we don't manage the foreign transaction in the distributed
+	 * transaction manager.
+	 */
+	if (routine->IsTwoPhaseCommitEnabled == NULL)
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		pfree(routine);
+		return;
+	}
+
+	foreach(lc, FdwXactAtomicCommitParticipants)
+	{
+		FdwXactParticipant	*fp = (FdwXactParticipant *) lfirst(lc);
+
+		if (fp->server->serverid == serverid &&
+			fp->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fp->modified |= modified;
+			pfree(routine);
+			return;
+		}
+	}
+
+	foreign_server = GetForeignServer(serverid);
+	fdw = GetForeignDataWrapper(foreign_server->fdwid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	/* Make sure that the FDW has transaction handlers */
+	if (!routine->PrepareForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function provided for preparing foreign transaction for FDW %s",
+						fdw->fdwname)));
+	if (!routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to commit a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+	if (!routine->RollbackForeignTransaction)
+		ereport(ERROR,
+				(errmsg("no function to rollback a foreign transaction provided for FDW %s",
+						fdw->fdwname)));
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdw_xact_id = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdw_xact = NULL;
+	fdw_part->modified = modified;
+	fdw_part->twophase_commit_enabled = true; /* by default, will be changed at pre-commit phase */
+	fdw_part->fdw_state = NULL;
+	fdw_part->prepare_foreign_xact = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact = routine->RollbackForeignTransaction;
+	fdw_part->is_twophase_commit_enabled = routine->IsTwoPhaseCommitEnabled;
+	fdw_part->get_prepareid = routine->GetPrepareId;
+
+	/* Add this foreign transaction to the participants list */
+	FdwXactAtomicCommitParticipants = lappend(FdwXactAtomicCommitParticipants, fdw_part);
+
+	/* Revert back the context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/*
+ * FdwXactShmemSize
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdw_xacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * FdwXactShmemInit
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdw_xacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->freeFdwXacts = NULL;
+		FdwXactCtl->numFdwXacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdw_xacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdw_xacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdw_xacts[cnt].status = FDW_XACT_INITIAL;
+			fdw_xacts[cnt].fxact_free_next = FdwXactCtl->freeFdwXacts;
+			FdwXactCtl->freeFdwXacts = &fdw_xacts[cnt];
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * PreCommit_FdwXacts
+ *
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	bool		need_atomic_commit;
+	ListCell	*lc;
+	ListCell	*next;
+	ListCell	*prev = NULL;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactAtomicCommitParticipants == NIL)
+		return;
+
+	need_atomic_commit = FdwXactAtomicCommitRequired();
+
+	/*
+	 * In the 'require' case, all modified servers must be capable of the
+	 * two-phase commit protocol.
+	 */
+	if (need_atomic_commit &&
+		distributed_atomic_commit == DISTRIBUTED_ATOMIC_COMMIT_REQUIRED &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));
+
+	/*
+	 * Commit transactions on foreign servers.
+	 *
+	 * Committed transactions are removed from FdwXactAtomicCommitParticipants
+	 * so that the later preparation step processes only the servers that need
+	 * to be committed using the two-phase commit protocol.
+	 */
+	for (lc = list_head(FdwXactAtomicCommitParticipants); lc != NULL; lc = next)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		bool can_commit = false;
+
+		next = lnext(lc);
+
+		if (!need_atomic_commit || !fdw_part->modified)
+		{
+			/*
+			 * We can commit directly on unmodified servers, and on any server
+			 * when atomic commit is not required.
+			 */
+			can_commit = true;
+		}
+		else if (distributed_atomic_commit == DISTRIBUTED_ATOMIC_COMMIT_PREFER &&
+				 !fdw_part->twophase_commit_enabled)
+		{
+			/* Also in 'prefer' case, non-2pc-capable servers can be committed */
+			can_commit = true;
+		}
+
+		if (can_commit)
+		{
+			/* Commit the foreign transaction */
+			FdwXactCommitForeignTransaction(fdw_part);
+
+			/* Delete it from the participant list */
+			FdwXactAtomicCommitParticipants =
+				list_delete_cell(FdwXactAtomicCommitParticipants, lc, prev);
+
+			continue;
+		}
+
+		prev = lc;
+	}
+
+	/*
+	 * If only one participant of all participants is modified, we can commit it.
+	 * This avoids using two-phase commit for a single server in the 'prefer' case,
+	 * where the transaction has one 2PC-capable modified server and some
+	 * non-2PC-capable modified servers.
+	 */
+	if (list_length(FdwXactAtomicCommitParticipants) == 1 &&
+		(MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) == 0)
+	{
+		Assert(distributed_atomic_commit == DISTRIBUTED_ATOMIC_COMMIT_PREFER);
+		FdwXactCommitForeignTransaction(linitial(FdwXactAtomicCommitParticipants));
+		list_free(FdwXactAtomicCommitParticipants);
+		return;
+	}
+
+	FdwXactPrepareForeignTransactions();
+	/* Keep FdwXactAtomicCommitParticipants until the end of transaction */
+}
+
+/*
+ * FdwXactPrepareForeignTransactions
+ *
+ * Prepare all foreign transaction participants.  This function extends the chain
+ * of prepared participants each time we prepare a foreign transaction. The chain
+ * is used to access all participants of the distributed transaction quickly.
+ * If any one of them fails to prepare, we switch over to aborting.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	FdwXactState *state;
+	ListCell   *lcell;
+	FdwXact		prev_fdwxact = NULL;
+	TransactionId txid;
+
+	if (FdwXactAtomicCommitParticipants == NIL)
+		return;
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	state = create_fdw_xact_state();
+
+	/* Loop over the foreign connections */
+	txid = GetTopTransactionId();
+	foreach(lcell, FdwXactAtomicCommitParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+		FdwXact		fdwxact;
+
+		/* Generate a unique identifier */
+		if (fdw_part->get_prepareid)
+		{
+			char *id;
+			int fdwxact_id_len = 0;
+
+			id = fdw_part->get_prepareid(txid, fdw_part->server->serverid,
+										 fdw_part->usermapping->userid,
+										 &fdwxact_id_len);
+
+			if (!id)
+				ereport(ERROR,
+						(errcode(ERRCODE_UNDEFINED_OBJECT),
+						 (errmsg("foreign transaction identifier is not provided"))));
+
+			/* Check length of foreign transaction identifier */
+			id[fdwxact_id_len] = '\0';
+			if (fdwxact_id_len > NAMEDATALEN)
+				ereport(ERROR,
+						(errcode(ERRCODE_NAME_TOO_LONG),
+						 errmsg("foreign transaction identifier \"%s\" is too long",
+								id),
+						 errdetail("Foreign transaction identifier must be less than %d characters.",
+								   NAMEDATALEN)));
+
+			fdw_part->fdw_xact_id = pstrdup(id);
+		}
+		else
+			fdw_part->fdw_xact_id = generate_fdw_xact_identifier(txid,
+																 fdw_part->server->serverid,
+																 fdw_part->usermapping->userid);
+
+		/*
+		 * Insert the foreign transaction entry. Registration persists this
+		 * information to disk and to WAL (thereby relaying it to standbys).
+		 * Thus, in case we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have a prepared transaction
+		 * on the foreign server and try to resolve it when connectivity is
+		 * restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before persisting
+		 * the information to disk and crashed in between these two steps,
+		 * we would forget that we prepared the transaction on the foreign server
+		 * and would not be able to resolve it after the crash. Hence persist
+		 * first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(txid, fdw_part);
+
+		state->serverid = fdw_part->server->serverid;
+		state->userid = fdw_part->usermapping->userid;
+		state->umid = fdw_part->usermapping->umid;
+		state->fdwxact_id = pstrdup(fdwxact->fdw_xact_id);
+
+		/*
+		 * Between the FdwXactInsertFdwXactEntry call and the point where this
+		 * backend receives an acknowledgment from the foreign server, the backend
+		 * may abort the local transaction (say, because of a signal). During abort
+		 * processing, we might try to resolve a never-prepared transaction and get
+		 * an error. This is fine as long as the FDW provides us with unique
+		 * prepared transaction identifiers.
+		 */
+		if (!fdw_part->prepare_foreign_xact(state))
+		{
+			/* Failed to prepare, switch over to abort */
+			ereport(ERROR,
+					(errmsg("could not prepare transaction on foreign server %s",
+							fdw_part->server->servername)));
+		}
+
+		/* Keep fdw_state until end of transaction */
+		fdw_part->fdw_state = state->fdw_state;
+
+		/* Preparation succeeded, update its status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdw_part->fdw_xact->status = FDW_XACT_PREPARED;
+		fdw_part->fdw_xact = fdwxact;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Create the prepared participants chain, i.e. the linked list of
+		 * FdwXact entries involved in this transaction, appending at the tail.
+		 */
+		if (prev_fdwxact)
+		{
+			Assert(fdwxact->fxact_next == NULL);
+			prev_fdwxact->fxact_next = fdwxact;
+		}
+		prev_fdwxact = fdwxact;
+	}
+}
+
+/*
+ * Commit the given foreign transaction.
+ */
+static void
+FdwXactCommitForeignTransaction(FdwXactParticipant *fdw_part)
+{
+	FdwXactState *state;
+
+	state = create_fdw_xact_state();
+	state->serverid = fdw_part->server->serverid;
+	state->userid = fdw_part->usermapping->userid;
+	state->umid = fdw_part->usermapping->umid;
+	fdw_part->fdw_state = (void *) state;
+
+	if (!fdw_part->commit_foreign_xact(state))
+		ereport(ERROR,
+				(errmsg("could not commit foreign transaction on server %s",
+						fdw_part->server->servername)));
+}
+
+/*
+ * FdwXactInsertFdwXactEntry
+ *
+ * This function is used to create a new foreign transaction entry before an FDW
+ * prepares and commits or rolls back. The function adds the entry to WAL, and it
+ * will be persisted to disk under the pg_fdw_xact directory at the next checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact				fxact;
+	FdwXactOnDiskData	*fxact_file_data;
+	MemoryContext		old_context;
+	int					data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact = insert_fdw_xact(MyDatabaseId, xid, fdw_part->server->serverid,
+							fdw_part->usermapping->userid,
+							fdw_part->usermapping->umid, fdw_part->fdw_xact_id);
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->held_by = MyBackendId;
+	fdw_part->fdw_xact = fxact;
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdw_xact_id);
+	data_len = data_len + strlen(fdw_part->fdw_xact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fxact_file_data->dbid = MyDatabaseId;
+	fxact_file_data->local_xid = xid;
+	fxact_file_data->serverid = fdw_part->server->serverid;
+	fxact_file_data->userid = fdw_part->usermapping->userid;
+	fxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fxact_file_data->fdw_xact_id, fdw_part->fdw_xact_id,
+		   strlen(fdw_part->fdw_xact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fxact_file_data, data_len);
+	fxact->insert_end_lsn = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_INSERT);
+	XLogFlush(fxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fxact->valid = true;
+	LWLockRelease(FdwXactLock);
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fxact_file_data);
+	return fxact;
+}
+
+/*
+ * insert_fdw_xact
+ *
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				Oid umid, char *fdw_xact_id)
+{
+	int i;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		fxact = FdwXactCtl->fdw_xacts[i];
+		if (fxact->dbid == dbid &&
+			fxact->local_xid == xid &&
+			fxact->serverid == serverid &&
+			fxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
+								   xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->freeFdwXacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fxact = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fxact->fxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->numFdwXacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts++] = fxact;
+
+	fxact->held_by = InvalidBackendId;
+	fxact->dbid = dbid;
+	fxact->local_xid = xid;
+	fxact->serverid = serverid;
+	fxact->userid = userid;
+	fxact->umid = umid;
+	fxact->insert_start_lsn = InvalidXLogRecPtr;
+	fxact->insert_end_lsn = InvalidXLogRecPtr;
+	fxact->valid = false;
+	fxact->ondisk = false;
+	fxact->inredo = false;
+	memcpy(fxact->fdw_xact_id, fdw_xact_id, strlen(fdw_xact_id) + 1);
+
+	return fxact;
+}
+
+/*
+ * remove_fdw_xact
+ *
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdw_xact(FdwXact fdw_xact)
+{
+	int			cnt;
+
+	Assert(fdw_xact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		if (FdwXactCtl->fdw_xacts[cnt] == fdw_xact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (cnt >= FdwXactCtl->numFdwXacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdw_xact->local_xid, fdw_xact->serverid, fdw_xact->userid)));
+
+	/* Remove the entry from active array */
+	FdwXactCtl->numFdwXacts--;
+	FdwXactCtl->fdw_xacts[cnt] = FdwXactCtl->fdw_xacts[FdwXactCtl->numFdwXacts];
+
+	/* Put it back into free list */
+	fdw_xact->fxact_free_next = FdwXactCtl->freeFdwXacts;
+	FdwXactCtl->freeFdwXacts = fdw_xact;
+
+	/* Reset information */
+	fdw_xact->status = FDW_XACT_INITIAL;
+	fdw_xact->held_by = InvalidBackendId;
+	fdw_xact->fxact_next = NULL;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdw_xact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdw_xact->serverid;
+		record.dbid = fdw_xact->dbid;
+		record.xid = fdw_xact->local_xid;
+		record.userid = fdw_xact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdw_xact_remove));
+		recptr = XLogInsert(RM_FDW_XACT_ID, XLOG_FDW_XACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the commit critical section: a
+		 * checkpoint starting after this will certainly see the gxact as a
+		 * candidate for fsyncing.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true and set FdwXactAtomicCommitReady to true if we require atomic commit.
+ * It is required if the transaction modified data on two or more servers, including
+ * the local node itself. This function also checks, for each server, whether
+ * two-phase commit is enabled.
+ */
+static bool
+FdwXactAtomicCommitRequired(void)
+{
+	ListCell*	lc;
+	int			nserverswritten = 0;
+
+	if (!IsAtomicCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactAtomicCommitParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		/* Check if the foreign server is capable of two-phase commit protocol */
+		if (fdw_part->is_twophase_commit_enabled(fdw_part->server->serverid))
+			fdw_part->twophase_commit_enabled = true;
+		else if (fdw_part->modified)
+			MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+
+		if (fdw_part->modified)
+			nserverswritten++;
+	}
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	/* Atomic commit is required if we modified data on two or more participants */
+	if (nserverswritten <= 1)
+		return false;
+
+	FdwXactAtomicCommitReady = true;
+	return true;
+}
+
+bool
+FdwXactIsAtomicCommitReady(void)
+{
+	return FdwXactAtomicCommitReady;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int	i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * ForgetAllFdwXactParticipants
+ *
+ * Reset all the foreign transaction entries that this backend registered.
+ * If a foreign transaction has a corresponding FdwXact entry, resetting its
+ * held_by field leaves that entry in the unresolved state. If we leave any
+ * entries, we update the oldest xmin of unresolved transactions so that the
+ * transaction status of dangling transactions is not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell *cell;
+	int		n_lefts = 0;
+
+	if (FdwXactAtomicCommitParticipants == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	foreach(cell, FdwXactAtomicCommitParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(cell);
+
+		/* Skip if didn't register FdwXact entry yet */
+		if (fdw_part->fdw_xact == NULL)
+			continue;
+
+		/*
+		 * There is a race condition: the resolver process could remove an
+		 * FdwXact entry listed in FdwXactAtomicCommitParticipants, and
+		 * another backend could reuse it, before we get here. So we need
+		 * to check that each entry is still associated with this backend
+		 * before resetting it.
+		 */
+		if (fdw_part->fdw_xact->held_by == MyBackendId)
+		{
+			fdw_part->fdw_xact->held_by = InvalidBackendId;
+			n_lefts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Update the oldest local transaction of unresolved distributed
+	 * transactions if we left any FdwXact entries.
+	 */
+	if (n_lefts > 0)
+		FdwXactComputeRequiredXmin();
+
+	FdwXactAtomicCommitParticipants = NIL;
+}
+
+/*
+ * AtProcExit_FdwXact
+ *
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Wait for foreign transaction to be resolved.
+ *
+ * Initially backends start in state FDW_XACT_NOT_WAITING and then change
+ * that state to FDW_XACT_WAITING before adding ourselves to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDW_XACT_WAIT_COMPLETE once foreign transactions are resolved.
+ * This backend then resets its state to FDW_XACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue and changes the state to FDW_XACT_WAITING_RETRY.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char		*new_status = NULL;
+	const char	*old_status;
+	ListCell	*lc;
+	List		*fdwxact_participants = NIL;
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsAtomicCommitRequested())
+		return;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDW_XACT_NOT_WAITING);
+
+	if (FdwXactAtomicCommitParticipants != NIL)
+	{
+		/*
+		 * If we're waiting for foreign transactions to be resolved that
+		 * we've prepared just before, use the participants list.
+		 */
+		Assert(MyPgXact->xid == wait_xid);
+		fdwxact_participants = FdwXactAtomicCommitParticipants;
+	}
+	else
+	{
+		/*
+		 * Get the participants list from the global array. This is required
+		 * (1) when we're waiting for the resolution of foreign transactions
+		 * that are part of a local transaction prepared while the server was
+		 * running, or (2) when we resolve a PREPARE'd distributed transaction
+		 * after a restart.
+		 */
+		fdwxact_participants = get_fdw_xacts(MyDatabaseId, wait_xid,
+											 InvalidOid, InvalidOid, true);
+	}
+
+	/* Exit if we found no foreign transaction to resolve */
+	if (fdwxact_participants == NIL)
+		return;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
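+	foreach(lc, fdwxact_participants)
+	{
+		FdwXact fdw_xact;
+
+		/* The local list holds FdwXactParticipant, the global one FdwXact */
+		if (fdwxact_participants == FdwXactAtomicCommitParticipants)
+			fdw_xact = ((FdwXactParticipant *) lfirst(lc))->fdw_xact;
+		else
+			fdw_xact = (FdwXact) lfirst(lc);
+
+		/* Don't overwrite status if fate has been determined */
+		if (fdw_xact != NULL && fdw_xact->status == FDW_XACT_PREPARED)
+			fdw_xact->status = (is_commit ?
+								FDW_XACT_COMMITTING_PREPARED :
+								FDW_XACT_ABORTING_PREPARED);
+	}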
+
+	/* Set backend status and enqueue ourselves to the active queue */
+	MyProc->fdwXactState = FDW_XACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	FdwXactQueueInsert();
+	LWLockRelease(FdwXactLock);
+
+	/* Launch a resolver process if not yet, or wake it up */
+	fdwxact_maybe_launch_resolver(false);
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 34 + 1);	/* 24-char suffix + 10 digits + NUL */
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0';	/* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to FDW_XACT_WAIT_COMPLETE,
+		 * it will never update it again, so we can't be seeing a stale value
+		 * in that case.
+		 */
+		if (MyProc->fdwXactState == FDW_XACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The latter
+		 * would lead the client to believe that the distributed transaction
+		 * aborted, which is not true: it's already committed locally. The
+		 * former is no good either: the client has requested committing a
+		 * distributed transaction, and is entitled to assume that an
+		 * acknowledged commit is also committed on all foreign servers, which
+		 * might not be true. So in this case we issue a WARNING (which some
+		 * clients may be able to interpret) and shut off further output. We
+		 * do NOT reset ProcDiePending, so that the process will die after the
+		 * commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign xact resolver can pick them up and try to resolve them
+		 * later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDW_XACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+
+	/*
+	 * Forget the list of locked entries. This also means that entries
+	 * that could not be resolved remain as dangling transactions.
+	 */
+	ForgetAllFdwXactParticipants();
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Acquire FdwXactLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Insert MyProc into the tail of FdwXactActiveQueue.
+ */
+static void
+FdwXactQueueInsert(void)
+{
+	SHMQueueInsertBefore(&(FdwXactRslvCtl->FdwXactActiveQueue),
+						 &(MyProc->fdwXactLinks));
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Resolve one distributed transaction. The target distributed transaction
+ * is fetched from either the active queue or the retry queue, and its
+ * participants are fetched from the global array.
+ *
+ * Release the waiter and return true if we resolved all of the foreign
+ * transaction participants. Return false if there is no waiter for the given
+ * database. On failure, we move the waiter to the retry queue and re-throw.
+ */
+bool
+FdwXactResolveDistributedTransaction(Oid dbid, bool is_active)
+{
+	FdwXactState	*state;
+	ListCell		*lc;
+	ListCell		*next;
+	PGPROC			*waiter = NULL;
+	List			*participants;
+	SHM_QUEUE		*target_queue;
+
+	if (is_active)
+		target_queue = &(FdwXactRslvCtl->FdwXactActiveQueue);
+	else
+		target_queue = &(FdwXactRslvCtl->FdwXactRetryQueue);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/* Fetch the first waiter in the queue that belongs to the given database */
+	while ((waiter = (PGPROC *) SHMQueueNext(target_queue,
+							waiter ? &(waiter->fdwXactLinks) : target_queue,
+							offsetof(PGPROC, fdwXactLinks))) != NULL)
+	{
+		if (waiter->databaseId == dbid)
+			break;
+	}
+
+	/* If no waiter, there is no job */
+	if (!waiter)
+	{
+		LWLockRelease(FdwXactLock);
+		return false;
+	}
+
+	Assert(TransactionIdIsValid(waiter->fdwXactWaitXid));
+
+	state = create_fdw_xact_state();
+	participants = get_fdw_xacts(dbid, waiter->fdwXactWaitXid, InvalidOid,
+								 InvalidOid, false);
+	LWLockRelease(FdwXactLock);
+
+	/* Resolve all foreign transactions one by one */
+	for (lc = list_head(participants); lc != NULL; lc = next)
+	{
+		FdwXact fdwxact = (FdwXact) lfirst(lc);
+
+		CHECK_FOR_INTERRUPTS();
+
+		next = lnext(lc);
+
+		state->serverid = fdwxact->serverid;
+		state->userid = fdwxact->userid;
+		state->umid = fdwxact->umid;
+		state->fdwxact_id = pstrdup(fdwxact->fdw_xact_id);
+
+		PG_TRY();
+		{
+			FdwXactResolveForeignTransaction(state, fdwxact, ERROR);
+		}
+		PG_CATCH();
+		{
+			/* Re-insert the waiter to the retry queue */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			if (waiter->fdwXactState == FDW_XACT_WAITING)
+			{
+				SHMQueueDelete(&(waiter->fdwXactLinks));
+				pg_write_barrier();
+				SHMQueueInsertBefore(&(FdwXactRslvCtl->FdwXactRetryQueue),
+									 &(waiter->fdwXactLinks));
+				waiter->fdwXactState = FDW_XACT_WAITING_RETRY;
+			}
+			LWLockRelease(FdwXactLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		elog(DEBUG2, "resolved a foreign transaction xid %u, serverid %u, userid %u",
+			 fdwxact->local_xid, fdwxact->serverid, fdwxact->userid);
+	}
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/*
+	 * Remove the waiter from the shmem queue, if not detached yet. The
+	 * waiter could already be detached if the user cancelled the wait
+	 * before resolution.
+	 */
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId	wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDW_XACT_WAIT_COMPLETE;
+
+		/* Wake up the waiter only when we have set state and removed from queue */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc xid %u", wait_xid);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	return true;
+}
+
+/*
+ * Resolve all dangling foreign transactions on the given database. Get
+ * all dangling foreign transactions from shmem global array and resolve
+ * them one by one.
+ */
+void
+FdwXactResolveAllDanglingTransactions(Oid dbid)
+{
+	List		*dangling_fdwxacts = NIL;
+	ListCell	*cell;
+	int			n_resolved = 0;
+	int			i;
+
+	Assert(OidIsValid(dbid));
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/*
+	 * Walk over the global array to build the list of dangling transactions
+	 * whose corresponding local transaction is on the given database.
+	 */
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fxact = FdwXactCtl->fdw_xacts[i];
+
+		/*
+		 * Append the fdwxact entry on the given database to the list if
+		 * nobody is handling it and the corresponding local transaction
+		 * is not a prepared transaction.
+		 */
+		if (fxact->dbid == dbid &&
+			fxact->held_by == InvalidBackendId &&
+			!TwoPhaseExists(fxact->local_xid))
+			dangling_fdwxacts = lappend(dangling_fdwxacts, fxact);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/* Return if there is no foreign transaction we need to resolve */
+	if (dangling_fdwxacts == NIL)
+		return;
+
+	foreach(cell, dangling_fdwxacts)
+	{
+		FdwXact fdwxact = (FdwXact) lfirst(cell);
+		FdwXactState *state;
+
+		state = create_fdw_xact_state();
+		state->serverid = fdwxact->serverid;
+		state->userid = fdwxact->userid;
+		state->umid = fdwxact->umid;
+		state->fdwxact_id = pstrdup(fdwxact->fdw_xact_id);
+
+		FdwXactResolveForeignTransaction(state, fdwxact, ERROR);
+
+		n_resolved++;
+	}
+
+	list_free(dangling_fdwxacts);
+
+	elog(DEBUG2, "resolved %d dangling foreign xacts", n_resolved);
+}
+
+/*
+ * AtEOXact_FdwXacts
+ *
+ * In the commit case, we have already prepared transactions on the foreign
+ * servers during pre-commit, and those prepared transactions will be
+ * resolved by the resolver process. So we don't do anything about the
+ * foreign transactions here.
+ *
+ * In the abort case, either the user requested a rollback or we switched to
+ * rollback due to an error during commit. To close the current foreign
+ * transactions anyway, we call the rollback API for every foreign
+ * transaction. If we raised an error while preparing and ended up here, some
+ * entries of FdwXactAtomicCommitParticipants may already have registered
+ * their FdwXact entries. If so, we leave them as dangling transactions and
+ * ask the resolver process to process them.
+ */
+void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		int left_fdwxacts = 0;
+		FdwXactState *state = create_fdw_xact_state();
+
+		foreach (lcell, FdwXactAtomicCommitParticipants)
+		{
+			FdwXactParticipant	*fdw_part = lfirst(lcell);
+
+			/*
+			 * Count FdwXact entries that we registered to shared memory array
+			 * in this transaction.
+			 */
+			if (fdw_part->fdw_xact)
+			{
+				/*
+				 * The status of the foreign transaction must be either
+				 * preparing or prepared. In either case, since we have
+				 * registered an FdwXact entry, we leave it to the resolver
+				 * process. In the preparing state the foreign transaction
+				 * might not be closed yet, so we fall through and call the
+				 * rollback API. In the prepared state the foreign transaction
+				 * has closed, so we don't need to do anything.
+				 */
+				Assert(fdw_part->fdw_xact->status == FDW_XACT_PREPARING ||
+					   fdw_part->fdw_xact->status == FDW_XACT_PREPARED);
+
+				left_fdwxacts++;
+				if (fdw_part->fdw_xact->status == FDW_XACT_PREPARED)
+					continue;
+			}
+
+			state->serverid = fdw_part->server->serverid;
+			state->userid = fdw_part->usermapping->userid;
+			state->umid = fdw_part->usermapping->umid;
+			state->fdw_state = fdw_part->fdw_state;
+
+			/*
+			 * Roll back the current foreign transaction. Since we're already
+			 * rolling back, it's too late to raise an error here, so we
+			 * report a failure as a warning instead.
+			 */
+			if (!fdw_part->rollback_foreign_xact(state))
+				ereport(WARNING,
+						(errmsg("could not abort transaction on server \"%s\"",
+								fdw_part->server->servername)));
+		}
+
+		/* If we left some FdwXact entries, ask the resolver process */
+		if (left_fdwxacts > 0)
+		{
+			ereport(WARNING,
+					(errmsg("might have left %d foreign transactions in in-doubt status",
+							left_fdwxacts)));
+			fdwxact_maybe_launch_resolver(true);
+		}
+	}
+
+	ForgetAllFdwXactParticipants();
+	FdwXactAtomicCommitReady = false;
+}
+
+/*
+ * AtPrepare_FdwXacts
+ *
+ * If there are foreign servers involved in the transaction, this function
+ * prepares transactions on those servers.
+ *
+ * Note that the transaction can abort after we have prepared only some of
+ * the participants. Because we may still switch to abort, we cannot forget
+ * FdwXactAtomicCommitParticipants here. They are processed by the resolver
+ * process during abort, or at AtEOXact_FdwXacts.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	if (FdwXactAtomicCommitParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsAtomicCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when distributed_atomic_commit is \"disabled\"")));
+
+
+	/*
+	 * We cannot prepare if any foreign server of participants isn't capable
+	 * of two-phase commit.
+	 */
+	if (FdwXactAtomicCommitRequired() &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_T_R_INTEGRITY_CONSTRAINT_VIOLATION),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in the transaction cannot prepare it")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * FdwXactResolveForeignTransaction
+ *
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * handler routine. The foreign transaction can be a dangling transaction
+ * that no backend is waiting for. If the fate of the foreign transaction is
+ * not determined yet, it is determined according to the status of the
+ * corresponding local transaction.
+ *
+ * If the resolution is successful, remove the foreign transaction entry from
+ * the shared memory and also remove the corresponding on-disk file.
+ */
+static bool
+FdwXactResolveForeignTransaction(FdwXactState *state, FdwXact fdwxact,
+								 int elevel)
+{
+	ForeignServer		*server;
+	ForeignDataWrapper	*fdw;
+	FdwRoutine			*fdw_routine;
+	bool		is_commit;
+	bool		ret;
+
+	Assert(fdwxact);
+
+	/*
+	 * Determine whether we commit or abort this foreign transaction.
+	 */
+	if (fdwxact->status == FDW_XACT_COMMITTING_PREPARED)
+		is_commit = true;
+	else if (fdwxact->status == FDW_XACT_ABORTING_PREPARED)
+		is_commit = false;
+
+	/*
+	 * If the local transaction is already committed, commit prepared
+	 * foreign transaction.
+	 */
+	else if (TransactionIdDidCommit(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_COMMITTING_PREPARED;
+		is_commit = true;
+	}
+
+	/*
+	 * If the local transaction is already aborted, abort prepared
+	 * foreign transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+	{
+		fdwxact->status = FDW_XACT_ABORTING_PREPARED;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is not in progress, but the foreign
+	 * transaction is not prepared on the foreign server. This
+	 * can happen when the transaction failed after registering this
+	 * entry but before actually preparing on the foreign server.
+	 * So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+		is_commit = false;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction
+	 * state is neither committing nor aborting. This should not
+	 * happen, because we cannot decide whether to commit or abort a
+	 * foreign transaction associated with an in-progress local
+	 * transaction.
+	 */
+	else
+		ereport(ERROR,
+				(errmsg("cannot resolve the foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid)));
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Resolve the foreign transaction */
+	Assert(fdw_routine->ResolveForeignTransaction);
+
+	ret = fdw_routine->ResolveForeignTransaction(state, is_commit);
+
+	if (!ret)
+	{
+		ereport(elevel,
+				(errmsg("could not %s a prepared foreign transaction on server \"%s\"",
+						is_commit ? "commit" : "rollback", server->servername),
+				 errdetail("local transaction id is %u, connected by user id %u",
+						   fdwxact->local_xid, fdwxact->userid)));
+		return false;	/* reached only if elevel < ERROR; keep the entry */
+	}
+
+	/* Resolution was a success, remove the entry */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdw_xact(fdwxact);
+	LWLockRelease(FdwXactLock);
+
+	return ret;
+}
+
+static FdwXactState *
+create_fdw_xact_state(void)
+{
+	FdwXactState *state;
+
+	state = palloc(sizeof(FdwXactState));
+	state->serverid = InvalidOid;
+	state->userid = InvalidOid;
+	state->umid = InvalidOid;
+	state->fdwxact_id = NULL;
+	state->fdw_state = NULL;
+
+	return state;
+}
+
+/*
+ * Return the FdwXact entry that matches the given arguments, or NULL if
+ * there is none. Since this function searches for an FdwXact entry by its
+ * unique key, all arguments must be valid.
+ */
+static FdwXact
+get_one_fdw_xact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				 bool need_lock)
+{
+	List	*fdw_xact_list;
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, need_lock);
+
+	/* Could not find entry */
+	if (fdw_xact_list == NIL)
+		return NULL;
+
+	/* Must be one entry since we search it by the unique key */
+	Assert(list_length(fdw_xact_list) == 1);
+
+	return (FdwXact) linitial(fdw_xact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+fdw_xact_exists(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	*fdw_xact_list;
+
+	fdw_xact_list = get_fdw_xacts(dbid, xid, serverid, userid, true);
+
+	return fdw_xact_list != NIL;
+}
+
+/*
+ * Returns an array of all foreign prepared transactions for the user-level
+ * function pg_prepared_fdw_xacts.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled in yet. The caller should filter them out if unwanted.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdw_xacts(int *length)
+{
+	List		*all_fdw_xacts;
+	ListCell	*lc;
+	FdwXact		fdw_xacts;
+	int			num_fdw_xacts = 0;
+
+	Assert(length != NULL);
+
+	/* Get all entries */
+	all_fdw_xacts = get_fdw_xacts(InvalidOid, InvalidTransactionId,
+								  InvalidOid, InvalidOid, true);
+
+	if (all_fdw_xacts == NIL)
+	{
+		*length = 0;
+		return NULL;
+	}
+
+	fdw_xacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdw_xacts));
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdw_xacts)
+	{
+		FdwXact fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdw_xacts + num_fdw_xacts, fx,
+			   sizeof(FdwXactData));
+		num_fdw_xacts++;
+	}
+
+	*length = num_fdw_xacts;
+	list_free(all_fdw_xacts);
+
+	return fdw_xacts;
+}
+
+/*
+ * Return a list of FdwXact entries that match the given arguments, or NIL
+ * if none match. Invalid arguments act as wildcards.
+ */
+static List*
+get_fdw_xacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			  bool need_lock)
+{
+	int i;
+	List	*fdw_xact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact	fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool	matches = true;
+
+		/* xid */
+		if (xid != InvalidTransactionId && xid != fdw_xact->local_xid)
+			matches = false;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdw_xact->dbid != dbid)
+			matches = false;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdw_xact->serverid)
+			matches = false;
+
+		/* userid */
+		if (OidIsValid(userid) && fdw_xact->userid != userid)
+			matches = false;
+
+		/* Append it if matched */
+		if (matches)
+			fdw_xact_list = lappend(fdw_xact_list, fdw_xact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdw_xact_list;
+}
+
+/*
+ * fdw_xact_redo
+ * Apply the redo log for a foreign transaction.
+ */
+void
+fdw_xact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record
+		 * in FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDW_XACT_REMOVE)
+	{
+		xl_fdw_xact_remove *xlrec = (xl_fdw_xact_remove *) rec;
+
+		/* Delete the FdwXact entry, and its file if it exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier of the form
+ * "fx_<random number>_<xid>_<serverid>_<userid>", whose length is always
+ * less than NAMEDATALEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transaction identifier unique, we should ideally use
+ * something like a UUID, which gives unique ids with high probability, but
+ * that may be expensive here, and the UUID extension that provides the
+ * function to generate UUIDs is not part of the core code.
+ */
+static char *
+generate_fdw_xact_identifier(TransactionId xid, Oid serverid, Oid userid)
+{
+	char*	fdw_xact_id;
+
+	fdw_xact_id = (char *)palloc0(FDW_XACT_ID_MAX_LEN * sizeof(char));
+
+	snprintf(fdw_xact_id, FDW_XACT_ID_MAX_LEN, "%s_%ld_%u_%u_%u",
+			 "fx", Abs(random()), xid, serverid, userid);
+	/* snprintf() always NUL-terminates, so no explicit terminator is needed */
+
+	return fdw_xact_id;
+}
+
+/*
+ * CheckPointFdwXacts
+ *
+ * We must fsync the foreign transaction state files that are valid or were
+ * generated during redo and have an insert LSN <= the checkpoint's redo
+ * horizon. The foreign transaction entries, and hence the corresponding
+ * files, are expected to be very short-lived. By executing this function at
+ * the end, we might have fewer files to fsync, thus reducing some I/O. This
+ * is similar to CheckPointTwoPhase().
+ *
+ * For simplicity, the function currently performs the file I/O while holding
+ * FdwXactLock; see the comment in the function body. Moving the I/O out of
+ * the lock would create a race condition: after releasing the lock, and
+ * before syncing a file, the corresponding foreign transaction entry, and
+ * hence the file, might get removed, so errors would have to be rechecked
+ * against that case.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdw_xacts = 0;
+
+	/* Quick get-away, before taking lock */
+	if (max_prepared_foreign_xacts <= 0)
+		return;
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	/* Another quick exit, before we allocate memory */
+	if (FdwXactCtl->numFdwXacts <= 0)
+	{
+		LWLockRelease(FdwXactLock);
+		return;
+	}
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, if somebody
+	 * spots that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be an FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint that is yet marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	for (cnt = 0; cnt < FdwXactCtl->numFdwXacts; cnt++)
+	{
+		FdwXact		fxact = FdwXactCtl->fdw_xacts[cnt];
+
+		if ((fxact->valid || fxact->inredo) &&
+			!fxact->ondisk &&
+			fxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fxact->dbid, fxact->local_xid,
+								fxact->serverid, fxact->userid,
+								buf, len);
+			fxact->ondisk = true;
+			fxact->insert_start_lsn = InvalidXLogRecPtr;
+			fxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdw_xacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDW_XACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdw_xacts > 0)
+		ereport(LOG,
+			  (errmsg_plural("%u foreign transaction state file was written "
+							 "for long-running prepared transactions",
+							 "%u foreign transaction state files were written "
+							 "for long-running prepared transactions",
+							 serialized_fdw_xacts,
+							 serialized_fdw_xacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, &read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+		   errdetail("Failed while allocating an XLog reading processor.")));
+
+	record = XLogReadRecord(xlogreader, lsn, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read foreign transaction state from xlog at %X/%X",
+			   (uint32) (lsn >> 32),
+			   (uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDW_XACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDW_XACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not recreate foreign transaction state file \"%s\": %m",
+			   path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not fsync foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * ProcessFdwXactBuffer
+ *
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId	origNextXid = ShmemVariableCache->nextXid;
+	char	*buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(insert_start_lsn != InvalidXLogRecPtr);
+
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid, true);
+		if (buf == NULL)
+		{
+			ereport(WARNING,
+					(errmsg("removing corrupt fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+			return NULL;
+		}
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid magic number and CRC), return the contents in
+ * a structure allocated in-memory. Otherwise return NULL. The structure can
+ * be later freed by the caller.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				bool give_warnings)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			   errmsg("could not open FDW transaction state file \"%s\": %m",
+					  path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+	{
+		CloseTransientFile(fd);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not stat FDW transaction state file \"%s\": %m",
+							path)));
+		return NULL;
+	}
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdw_xact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+	{
+		CloseTransientFile(fd);
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("FDW transaction state file \"%s\" has an invalid size",
+						path)));
+		return NULL;
+	}
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+	{
+		CloseTransientFile(fd);
+		return NULL;
+	}
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDW_XACT_FILE_READ);
+	if (read(fd, buf, stat.st_size) != stat.st_size)
+	{
+		pgstat_report_wait_end();
+		CloseTransientFile(fd);
+		if (give_warnings)
+			ereport(WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not read FDW transaction state file \"%s\": %m",
+					  path)));
+		return NULL;
+	}
+
+	pgstat_report_wait_end();
+	CloseTransientFile(fd);
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+	{
+		pfree(buf);
+		return NULL;
+	}
+
+	/* Check that the contents match the expected data */
+	fxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fxact_file_data->dbid  != dbid ||
+		fxact_file_data->serverid != serverid ||
+		fxact_file_data->userid != userid ||
+		fxact_file_data->local_xid != xid)
+	{
+		ereport(WARNING,
+			(errmsg("invalid foreign transaction state file \"%s\"",
+					path)));
+		/* fd was already closed above, so there is nothing to close here */
+		pfree(buf);
+		return NULL;
+	}
+
+	return buf;
+}
+
+/*
+ * PrescanFdwXacts
+ *
+ * Scan the foreign transactions directory for the oldest active transaction.
+ * This is run during database startup, after we completed reading WAL.
+ * ShmemVariableCache->nextXid has been set to one more than the highest XID
+ * for which evidence exists in WAL.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	TransactionId nextXid = ShmemVariableCache->nextXid;
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+		 strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			TransactionId local_xid;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/*
+			 * Remove a foreign prepared transaction file corresponding to an
+			 * XID, which is too new.
+			 */
+			if (TransactionIdFollowsOrEquals(local_xid, nextXid))
+			{
+				ereport(WARNING,
+						(errmsg("removing future foreign prepared transaction file \"%s\"",
+								clde->d_name)));
+				RemoveFdwXactFile(dbid, local_xid, serverid, userid, true);
+				continue;
+			}
+
+			if (TransactionIdPrecedesOrEquals(local_xid, oldestActiveXid))
+				oldestActiveXid = local_xid;
+		}
+	}
+
+	FreeDir(cldir);
+	return oldestActiveXid;
+}
+
+/*
+ * restoreFdwXactData
+ *
+ * Scan pg_fdw_xact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDW_XACTS_DIR);
+	while ((clde = ReadDir(cldir, FDW_XACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDW_XACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDW_XACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char		*buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid, bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * FdwXactRedoAdd
+ *
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact fxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The
+	 * status of the transaction is set as preparing, since we do not
+	 * know the exact status right now. Resolver will set it later
+	 * based on the status of local transaction which prepared this
+	 * foreign transaction.
+	 */
+	fxact = insert_fdw_xact(fxact_data->dbid, fxact_data->local_xid,
+							fxact_data->serverid, fxact_data->userid,
+							fxact_data->umid, fxact_data->fdw_xact_id);
+
+	/*
+	 * Set status as preparing, since we do not know the xact status
+	 * right now. Resolver will set it later based on the status of
+	 * local transaction that prepared this fdwxact entry.
+	 */
+	fxact->status = FDW_XACT_PREPARING;
+	fxact->insert_start_lsn = start_lsn;
+	fxact->insert_end_lsn = end_lsn;
+	fxact->inredo = true;	/* added in redo */
+	fxact->valid = false;
+	fxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * FdwXactRedoRemove
+ *
+ * Remove the corresponding fdw_xact entry from FdwXactCtl.
+ * Also remove fdw_xact file if a foreign transaction was saved
+ * via an earlier checkpoint.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact	fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdw_xact(dbid, xid, serverid, userid,
+							   false);
+
+	if (fdwxact == NULL)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdw_xact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdw_xacts[i];
+		char	*buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_distributed_atomic_commit(int *newval, void **extra, GucSource source)
+{
+	DistributedAtomicCommitLevel newDistributedAtomicCommitLevel = *newval;
+
+	/* Parameter check */
+	if (newDistributedAtomicCommitLevel > DISTRIBUTED_ATOMIC_COMMIT_DISABLED &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"distributed_atomic_commit\" when "
+							"\"max_prepared_foreign_transactions\" or \"max_foreign_transaction_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdw_xacts;
+	int			num_xacts;
+	int			cur_xact;
+}	WorkingStatus;
+
+Datum
+pg_prepared_fdw_xacts(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdw_xacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdw_xacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(6, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdw_xacts = get_all_fdw_xacts(&num_fdw_xacts);
+		status->num_xacts = num_fdw_xacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdw_xact = &status->fdw_xacts[status->cur_xact++];
+		Datum		values[6];
+		bool		nulls[6];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdw_xact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdw_xact->dbid);
+		values[1] = TransactionIdGetDatum(fdw_xact->local_xid);
+		values[2] = ObjectIdGetDatum(fdw_xact->serverid);
+		values[3] = ObjectIdGetDatum(fdw_xact->userid);
+		switch (fdw_xact->status)
+		{
+			case FDW_XACT_PREPARING:
+				xact_status = "prepared";
+				break;
+			case FDW_XACT_COMMITTING_PREPARED:
+				xact_status = "committing";
+				break;
+			case FDW_XACT_ABORTING_PREPARED:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		/* should this really be interpreted by the FDW? */
+		values[5] = PointerGetDatum(cstring_to_text_with_len(fdw_xact->fdw_xact_id,
+															 strlen(fdw_xact->fdw_xact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXactState *state;
+	UserMapping		*usermapping;
+	FdwXact			fdwxact;
+	bool			ret;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, true);
+
+	if (fdwxact == NULL)
+		PG_RETURN_BOOL(false);
+
+	usermapping = GetUserMapping(userid, serverid);
+
+	state = create_fdw_xact_state();
+	state->serverid = serverid;
+	state->userid = userid;
+	state->umid = usermapping->umid;
+
+	ret = FdwXactResolveForeignTransaction(state, fdwxact, LOG);
+
+	PG_RETURN_BOOL(ret);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it. This gives a way to forget about such a prepared transaction
+ * when, for example, the foreign server where it was prepared is no longer
+ * available, or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_fdw_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdw_xact(MyDatabaseId, xid, serverid, userid, false);
+	if (fdwxact == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("could not find foreign transaction entry"))));
+
+	remove_fdw_xact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/access/fdwxact/fdwxact_launcher.c b/src/backend/access/fdwxact/fdwxact_launcher.c
new file mode 100644
index 0000000..39f351b
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact_launcher.c
@@ -0,0 +1,641 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Information about resolver processes is stored in a shared memory area.
+ * A backend process requests the start of a new resolver process via that
+ * shared memory area. Note that the launcher assumes that there is no more
+ * than one pending start request per database.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact_launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launcher_sigusr2(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid, int slot);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+Datum pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS);
+
+/*
+ * Wake up the launcher process to retry a launch. This is used by
+ * a resolver process that is being stopped.
+ */
+void
+FdwXactLauncherWakeupToRetry(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		SetLatch(FdwXactRslvCtl->launcher_latch);
+}
+
+/*
+ * Wake up the launcher process to request resolution. This is
+ * used by the backend process.
+ */
+void
+FdwXactLauncherWakeupToRequest(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int	slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->FdwXactActiveQueue));
+		SHMQueueInit(&(FdwXactRslvCtl->FdwXactRetryQueue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz	last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == 0);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz	now;
+		long	wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int		rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit start retries to once per foreign_xact_resolution_retry_interval,
+		 * but always try to start when a backend requests it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool launched;
+
+			ResetLatch(MyLatch);
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher launch",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested
+			 * but not running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started. This usually means a resolver crashed, so we
+			 * should retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request the launcher to launch a new foreign transaction resolver worker
+ * if one is not running yet. A foreign transaction resolver worker is
+ * responsible for resolving the foreign transactions registered on a
+ * database, so if a resolver worker has already been launched for it, we
+ * don't need to launch a new one.
+ */
+void
+fdwxact_maybe_launch_resolver(bool ignore_error)
+{
+	FdwXactResolver *resolver;
+	bool	found = false;
+	int		i;
+
+	/*
+	 * Look for a resolver process that is running and working on the
+	 * same database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->pid != InvalidPid &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * If we found a resolver for my database, we don't need to launch a new
+	 * one; just wake the running worker up.
+	 */
+	if (found)
+	{
+		SetLatch(resolver->latch);
+
+		elog(DEBUG1, "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		return;
+	}
+
+	/* Look for an unused resolver slot */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	/*
+	 * If there are no free worker slots left, inform the user about it
+	 * before exiting.
+	 */
+	if (!found)
+	{
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+		return;
+	}
+
+	Assert(resolver->pid == InvalidPid);
+
+	/* Found an unused slot; reserve it for a new resolver process */
+	resolver->dbid = MyDatabaseId;
+	resolver->in_use = true;
+
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Wake up launcher */
+	FdwXactLauncherWakeupToRequest();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to the given
+ * 'dbid' at 'slot', if given. If slot is negative, we find an unused slot.
+ * Note that caller must hold FdwXactResolverLock in exclusive mode.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid, int slot)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int launch_slot = slot;
+
+	/* If slot number is invalid, we find an unused slot */
+	if (launch_slot < 0)
+	{
+		int i;
+
+		for (i = 0; i < max_foreign_xact_resolvers; i++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+			if (resolver->in_use && resolver->dbid == dbid)
+				return;
+
+			if (!resolver->in_use)
+			{
+				launch_slot = i;
+				break;
+			}
+		}
+	}
+
+	/* No unused slot found */
+	if (launch_slot < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[launch_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_main_arg = Int32GetDatum(launch_slot);
+	bgw.bgw_notify_pid = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to wait
+	 * until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch all foreign transaction resolvers that are required by backend process
+ * but not running. Return true if we launch any resolver.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	int i, j;
+	int num_launches = 0;
+	int num_unused_slots = 0;
+	int num_dbs = 0;
+	bool launched = false;
+	Oid	*dbs_to_launch;
+	Oid	*dbs_having_worker = palloc0(sizeof(Oid) * max_foreign_xact_resolvers);
+
+	/*
+	 * Launch resolver workers on the databases that are requested
+	 * by backend processes, while counting unused slots.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* Remember unused worker slots */
+		if (!resolver->in_use)
+		{
+			num_unused_slots++;
+			continue;
+		}
+
+		/* Remember databases that already have a resolver worker, fall through */
+		if (OidIsValid(resolver->dbid))
+			dbs_having_worker[num_dbs++] = resolver->dbid;
+
+		/* Launch the backend-requested worker */
+		if (resolver->in_use &&
+			OidIsValid(resolver->dbid) &&
+			resolver->pid == InvalidPid)
+		{
+			fdwxact_launch_resolver(resolver->dbid, i);
+			launched = true;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* quick exit if no unused slot */
+	if (num_unused_slots == 0)
+		return launched;
+
+	/*
+	 * Launch resolvers on databases that have unresolved foreign
+	 * transactions but no running resolver. Scanning all FdwXact
+	 * entries could take time but it's harmless for the
+	 * relaunch case.
+	 */
+	dbs_to_launch = (Oid *) palloc(sizeof(Oid) * num_unused_slots);
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->numFdwXacts; i++)
+	{
+		FdwXact fdw_xact = FdwXactCtl->fdw_xacts[i];
+		bool found = false;
+
+		/* stop once every unused slot has been claimed */
+		if (num_launches >= num_unused_slots)
+			break;
+
+		for (j = 0; j < num_dbs; j++)
+		{
+			if (dbs_having_worker[j] == fdw_xact->dbid)
+			{
+				found = true;
+				break;
+			}
+		}
+
+		/* Register the database if no resolver is working on it */
+		if (!found)
+			dbs_to_launch[num_launches++] = fdw_xact->dbid;
+	}
+
+	/* Launch resolver process for a database at any worker slot */
+	for (i = 0; i < num_launches; i++)
+	{
+		fdwxact_launch_resolver(dbs_to_launch[i], -1);
+		launched = true;
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	return launched;
+}
+
+/*
+ * FdwXactLauncherRegister
+ *		Register a background worker running the foreign transaction
+ *		launcher.
+ */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+
+/*
+ * Returns activity of foreign transaction resolvers, including pids, the number
+ * of tasks and the last resolution time.
+ */
+Datum
+pg_stat_get_fdwxact_resolver(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver	*resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t	pid;
+		Oid		dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
diff --git a/src/backend/access/fdwxact/fdwxact_resolver.c b/src/backend/access/fdwxact/fdwxact_resolver.c
new file mode 100644
index 0000000..0b754da
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact_resolver.c
@@ -0,0 +1,331 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process continues to resolve foreign transactions on a database.
+ * It resolves two types of foreign transactions: on-line foreign transactions
+ * and dangling foreign transactions. An on-line foreign transaction is a
+ * foreign transaction that a concurrent backend process is waiting on for
+ * resolution. A dangling transaction is a foreign transaction whose
+ * distributed transaction ended up in an in-doubt state. A resolver process
+ * doesn't exit as long as there is at least one unresolved foreign transaction
+ * on the database, even if the timeout has been reached.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact_resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* GUC parameters */
+int foreign_xact_resolution_retry_interval;
+int foreign_xact_resolver_timeout = 60 * 1000;
+
+//static MemoryContext ResolveContext = NULL;
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FdwXactRslvLoop(void);
+static long FdwXactRslvComputeSleepTime(TimestampTz now);
+static void FdwXactRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+	FdwXactLauncherWakeupToRetry();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	TIMESTAMP_NOBEGIN(MyFdwXactResolver->last_resolved_time);
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sanish value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FdwXactRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FdwXactRslvLoop(void)
+{
+	TimestampTz last_retry_time = 0;
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		int			rc;
+		TimestampTz	now;
+		long		sleep_time;
+		bool		resolved;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Resolve one distributed transaction */
+		StartTransactionCommand();
+		resolved = FdwXactResolveDistributedTransaction(MyDatabaseId, true);
+		CommitTransactionCommand();
+
+		now = GetCurrentTimestamp();
+
+		/* Update my state */
+		if (resolved)
+			MyFdwXactResolver->last_resolved_time = now;
+
+		if (TimestampDifferenceExceeds(last_retry_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			StartTransactionCommand();
+			resolved = FdwXactResolveDistributedTransaction(MyDatabaseId, false);
+			CommitTransactionCommand();
+
+			last_retry_time = GetCurrentTimestamp();
+
+			/* Update my state */
+			if (resolved)
+				MyFdwXactResolver->last_resolved_time = last_retry_time;
+		}
+
+		/* Check for fdwxact resolver timeout */
+		FdwXactRslvCheckTimeout(now);
+
+		/*
+		 * If we resolved a distributed transaction, go to the next iteration
+		 * without resolving dangling transactions or sleeping, because
+		 * there might be other on-line transactions waiting to be resolved.
+		 */
+		if (!resolved)
+		{
+			/* Resolve dangling transactions as much as possible */
+			StartTransactionCommand();
+			FdwXactResolveAllDanglingTransactions(MyDatabaseId);
+			CommitTransactionCommand();
+
+			sleep_time = FdwXactRslvComputeSleepTime(now);
+
+			MemoryContextResetAndDeleteChildren(resolver_ctx);
+			MemoryContextSwitchTo(TopMemoryContext);
+
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   sleep_time,
+						   WAIT_EVENT_FDW_XACT_RESOLVER_MAIN);
+
+			if (rc & WL_POSTMASTER_DEATH)
+				proc_exit(1);
+		}
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FdwXactRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/*
+	 * The timeout has been reached. We exit only if there are neither pending
+	 * on-line transactions nor dangling transactions.
+	 */
+	if (!fdw_xact_exists(InvalidTransactionId, MyDatabaseId, InvalidOid,
+						 InvalidOid))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyFdwXactResolver->dbid))));
+		CommitTransactionCommand();
+
+		fdwxact_resolver_detach();
+		proc_exit(0);
+	}
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. Return the sleep
+ * time in milliseconds.
+ */
+static long
+FdwXactRslvComputeSleepTime(TimestampTz now)
+{
+	static TimestampTz	wakeuptime = 0;
+	long	sleeptime;
+	long	sec_to_timeout;
+	int		microsec_to_timeout;
+
+	if (now >= wakeuptime)
+		wakeuptime = TimestampTzPlusMilliseconds(now,
+												 foreign_xact_resolution_retry_interval);
+
+	/* Compute relative time until wakeup. */
+	TimestampDifference(now, wakeuptime,
+						&sec_to_timeout, &microsec_to_timeout);
+
+	sleeptime = sec_to_timeout * 1000 + microsec_to_timeout / 1000;
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c2db19b..fb63471 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2629,10 +2629,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		heap_freetuple(heaptup);
 	}
 
-	/* Make note that we've wrote on non-temprary relation */
-	if (RelationNeedsWAL(relation))
-		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
-
 	return HeapTupleGetOid(tup);
 }
 
@@ -3457,10 +3453,6 @@ l1:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
-	/* Make note that we've wrote on non-temprary relation */
-	if (RelationNeedsWAL(relation))
-		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
-
 	return HeapTupleMayBeUpdated;
 }
 
@@ -4411,10 +4403,6 @@ l2:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
-	/* Make note that we've wrote on non-temprary relation */
-	if (RelationNeedsWAL(relation))
-		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
-
 	bms_free(hot_attrs);
 	bms_free(proj_idx_attrs);
 	bms_free(key_attrs);
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index 5514db1..742e825 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -8,9 +8,9 @@ subdir = src/backend/access/rmgrdesc
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o genericdesc.o \
-	   gindesc.o gistdesc.o hashdesc.o heapdesc.o logicalmsgdesc.o \
-	   mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o seqdesc.o \
-	   smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
+OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o fdwxactdesc.o \
+	genericdesc.o gindesc.o gistdesc.o hashdesc.o heapdesc.o \
+	logicalmsgdesc.o mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o \
+	seqdesc.o smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000..7061bba
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,65 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL distributed transaction manager for foreign server.
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdw_xact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDW_XACT_INSERT)
+	{
+		FdwXactOnDiskData *fdw_insert_xlog = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_insert_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_insert_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_insert_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_insert_xlog->local_xid);
+		/* TODO: This should be really interpreted by each FDW */
+
+		/*
+		 * TODO: we also need to assess whether we want to add this
+		 * information
+		 */
+		appendStringInfo(buf, " foreign transaction info: %s",
+						 fdw_insert_xlog->fdw_xact_id);
+	}
+	else
+	{
+		xl_fdw_xact_remove *fdw_remove_xlog = (xl_fdw_xact_remove *) rec;
+
+		appendStringInfo(buf, "Foreign server oid: %u", fdw_remove_xlog->serverid);
+		appendStringInfo(buf, " user oid: %u", fdw_remove_xlog->userid);
+		appendStringInfo(buf, " database id: %u", fdw_remove_xlog->dbid);
+		appendStringInfo(buf, " local xid: %u", fdw_remove_xlog->xid);
+	}
+
+}
+
+const char *
+fdw_xact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDW_XACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDW_XACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7..4a9ab3d 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -112,14 +112,16 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_prepared_xacts=%d max_locks_per_xact=%d "
 						 "wal_level=%s wal_log_hints=%s "
-						 "track_commit_timestamp=%s",
+						 "track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_prepared_xacts,
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 16fbe47..f15c83a 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -12,9 +12,9 @@ subdir = src/backend/access/transam
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = clog.o commit_ts.o generic_xlog.o multixact.o parallel.o rmgr.o slru.o \
-	subtrans.o timeline.o transam.o twophase.o twophase_rmgr.o varsup.o \
-	xact.o xlog.o xlogarchive.o xlogfuncs.o \
+OBJS = clog.o commit_ts.o generic_xlog.o multixact.o \
+	parallel.o rmgr.o slru.o subtrans.o timeline.o transam.o twophase.o \
+	twophase_rmgr.o varsup.o xact.o xlog.o xlogarchive.o xlogfuncs.o \
 	xloginsert.o xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 9368b56..8b360b1 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -9,6 +9,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
 #include "access/generic_xlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 3942734..bc4e109 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -844,6 +845,35 @@ TwoPhaseGetGXact(TransactionId xid)
 }
 
 /*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
+/*
  * TwoPhaseGetDummyProc
  *		Get the dummy backend ID for prepared transaction specified by XID
  *
@@ -2316,6 +2346,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2375,6 +2411,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index d967400..1d06e0a 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1131,6 +1132,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1139,6 +1141,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsAtomicCommitReady();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1177,12 +1180,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1340,6 +1344,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transactions to be resolved, if required.
+	 * We only want to wait if we prepared foreign transactions in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -1994,6 +2006,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2150,6 +2165,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2237,6 +2253,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2426,6 +2444,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2631,6 +2650,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7eed586..cce4fd4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
 #include "access/subtrans.h"
@@ -5250,6 +5251,7 @@ BootStrapXLOG(void)
 	ControlFile->MaxConnections = MaxConnections;
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6337,6 +6339,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6861,14 +6866,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdw_xact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7060,7 +7066,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7566,6 +7575,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7884,6 +7894,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9200,6 +9213,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9633,7 +9647,8 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9665,6 +9680,7 @@ XLogReportParameters(void)
 		ControlFile->MaxConnections = MaxConnections;
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9870,6 +9886,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10068,6 +10085,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->MaxConnections = xlrec.MaxConnections;
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 715995d..1b9cdbb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -291,6 +291,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_prepared_fdw_xacts AS
+       SELECT * FROM pg_prepared_fdw_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
 	l.objoid, l.classoid, l.objsubid,
@@ -773,6 +776,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_fdwxact_resolvers AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_fdwxact_resolver() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b58a74f..4d4c339 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2504,9 +2504,16 @@ CopyFrom(CopyState cstate)
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc),
+							   true);
+
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
 
+	}
+
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index e5dd995..6056feb 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
@@ -1093,6 +1094,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction on this foreign server,
+	 * dropping the server might leave a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1407,6 +1420,16 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
 	/*
+	 * If there is a foreign prepared transaction for this user mapping,
+	 * dropping the mapping might leave a dangling prepared transaction.
+	 */
+	if (fdw_xact_exists(InvalidTransactionId, MyDatabaseId, srv->serverid,
+						useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
+	/*
 	 * Do the deletion
 	 */
 	object.classId = UserMappingRelationId;
@@ -1559,6 +1582,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
 				 errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA",
 						fdw->fdwname)));
 
+	/*
+	 * Remember that the transaction accesses a foreign server. Normally
+	 * ImportForeignSchema doesn't modify data on foreign servers, so register
+	 * it as a non-modified server.
+	 */
+	RegisterFdwXactByServerId(server->serverid, false);
+
 	/* Call FDW to get a list of commands */
 	cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 1e72e9f..2fee05d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_type.h"
@@ -749,7 +750,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		Relation		child = partRelInfo->ri_RelationDesc;
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(child), true);
+
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	MemoryContextSwitchTo(oldContext);
 
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 5d2cd0e..71c9916 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,10 +226,33 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		RangeTblEntry	*rte;
+
+		rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex,
+							estate);
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(rte->relid, true);
+
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
+	{
+		RangeTblEntry	*rte;
+		int rtindex = (scanrelid > 0) ?
+			scanrelid :
+			bms_next_member(node->fs_relids, -1);
+
+		rte = exec_rt_fetch(rtindex, estate);
+
+		/* Remember that the transaction accesses a foreign server */
+		RegisterFdwXactByRelId(rte->relid, false);
+
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
+	}
+
 	return scanstate;
 }
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index e2836b7..2557568 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -37,6 +37,7 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "commands/trigger.h"
@@ -44,6 +45,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "storage/bufmgr.h"
@@ -485,6 +487,10 @@ ExecInsert(ModifyTableState *mtstate,
 								HEAP_INSERT_SPECULATIVE,
 								NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
 												   estate, true, &specConflict,
@@ -530,6 +536,10 @@ ExecInsert(ModifyTableState *mtstate,
 								estate->es_output_cid,
 								0, NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
@@ -722,6 +732,11 @@ ldelete:;
 							 true /* wait for commit */ ,
 							 &hufd,
 							 changingPart);
+
+		/* Make note that we've written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case HeapTupleSelfUpdated:
@@ -1210,6 +1225,11 @@ lreplace:;
 							 estate->es_crosscheck_snapshot,
 							 true /* wait for commit */ ,
 							 &hufd, &lockmode);
+
+		/* Make note that we've written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case HeapTupleSelfUpdated:
@@ -2315,6 +2335,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+			/* Remember that the transaction modifies data on a foreign server */
+			RegisterFdwXactByRelId(relid, true);
 
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 															 resultRelInfo,
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index a0bcc04..b2097ad 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -155,6 +155,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index d2b695e..b722b9a 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -15,6 +15,8 @@
 #include <unistd.h>
 
 #include "libpq/pqsignal.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -129,6 +131,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 42bccce..5116369 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3484,6 +3484,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3678,6 +3684,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDW_XACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -3893,6 +3902,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDW_XACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDW_XACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index cb49f32..d9faef0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -896,6 +898,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -971,12 +977,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules have a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afb4972..960fd6a 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -154,6 +154,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDW_XACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a58..c5610ee 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -150,6 +152,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, BackendRandomShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +274,8 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	BackendRandomShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index dc7e875..48bb87a 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -91,6 +91,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -246,6 +248,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1327,6 +1330,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1392,6 +1396,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1442,6 +1447,15 @@ GetOldestXmin(Relation rel, int flags)
 		result = replication_slot_xmin;
 
 	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDW_XACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
+	/*
 	 * After locks have been released and defer_cleanup_age has been applied,
 	 * check whether we need to back up further to make logical decoding
 	 * possible. We need to do so if we're computing the global limit (rel =
@@ -3030,6 +3044,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits on future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions that are still
+ * needed to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6025ec..a42d06e 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,5 @@ OldSnapshotTimeMapLock				42
 BackendRandomLock					43
 LogicalRepWorkerLock				44
 CLogTruncationLock					45
+FdwXactLock					46
+FdwXactResolverLock			47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6ad5044..6e7b3b8 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -405,6 +406,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* initialize fields for fdw xact */
+	MyProc->fdwXactState = FDW_XACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -806,6 +811,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a3b9757..48f3c59 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -2994,6 +2996,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0327b29..ffdc166 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/transam.h"
@@ -377,6 +378,25 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 };
 
 /*
+ * Although only "required", "prefer", and "disabled" are documented,
+ * we accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry distributed_atomic_commit_options[] = {
+	{"required", DISTRIBUTED_ATOMIC_COMMIT_REQUIRED, false},
+	{"prefer", DISTRIBUTED_ATOMIC_COMMIT_PREFER, false},
+	{"disabled", DISTRIBUTED_ATOMIC_COMMIT_DISABLED, false},
+	{"on", DISTRIBUTED_ATOMIC_COMMIT_REQUIRED, true},
+	{"off", DISTRIBUTED_ATOMIC_COMMIT_DISABLED, true},
+	{"true", DISTRIBUTED_ATOMIC_COMMIT_REQUIRED, true},
+	{"false", DISTRIBUTED_ATOMIC_COMMIT_DISABLED, true},
+	{"yes", DISTRIBUTED_ATOMIC_COMMIT_REQUIRED, true},
+	{"no", DISTRIBUTED_ATOMIC_COMMIT_DISABLED, true},
+	{"1", DISTRIBUTED_ATOMIC_COMMIT_REQUIRED, true},
+	{"0", DISTRIBUTED_ATOMIC_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
+/*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
  */
@@ -658,6 +678,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2234,6 +2258,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
+		 gettext_noop("Sets the time to wait before retrying to resolve a foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4054,6 +4124,16 @@ static struct config_enum ConfigureNamesEnum[] =
 	},
 
 	{
+		{"distributed_atomic_commit", PGC_USERSET, FDWXACT_SETTINGS,
+		 gettext_noop("Use of distributed atomic commit for the current transaction."),
+			NULL
+		},
+		&distributed_atomic_commit,
+		DISTRIBUTED_ATOMIC_COMMIT_DISABLED, distributed_atomic_commit_options,
+		check_distributed_atomic_commit, NULL, NULL
+	},
+
+	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
 			NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 3fe257c..88387ed 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -121,6 +121,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -287,6 +289,20 @@
 
 
 #------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#distributed_atomic_commit = disabled	# required, prefer, or disabled
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
+#------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
 
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index ad06e8e..ca3eb62 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ab5cb7f..609578c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -209,6 +209,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdw_xact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 895a51f..7df88e0 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -306,6 +306,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_worker_processes);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 6fb403a..6d867c8 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -730,6 +730,7 @@ GuessControlValues(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -957,6 +958,7 @@ RewriteControlFile(void)
 	ControlFile.MaxConnections = 100;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* Contents are protected with a CRC */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca..b616cea 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000..b4de88b
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,149 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL distributed transaction manager
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDW_XACT_H
+#define FDW_XACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+#define	FDW_XACT_NOT_WAITING		0
+#define	FDW_XACT_WAITING			1
+#define	FDW_XACT_WAITING_RETRY		2
+#define	FDW_XACT_WAIT_COMPLETE		3
+
+#define FdwXactEnabled() (max_prepared_foreign_xacts > 0)
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDW_XACT_ID_MAX_LEN 200
+
+/* Enum to track the status of prepared foreign transaction */
+typedef enum
+{
+	FDW_XACT_INITIAL,
+	FDW_XACT_PREPARING,					/* foreign transaction is being prepared */
+	FDW_XACT_PREPARED,					/* foreign transaction is prepared */
+	FDW_XACT_COMMITTING_PREPARED,		/* foreign prepared transaction is to
+										 * be committed */
+	FDW_XACT_ABORTING_PREPARED, /* foreign prepared transaction is to be
+								 * aborted */
+} FdwXactStatus;
+
+
+/* Enum for distributed_atomic_commit parameter */
+typedef enum
+{
+	DISTRIBUTED_ATOMIC_COMMIT_DISABLED,	/* disable distributed atomic commit */
+	DISTRIBUTED_ATOMIC_COMMIT_PREFER,	/* use twophase commit where available */
+	DISTRIBUTED_ATOMIC_COMMIT_REQUIRED	/* all foreign servers have to support twophase
+										 * commit */
+} DistributedAtomicCommitLevel;
+
+/* Shared memory entry for a prepared or being prepared foreign transaction */
+typedef struct FdwXactData *FdwXact;
+
+typedef struct FdwXactData
+{
+	FdwXact		fxact_free_next;	/* Next free FdwXact entry */
+	FdwXact		fxact_next;			/* Pointer to the next FdwXact entry associated
+									 * with the same transaction */
+	Oid				dbid;			/* database oid where to find foreign server
+									 * and user mapping */
+	TransactionId	local_xid;		/* XID of local transaction */
+	Oid				serverid;		/* foreign server where transaction takes place */
+	Oid				userid;			/* user who initiated the foreign transaction */
+	Oid				umid;
+	FdwXactStatus 	status;			/* The state of the foreign transaction. This
+									 * doubles as the action to be taken on this entry. */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;		/* XLOG offset where insertion of this entry starts */
+	XLogRecPtr	insert_end_lsn;		/* XLOG offset where insertion of this entry ends */
+
+	bool		valid;			/* has the entry been completed and written to file? */
+	BackendId	held_by;		/* backend that is holding this entry */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+	char		fdw_xact_id[FDW_XACT_ID_MAX_LEN];		/* prepared transaction identifier */
+} FdwXactData;
+
+/* Shared memory layout for maintaining foreign prepared transaction entries. */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		freeFdwXacts;
+
+	/* Number of valid foreign transaction entries */
+	int			numFdwXacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdw_xacts[FLEXIBLE_ARRAY_MEMBER];		/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+typedef struct FdwXactState
+{
+	Oid		serverid;
+	Oid		userid;
+	Oid		umid;
+	char	*fdwxact_id;
+	void	*fdw_state;		/* foreign-data wrapper can keep state here */
+} FdwXactState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	distributed_atomic_commit;
+
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern bool fdw_xact_exists(TransactionId xid, Oid dboid, Oid serverid,
+				Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwTwoPhaseNeeded(void);
+extern void PreCommit_FdwXacts(void);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon);
+extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit);
+extern bool FdwXactResolveDistributedTransaction(Oid dbid, bool is_active);
+extern void FdwXactResolveAllDanglingTransactions(Oid dbid);
+extern bool FdwXactIsAtomicCommitReady(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void RegisterFdwXactByRelId(Oid relid, bool modified);
+extern void RegisterFdwXactByServerId(Oid serverid, bool modified);
+extern void FdwXactMarkForeignServerAccessed(Oid relid, bool modified);
+extern bool check_distributed_atomic_commit(int *newval, void **extra,
+										  GucSource source);
+
+#endif   /* FDW_XACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000..4ea65b2
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,32 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _FDWXACT_LAUNCHER_H
+#define _FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherWakeupToRequest(void);
+extern void FdwXactLauncherWakeupToRetry(void);
+
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+
+extern bool IsFdwXactLauncher(void);
+
+extern void fdwxact_maybe_launch_resolver(bool ignore_error);
+
+
+#endif	/* _FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000..6b2a24f
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int foreign_xact_resolver_timeout;
+
+#endif		/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000..e92b5a1
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,52 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDW_XACT_INSERT	0x00
+#define XLOG_FDW_XACT_REMOVE	0x10
+
+/* Same as GIDSIZE */
+#define FDW_XACT_MAX_ID_LEN 200
+/*
+ * On-disk file structure, also used in WAL
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdw_xact_id[FDW_XACT_MAX_ID_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdw_xact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+} xl_fdw_xact_remove;
+
+extern void fdw_xact_redo(XLogReaderState *record);
+extern void fdw_xact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdw_xact_identify(uint8 info);
+
+#endif	/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000..36391d4
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,67 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef _RESOLVER_INTERNAL_H
+#define _RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLaunchLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t	pid;	/* this resolver's PID, or 0 if not active */
+	Oid		dbid;	/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool	in_use;
+
+	/* Stats */
+	TimestampTz	last_resolved_time;
+
+	/* Protect shared variables shown above */
+	slock_t	mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	*latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/*
+	 * Foreign transaction resolution queues. Protected by FdwXactLock.
+	 */
+	SHM_QUEUE	FdwXactActiveQueue;
+	SHM_QUEUE	FdwXactRetryQueue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch		*launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif	/* _RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 0bbe9879..c15dff7 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDW_XACT_ID, "Foreign Transactions", fdw_xact_redo, fdw_xact_desc, fdw_xact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 0e932da..b199c88 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 				TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 2c1b2d8..63c833d 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -105,6 +105,13 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
+ * XACT_FLAGS_FDWNOPREPARE - set when we wrote data on a foreign table
+ * whose server isn't capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE				(1U << 3)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 30610b3..795e85a 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -227,6 +227,7 @@ typedef struct xl_parameter_change
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6..3d5333a 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -178,6 +178,7 @@ typedef struct ControlFileData
 	int			MaxConnections;
 	int			max_worker_processes;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9264a2e..ee68caa 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5036,6 +5036,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '6053', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_fdwxact_resolver', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,timestamptz}',
+  proargmodes => '{o,o,o}',
+  proargnames => '{pid,dbid,last_resolved_time}',
+  prosrc => 'pg_stat_get_fdwxact_resolver' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5741,6 +5748,22 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '6050', descr => 'view foreign transactions',
+  proname => 'pg_prepared_fdw_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{dbid,transaction,serverid,userid,status,identifier}',
+  prosrc => 'pg_prepared_fdw_xacts' },
+{ oid => '6051', descr => 'remove foreign transaction',
+  proname => 'pg_remove_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_remove_fdw_xact' },
+{ oid => '6052', descr => 'resolve foreign transaction',
+  proname => 'pg_resolve_fdw_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  prosrc => 'pg_resolve_fdw_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c14eb54..92d47bb 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/relation.h"
@@ -168,6 +169,14 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef bool (*PrepareForeignTransaction_function) (FdwXactState *state);
+typedef bool (*CommitForeignTransaction_function) (FdwXactState *state);
+typedef bool (*RollbackForeignTransaction_function) (FdwXactState *state);
+typedef bool (*ResolveForeignTransaction_function) (FdwXactState *state,
+													bool is_commit);
+typedef bool (*IsTwoPhaseCommitEnabled_function) (Oid serverid);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -235,6 +244,14 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for distributed transactions */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	ResolveForeignTransaction_function ResolveForeignTransaction;
+	IsTwoPhaseCommitEnabled_function IsTwoPhaseCommitEnabled;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
@@ -247,7 +264,6 @@ typedef struct FdwRoutine
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
 } FdwRoutine;
 
-
 /* Functions in foreign/foreign.c */
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern Oid	GetForeignServerIdByRelId(Oid relid);
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 3ca12e6..d030368 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -68,10 +68,10 @@ typedef struct ForeignTable
 	List	   *options;		/* ftoptions as DefElem list */
 } ForeignTable;
 
-
 extern ForeignServer *GetForeignServer(Oid serverid);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperByName(const char *name,
 							bool missing_ok);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f1c10d1..05feb0a 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -759,6 +759,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDW_XACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDW_XACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -833,7 +835,8 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDW_XACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -913,6 +916,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_READ,
+	WAIT_EVENT_FDW_XACT_FILE_WRITE,
+	WAIT_EVENT_FDW_XACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index cb613c8..45880b2 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -153,6 +153,16 @@ struct PGPROC
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
 	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction
+								 * resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+
+	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
 	 * their lock.
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 75bab29..25d6a2f 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDW_XACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -124,4 +126,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 								TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 668d9ef..81560bd 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -94,6 +94,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 735dd37..fdd6ded 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1413,6 +1413,13 @@ pg_policies| SELECT n.nspname AS schemaname,
    FROM ((pg_policy pol
      JOIN pg_class c ON ((c.oid = pol.polrelid)))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)));
+pg_prepared_fdw_xacts| SELECT f.dbid,
+    f.transaction,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.identifier
+   FROM pg_prepared_fdw_xacts() f(dbid, transaction, serverid, userid, status, identifier);
 pg_prepared_statements| SELECT p.name,
     p.statement,
     p.prepare_time,
@@ -1821,6 +1828,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_fdwxact_resolvers| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_fdwxact_resolver() r(pid, dbid, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_progress_vacuum| SELECT s.pid,
     s.datid,
     d.datname,
-- 
2.10.5

#16Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#15)

On Thu, Nov 15, 2018 at 7:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Oct 29, 2018 at 6:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Oct 29, 2018 at 10:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 24, 2018 at 9:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello.

# It took a long time to come here..

At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBf-AJup-_ARfpqR42gJQ_XjNsvv-XE0rCOCLEkT=HCg@mail.gmail.com>

On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

...

* Updated docs, added the new section "Distributed Transaction" at
Chapter 33 to explain the concept to users

* Moved atomic commit codes into src/backend/access/fdwxact directory.

* Some bug fixes.

Please review them.

I have some comments, with apologies in advance for any duplication
of or conflict with others' comments so far.

Thank you so much for reviewing this patch!

0001:

This sets XACT_FLAGS_WROTENONTEMPREL when a RELPERSISTENCE_PERMANENT
relation is modified. Isn't it also needed when UNLOGGED tables are
modified? It may be better to have a dedicated classification
macro or function.

I think that even if we do atomic commit for a transaction that
modifies an UNLOGGED table and a remote table, the data will become
inconsistent if the local server crashes. For example, if the local
server crashes after preparing the transaction on the foreign server
but before the local commit, we will lose all the data of the local
UNLOGGED table, whereas the modification of the remote table is rolled
back. With persistent tables, data consistency is preserved. So I
think keeping data consistent between remote data and a local unlogged
table is difficult, and I want to leave it as a restriction for now.
Am I missing something?

The flag is handled in heapam.c. I suppose that it should be done
in the upper layer considering coming pluggable storage.
(X_F_ACCESSEDTEMPREL is set in heapam, but..)

Yeah, or we can set the flag after heap_insert in ExecInsert.

0002:

The name FdwXactParticipantsForAC doesn't sound good to me. How
about FdwXactAtomicCommitParticipants?

+1, will fix it.

Well, according to the file comment of fdwxact.c,
FdwXactRegisterTransaction is called from the FDW driver and
F_X_MarkForeignTransactionModified is called from the executor. I
think that we should clarify who is responsible for the whole
sequence. Since the state of local tables affects it, I suppose
that is the executor. Couldn't we do the whole thing on the executor
side? I'm not sure, but I feel that
F_X_RegisterForeignTransaction can be a part of
F_X_MarkForeignTransactionModified. The callers of
MarkForeignTransactionModified can find out whether the table is
involved in 2pc through the IsTwoPhaseCommitEnabled interface.

Indeed. The executor can register foreign servers, so FDWs don't
need to register anything. I will remove the registration function so
that FDW developers don't need to call it and only need to provide
the atomic commit APIs.

    if (foreign_twophase_commit == true &&
        ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));

The error is emitted when the GUC is turned off in the
transaction where MarkTransactionModify'ed. I think that the
number of the variables' possible states should be reduced for
simplicity. For example, in this case, once foreign_twophase_commit
is checked in a transaction, subsequent changes to it
should be ignored for the rest of the transaction.

I might not have understood your comment correctly, but since
foreign_twophase_commit is a PGC_USERSET parameter I think we need to
check it at commit time. Also, we need to keep participant servers even
when foreign_twophase_commit is off if both max_prepared_foreign_xacts
and max_foreign_xact_resolvers are > 0.

I will post the updated patch this week.

Attached are the updated patches.

Based on the review comment from Horiguchi-san, I've changed the
atomic commit API so that FDW developers who wish to support atomic
commit don't need to call the register function. The atomic commit
APIs are as follows:

* GetPrepareId
* PrepareForeignTransaction
* CommitForeignTransaction
* RollbackForeignTransaction
* ResolveForeignTransaction
* IsTwoPhaseCommitEnabled

All APIs except GetPrepareId are required for atomic commit.

Also, I've changed the foreign_twophase_commit parameter to an enum
parameter based on the suggestion from Robert[1]. Valid values are
'required', 'prefer' and 'disabled' (default). When set to either
'required' or 'prefer', atomic commit will be used. The difference
between 'required' and 'prefer' is that with 'required' we
require *all* modified servers to be able to use 2pc, whereas with
'prefer' we use 2pc where available. So with 'required', if any of the
written participants disables 2pc or doesn't support the atomic commit
API, the transaction fails. IOW, with 'required' we can commit only
when data consistency among all participants can be preserved.
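
The three settings can be read as a small decision rule. The following is only a sketch of that rule as described above, not code from the patch: the enum names, the Participant struct and decide_commit() are all hypothetical, and it ignores other conditions the real implementation may check (for example, how many servers were modified).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
typedef enum
{
	ATOMIC_COMMIT_DISABLED,		/* never use two-phase commit */
	ATOMIC_COMMIT_PREFER,		/* use 2PC on servers that support it */
	ATOMIC_COMMIT_REQUIRED		/* every modified server must support 2PC */
} AtomicCommitMode;

typedef struct
{
	bool		modified;			/* did the transaction write to this server? */
	bool		twophase_capable;	/* can the server/FDW do 2PC? */
} Participant;

typedef enum
{
	COMMIT_ONE_PHASE,			/* plain commit everywhere */
	COMMIT_TWO_PHASE,			/* prepare where possible, then commit */
	COMMIT_REJECT				/* raise an error instead of committing */
} CommitDecision;

static CommitDecision
decide_commit(AtomicCommitMode mode, const Participant *parts, size_t n)
{
	size_t		capable = 0;
	size_t		incapable = 0;

	/* Only servers the transaction has written to matter here. */
	for (size_t i = 0; i < n; i++)
	{
		if (!parts[i].modified)
			continue;
		if (parts[i].twophase_capable)
			capable++;
		else
			incapable++;
	}

	if (mode == ATOMIC_COMMIT_DISABLED)
		return COMMIT_ONE_PHASE;

	/* 'required': one incapable written participant fails the commit. */
	if (mode == ATOMIC_COMMIT_REQUIRED && incapable > 0)
		return COMMIT_REJECT;

	/* 'prefer', or 'required' with only capable participants. */
	return (capable > 0) ? COMMIT_TWO_PHASE : COMMIT_ONE_PHASE;
}
```

With one capable and one incapable written participant, this sketch rejects the commit under 'required' and falls back to using 2pc only on the capable server under 'prefer'.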

Please review the patches.

Since the previous patch conflicts with current HEAD, I've attached an
updated set of patches.

Rebased and fixed a few bugs.

I got feedback regarding the transaction management FDW APIs at the
Japan PostgreSQL Developer Meetup[1] and am considering changing these
APIs to make them consistent with the XA interface[2] (xa_prepare(),
xa_commit() and xa_rollback()) as follows[3].

* FdwXactResult PrepareForeignTransaction(FdwXactState *state, int flags)
* FdwXactResult CommitForeignTransaction(FdwXactState *state, int flags)
* FdwXactResult RollbackForeignTransaction(FdwXactState *state, int flags)
* char *GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len)

Here, flags carries various settings; currently it would contain only
FDW_XACT_FLAG_ONEPHASE, which requires the FDW to commit in one phase
(i.e. without preparation). *state would contain the information
necessary to identify the transaction: serverid, userid, usermappingid
and prepared id. The GetPrepareId API is optional. Also, I've removed
the two_phase_commit parameter from the postgres_fdw options because we
can disable the two-phase commit protocol for distributed transactions
using the distributed_atomic_commit GUC parameter.
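
To make the role of flags concrete, here is a rough sketch of how a postgres_fdw-like wrapper might pick the remote command inside its commit callback. FdwXactState mirrors the struct in the patch; the FdwXactResult values, the numeric value of FDW_XACT_FLAG_ONEPHASE and the helper example_commit_command() are assumptions made for illustration, and the sketch does not escape quotes in the identifier.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t Oid;			/* stand-in for PostgreSQL's Oid */

/* Same fields as FdwXactState in the patch's fdwxact.h. */
typedef struct FdwXactState
{
	Oid			serverid;
	Oid			userid;
	Oid			umid;
	char	   *fdwxact_id;
	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
} FdwXactState;

/* Assumed for illustration; the mail does not spell these out. */
typedef enum
{
	FDW_XACT_OK,
	FDW_XACT_FAILED
} FdwXactResult;

#define FDW_XACT_FLAG_ONEPHASE 0x01

/*
 * Build the command a commit callback would send to the remote server:
 * a plain COMMIT for one-phase commit, COMMIT PREPARED otherwise.
 */
static FdwXactResult
example_commit_command(const FdwXactState *state, int flags,
					   char *buf, size_t buflen)
{
	int			n;

	if (flags & FDW_XACT_FLAG_ONEPHASE)
		n = snprintf(buf, buflen, "COMMIT TRANSACTION");
	else if (state->fdwxact_id != NULL)
		n = snprintf(buf, buflen, "COMMIT PREPARED '%s'", state->fdwxact_id);
	else
		return FDW_XACT_FAILED;	/* two-phase commit needs a prepared id */

	/* Treat truncation or an encoding error as failure. */
	return (n >= 0 && (size_t) n < buflen) ? FDW_XACT_OK : FDW_XACT_FAILED;
}
```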

Foreign transactions whose FDW provides both the
CommitForeignTransaction API and the RollbackForeignTransaction API
will be managed by the global transaction manager automatically. In
addition, if the FDW also provides the PrepareForeignTransaction API,
it will take part in the two-phase commit protocol as a participant.
So existing FDWs that don't provide the transaction management FDW
APIs can continue to work as before even if this patch gets committed.
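
The rule in the paragraph above amounts to a check on which callbacks are non-NULL. A minimal sketch, assuming a cut-down routine struct: ExampleFdwRoutine, FdwXactSupport, classify_fdw() and noop_callback() are hypothetical names, and the callback return type is simplified to int.

```c
#include <stddef.h>

typedef struct FdwXactState FdwXactState;	/* opaque in this sketch */

/* Simplified callback type; the proposal returns FdwXactResult instead. */
typedef int (*XactCallback) (FdwXactState *state, int flags);

/* Only the three transaction callbacks of an FdwRoutine-like struct. */
typedef struct
{
	XactCallback PrepareForeignTransaction;
	XactCallback CommitForeignTransaction;
	XactCallback RollbackForeignTransaction;
} ExampleFdwRoutine;

typedef enum
{
	FDW_UNMANAGED,				/* FDW handles transactions itself, as today */
	FDW_MANAGED_ONE_PHASE,		/* managed, but cannot prepare */
	FDW_MANAGED_TWO_PHASE		/* managed and a 2PC participant */
} FdwXactSupport;

static FdwXactSupport
classify_fdw(const ExampleFdwRoutine *routine)
{
	/* Both commit and rollback are needed before core takes over. */
	if (routine->CommitForeignTransaction == NULL ||
		routine->RollbackForeignTransaction == NULL)
		return FDW_UNMANAGED;

	/* A prepare callback additionally enables two-phase commit. */
	if (routine->PrepareForeignTransaction != NULL)
		return FDW_MANAGED_TWO_PHASE;

	return FDW_MANAGED_ONE_PHASE;
}

/* Placeholder callback, used only to populate a routine struct. */
static int
noop_callback(FdwXactState *state, int flags)
{
	(void) state;
	(void) flags;
	return 0;
}
```

An FDW that fills in none of the three fields lands in FDW_UNMANAGED, which corresponds to existing FDWs continuing to work as before.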

My one concern about this API design is that, since both the
CommitForeignTransaction API and the RollbackForeignTransaction API
will be used by two different kinds of processes (backends and
transaction resolver processes), it might be hard for FDW developers
to understand them correctly.

I'd like to define new APIs so that FDW developers don't get confused.
Feedback is very welcome.

[1]: https://wiki.postgresql.org/wiki/Japan_PostgreSQL_Developer_Meetup
[2]: https://en.wikipedia.org/wiki/X/Open_XA
[3]: The current API design I'm proposing has 6 APIs: Prepare, Commit,
Rollback, Resolve, IsTwoPhaseEnabled and GetPrepareId. And these APIs
are divided based on who executes them.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#17Ildar Musin
ildar@adjust.com
In reply to: Masahiko Sawada (#16)

Hello,

The patch needs a rebase as it doesn't apply to the current master. I
applied it to an older commit to test it. It has worked fine so far.

I found one bug, though, which would cause the resolver to finish by
timeout even though there are unresolved foreign transactions in the
list. The `fdw_xact_exists()` function expects the database id as the
first argument and the xid as the second. But everywhere it is called,
the arguments are specified in a different order (xid first, then
dbid). Also, the function declaration in the header doesn't match its
definition.
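
This kind of bug compiles silently because Oid and TransactionId are both unsigned 32-bit integer typedefs in PostgreSQL, so swapping them is not a type error. The sketch below only illustrates the effect; Entry and entry_exists() are hypothetical and much simpler than the real `fdw_xact_exists()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;				/* stand-ins: both are 32-bit unsigned */
typedef uint32_t TransactionId;

/* Hypothetical, simplified entry; the real FdwXactData has more fields. */
typedef struct
{
	Oid			dbid;
	TransactionId xid;
} Entry;

/* The definition takes dbid first, then xid. */
static bool
entry_exists(const Entry *entries, size_t n, Oid dbid, TransactionId xid)
{
	for (size_t i = 0; i < n; i++)
	{
		if (entries[i].dbid == dbid && entries[i].xid == xid)
			return true;
	}
	return false;
}
```

Given an entry with dbid 16384 and xid 700, entry_exists(e, 1, 16384, 700) finds it, while the swapped call entry_exists(e, 1, 700, 16384) compiles without complaint and reports that nothing is there, which is how a resolver could conclude it has no work left.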

There are some other things I found.
* In `FdwXactResolveAllDanglingTransactions()` the variable `n_resolved`
is declared as bool but used as an integer.
* In fdwxact.c's module comment, the functions
`FdwXactRegisterForeignTransaction()` and
`FdwXactMarkForeignTransactionModified()` are mentioned, but they are
not there anymore.
* In the documentation (storage.sgml) there is no mention of the
`pg_fdw_xact` directory.

A couple of stylistic notes.
* In the `FdwXactCtlData` struct both camel case and snake case naming
are used.
* In `get_fdw_xacts()` `xid != InvalidTransactionId` can be replaced with
`TransactionIdIsValid(xid)`.
* In `generate_fdw_xact_identifier()` the `fx` prefix could be a part of
the format string instead of being processed by `sprintf` as an extra
argument.
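
The last two notes could look roughly like this. The two macros are written the way PostgreSQL's transam.h defines them; make_identifier() and the `fx_<xid>_<serverid>_<userid>` layout are hypothetical and only show the prefix living in the format string, since the patch's actual identifier format is not shown here.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t Oid;				/* stand-ins for the PostgreSQL typedefs */
typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define TransactionIdIsValid(xid)	((xid) != InvalidTransactionId)

/*
 * Hypothetical identifier builder: the "fx" prefix is a literal part of
 * the format string rather than a "%s" argument.  Returns what snprintf
 * returns, or -1 if the xid is invalid.
 */
static int
make_identifier(char *buf, size_t len, TransactionId xid,
				Oid serverid, Oid userid)
{
	if (!TransactionIdIsValid(xid))
		return -1;

	return snprintf(buf, len, "fx_%u_%u_%u",
					(unsigned int) xid,
					(unsigned int) serverid,
					(unsigned int) userid);
}
```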

I'll continue looking into the patch. Thanks!

On Tue, Nov 20, 2018 at 12:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 15, 2018 at 7:36 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Mon, Oct 29, 2018 at 6:03 PM Masahiko Sawada <sawada.mshk@gmail.com>

wrote:

On Mon, Oct 29, 2018 at 10:16 AM Masahiko Sawada <

sawada.mshk@gmail.com> wrote:

On Wed, Oct 24, 2018 at 9:06 AM Masahiko Sawada <

sawada.mshk@gmail.com> wrote:

On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello.

# It took a long time to come here..

At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <

sawada.mshk@gmail.com> wrote in
<CAD21AoCBf-AJup-_ARfpqR42gJQ_XjNsvv-XE0rCOCLEkT=HCg@mail.gmail.com>

On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <

sawada.mshk@gmail.com> wrote:

...

* Updated docs, added the new section "Distributed

Transaction" at

Chapter 33 to explain the concept to users

* Moved atomic commit codes into src/backend/access/fdwxact

directory.

* Some bug fixes.

Please reivew them.

I have some comments, with apologize in advance for possible
duplicate or conflict with others' comments so far.

Thank youf so much for reviewing this patch!

0001:

This sets XACT_FLAG_WROTENONTEMPREL when RELPERSISTENT_PERMANENT
relation is modified. Isn't it needed when UNLOGGED tables are
modified? It may be better that we have dedicated classification
macro or function.

I think even if we do atomic commit for modifying the an UNLOGGED
table and a remote table the data will get inconsistent if the

local

server crashes. For example, if the local server crashes after
prepared the transaction on foreign server but before the local

commit

and, we will lose the all data of the local UNLOGGED table whereas

the

modification of remote table is rollbacked. In case of persistent
tables, the data consistency is left. So I think the keeping data
consistency between remote data and local unlogged table is

difficult

and want to leave it as a restriction for now. Am I missing

something?

The flag is handled in heapam.c. I suppose that it should be done
in the upper layer considering coming pluggable storage.
(X_F_ACCESSEDTEMPREL is set in heapam, but..)

Yeah, or we can set the flag after heap_insert in ExecInsert.

0002:

The name FdwXactParticipantsForAC doesn't sound good for me. How
about FdwXactAtomicCommitPartitcipants?

+1, will fix it.

Well, according to the file comment of fdwxact.c,
FdwXactRegisterTransaction is called from the FDW driver and
F_X_MarkForeignTransactionModified is called from the executor. I
think that we should clarify who is responsible for the whole
sequence. Since the state of local tables affects it, I suppose
the executor is. Couldn't we do the whole thing on the executor
side? I'm not sure, but I feel that
F_X_RegisterForeignTransaction can be a part of
F_X_MarkForeignTransactionModified. The callers of
MarkForeignTransactionModified can find out whether the table is
involved in 2pc through the IsTwoPhaseCommitEnabled interface.

Indeed. The executor can register foreign servers, so FDWs don't
need to register anything. I will remove the registration function
so that FDW developers don't need to call it and only need to
provide the atomic commit APIs.

if (foreign_twophase_commit == true &&
    ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0) )
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));

The error is emitted when the GUC is turned off in the
transaction where MarkTransactionModify'ed. I think that the
number of possible states of these variables should be reduced for
simplicity. For example, in this case, once foreign_twophase_commit
is checked in a transaction, subsequent changes to it
should be ignored for the rest of the transaction.

I might not have understood your comment correctly, but since
foreign_twophase_commit is a PGC_USERSET parameter I think we need to
check it at commit time. Also, we need to keep participant servers even
when foreign_twophase_commit is off if both max_prepared_foreign_xacts
and max_foreign_xact_resolvers are > 0.

I will post the updated patch this week.

Attached are the updated patches.

Based on the review comment from Horiguchi-san, I've changed the
atomic commit API so that FDW developers who wish to support atomic
commit don't need to call the register function. The atomic commit
APIs are the following:

* GetPrepareId
* PrepareForeignTransaction
* CommitForeignTransaction
* RollbackForeignTransaction
* ResolveForeignTransaction
* IsTwophaseCommitEnabled

All APIs except GetPrepareId are required for atomic commit.

Also, I've changed the foreign_twophase_commit parameter to an enum
parameter based on the suggestion from Robert[1]. Valid values are
'required', 'prefer' and 'disabled' (default). When set to either
'required' or 'prefer', atomic commit will be used. The difference
between 'required' and 'prefer' is that with 'required' we
require *all* modified servers to be able to use 2pc, whereas with
'prefer' we use 2pc where available. So with 'required', if any of the
written participants disables 2pc or doesn't support the atomic commit
API, the transaction fails. IOW, with 'required' we can commit only when
data consistency among all participants can be preserved.
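
To make the three settings concrete, here is a minimal sketch of how the
commit-time decision could look. It is not taken from the patch: the enum
names, decide_commit() and the per-server capability array are all
hypothetical, and the 'prefer' branch reflects my reading of "2pc where
available".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the three GUC values described above. */
typedef enum
{
    FOREIGN_TWOPHASE_COMMIT_DISABLED,
    FOREIGN_TWOPHASE_COMMIT_PREFER,
    FOREIGN_TWOPHASE_COMMIT_REQUIRED
} ForeignTwophaseCommitLevel;

/* Illustrative outcome of the commit-time check (not a patch type). */
typedef enum
{
    COMMIT_ONE_PHASE,   /* plain COMMIT on every modified server */
    COMMIT_TWO_PHASE,   /* prepare the 2pc-capable servers first */
    COMMIT_REJECT       /* raise an error; the transaction fails */
} CommitDecision;

/*
 * server_can_2pc[i] says whether the i-th modified server both supports
 * the atomic commit API and has 2pc enabled.
 */
CommitDecision
decide_commit(ForeignTwophaseCommitLevel level,
              const bool *server_can_2pc, size_t nservers)
{
    size_t ncapable = 0;
    size_t i;

    assert(server_can_2pc != NULL || nservers == 0);

    if (level == FOREIGN_TWOPHASE_COMMIT_DISABLED)
        return COMMIT_ONE_PHASE;

    for (i = 0; i < nservers; i++)
        if (server_can_2pc[i])
            ncapable++;

    /* 'required': every modified server must be able to prepare. */
    if (level == FOREIGN_TWOPHASE_COMMIT_REQUIRED && ncapable < nservers)
        return COMMIT_REJECT;

    /* 'prefer' (or 'required' with all capable): use 2pc where we can. */
    return (ncapable > 0) ? COMMIT_TWO_PHASE : COMMIT_ONE_PHASE;
}
```

With one capable and one non-capable server, this sketch rejects the commit
under 'required' and falls back to a mixed commit under 'prefer'.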

Please review the patches.

Since the previous patch conflicts with current HEAD, I've attached an
updated set of patches.

Rebased and fixed a few bugs.

I got feedback regarding the transaction management FDW APIs at the Japan
PostgreSQL Developer Meetup[1] and am considering changing these APIs
to make them consistent with the XA interface[2] (xa_prepare(),
xa_commit() and xa_rollback()) as follows[3].

* FdwXactResult PrepareForeignTransaction(FdwXactState *state, int flags)
* FdwXactResult CommitForeignTransaction(FdwXactState *state, int flags)
* FdwXactResult RollbackForeignTransaction(FdwXactState *state, int flags)
* char *GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len)

Here flags carries various settings; currently it would contain only
FDW_XACT_FLAG_ONEPHASE, which requires the FDW to commit in one phase (i.e.
without preparation). *state would contain the information
necessary to identify the transaction: serverid, userid, usermappingid
and prepared id. The GetPrepareId API is optional. Also, I've removed the
two_phase_commit parameter from the postgres_fdw options because we can
disable the two-phase commit protocol for distributed transactions
with the distributed_atomic_commit GUC parameter.
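
As a rough illustration of how a commit callback might use the flag, the
sketch below picks the remote command from FDW_XACT_FLAG_ONEPHASE. Everything
except the flag name and the four state fields is an assumption: the struct
layout, the bit value and build_commit_command() are hypothetical, and a real
FDW would also have to escape the prepared id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef unsigned int Oid;            /* stand-in for PostgreSQL's Oid */

#define FDW_XACT_FLAG_ONEPHASE 0x01  /* bit value is an assumption */

/* Hypothetical layout holding the fields listed above. */
typedef struct FdwXactState
{
    Oid         serverid;
    Oid         userid;
    Oid         usermappingid;
    const char *prepared_id;         /* NULL if nothing was prepared */
} FdwXactState;

/*
 * Write the remote commit command into buf.
 * Returns 0 on success, -1 if the id is missing or buf is too small.
 * Note: prepared_id is not escaped here; real code must quote it safely.
 */
int
build_commit_command(const FdwXactState *state, int flags,
                     char *buf, size_t buflen)
{
    int n;

    assert(state != NULL && buf != NULL);

    if (flags & FDW_XACT_FLAG_ONEPHASE)
        n = snprintf(buf, buflen, "COMMIT TRANSACTION");
    else
    {
        if (state->prepared_id == NULL)
            return -1;
        n = snprintf(buf, buflen, "COMMIT PREPARED '%s'",
                     state->prepared_id);
    }

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

The same shape would apply to rollback, with ROLLBACK and ROLLBACK PREPARED
in place of the commit commands.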

Foreign transactions whose FDW provides both the CommitForeignTransaction
API and the RollbackForeignTransaction API will be managed by the global
transaction manager automatically. In addition, if the FDW also
provides the PrepareForeignTransaction API, it will take part in the
two-phase commit protocol as a participant. Existing FDWs that don't
provide the transaction management FDW APIs can continue to work as before
even if this patch gets committed.
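
The rule in the paragraph above can be sketched as a check on which callbacks
are set. The three callback names come from the proposal; FdwXactRoutine,
FdwXactSupport, classify_fdw() and the FdwXactResult values are hypothetical
names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct FdwXactState FdwXactState;   /* opaque in this sketch */

/* Result values are assumed; the proposal only names the type. */
typedef enum { FDW_XACT_OK, FDW_XACT_FAILED } FdwXactResult;

typedef FdwXactResult (*FdwXactCallback) (FdwXactState *state, int flags);

/* Hypothetical table of the transaction callbacks an FDW provides. */
typedef struct FdwXactRoutine
{
    FdwXactCallback PrepareForeignTransaction;
    FdwXactCallback CommitForeignTransaction;
    FdwXactCallback RollbackForeignTransaction;
} FdwXactRoutine;

typedef enum
{
    FDW_XACT_UNMANAGED,          /* works as before the patch */
    FDW_XACT_MANAGED_ONE_PHASE,  /* managed, but cannot prepare */
    FDW_XACT_MANAGED_TWO_PHASE   /* can be a two-phase participant */
} FdwXactSupport;

FdwXactSupport
classify_fdw(const FdwXactRoutine *routine)
{
    assert(routine != NULL);

    /* Both commit and rollback are needed to be managed at all. */
    if (routine->CommitForeignTransaction == NULL ||
        routine->RollbackForeignTransaction == NULL)
        return FDW_XACT_UNMANAGED;

    if (routine->PrepareForeignTransaction == NULL)
        return FDW_XACT_MANAGED_ONE_PHASE;

    return FDW_XACT_MANAGED_TWO_PHASE;
}

/* Trivial stub so the classification can be exercised. */
FdwXactResult
noop_xact_callback(FdwXactState *state, int flags)
{
    (void) state;
    (void) flags;
    return FDW_XACT_OK;
}
```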

The one point that concerns me about this API design is that since
both the CommitForeignTransaction API and the RollbackForeignTransaction API
will be used by two different kinds of processes (backends and
transaction resolver processes), it might be hard for FDW developers to
understand them correctly.

I'd like to define new APIs so that FDW developers don't get confused.
Feedback is very welcome.

[1] https://wiki.postgresql.org/wiki/Japan_PostgreSQL_Developer_Meetup
[2] https://en.wikipedia.org/wiki/X/Open_XA
[3] The current API design I'm proposing has 6 APIs: Prepare, Commit,
Rollback, Resolve, IsTwoPhaseEnabled and GetPrepareId. These APIs
are divided based on who executes them.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#18Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Ildar Musin (#17)

On Tue, Jan 29, 2019 at 5:47 PM Ildar Musin <ildar@adjust.com> wrote:

Hello,

The patch needs rebase as it doesn't apply to the current master. I applied it
to the older commit to test it. It worked fine so far.

Thank you for testing the patch!

I found one bug though which would cause the resolver to finish by timeout even
though there are unresolved foreign transactions in the list. The
`fdw_xact_exists()` function expects the database id as the first argument and
the xid as the second. But everywhere it is called, the arguments are specified
in the opposite order (xid first, then dbid). Also, the function declaration in
the header doesn't match its definition.

Will fix.
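
For readers wondering why the compiler did not catch this: both arguments are
32-bit unsigned typedefs, so swapping them is silent. The sketch below is a
simplified two-argument stand-in, not the patch's function; the entry table
and fdw_xact_exists_sketch() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;            /* same width as in PostgreSQL */
typedef uint32_t TransactionId;  /* same width as in PostgreSQL */

/* Hypothetical in-memory entry; the real FdwXact has more fields. */
typedef struct
{
    Oid           dbid;
    TransactionId xid;
} FdwXactEntrySketch;

static const FdwXactEntrySketch entries[] = {
    {16384, 742},
};

/* Expects the database id first and the xid second. */
bool
fdw_xact_exists_sketch(Oid dbid, TransactionId xid)
{
    size_t i;

    for (i = 0; i < sizeof(entries) / sizeof(entries[0]); i++)
        if (entries[i].dbid == dbid && entries[i].xid == xid)
            return true;

    return false;
}
```

Calling fdw_xact_exists_sketch(742, 16384) compiles without any diagnostic
and simply reports that no entry exists, which matches the symptom described
above of the resolver giving up while transactions are still unresolved.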

There are some other things I found.
* In `FdwXactResolveAllDanglingTransactions()` variable `n_resolved` is
declared as bool but used as integer.
* In fdwxact.c's module comment there are `FdwXactRegisterForeignTransaction()`
and `FdwXactMarkForeignTransactionModified()` functions mentioned that are
not there anymore.
* In documentation (storage.sgml) there is no mention of `pg_fdw_xact`
directory.

Couple of stylistic notes.
* In `FdwXactCtlData struct` there are both camel case and snake case naming
used.
* In `get_fdw_xacts()` `xid != InvalidTransactionId` can be replaced with
`TransactionIdIsValid(xid)`.
* In `generate_fdw_xact_identifier()` the `fx` prefix could be a part of format
string instead of being processed by `sprintf` as an extra argument.

I'll incorporate them in the next patch set.
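
The last two stylistic notes could look roughly like this. The two macros
match PostgreSQL's definitions in transam.h; format_fdw_xact_id() and the
"fx_%u_%u_%u" layout are assumptions for illustration, since the actual
identifier format is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t Oid;
typedef uint32_t TransactionId;

/* Same definitions as in PostgreSQL's access/transam.h. */
#define InvalidTransactionId    ((TransactionId) 0)
#define TransactionIdIsValid(xid)   ((xid) != InvalidTransactionId)

/*
 * Build an identifier with the "fx" prefix as part of the format string
 * rather than passing it as an extra "%s" argument.
 * Returns 0 on success, -1 on an invalid xid or a too-small buffer.
 */
int
format_fdw_xact_id(char *buf, size_t buflen,
                   TransactionId xid, Oid serverid, Oid userid)
{
    int n;

    assert(buf != NULL);

    if (!TransactionIdIsValid(xid))
        return -1;

    n = snprintf(buf, buflen, "fx_%u_%u_%u",
                 (unsigned int) xid,
                 (unsigned int) serverid,
                 (unsigned int) userid);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```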

I'll continue looking into the patch. Thanks!

Thanks. Actually I'm updating the patch set, changing the API as
I proposed before and improving the documentation and README. I'll submit
the latest patch next week.

--
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#19Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#18)

On Thu, Jan 31, 2019 at 11:09:09AM +0100, Masahiko Sawada wrote:

Thanks. Actually I'm updating the patch set, changing the API as
I proposed before and improving the documentation and README. I'll submit
the latest patch next week.

Cool, I have moved the patch to next CF.
--
Michael

#20Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#18)
4 attachment(s)

On Thu, Jan 31, 2019 at 7:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 29, 2019 at 5:47 PM Ildar Musin <ildar@adjust.com> wrote:

Hello,

The patch needs rebase as it doesn't apply to the current master. I applied it
to the older commit to test it. It worked fine so far.

Thank you for testing the patch!

I found one bug though which would cause the resolver to finish by timeout even
though there are unresolved foreign transactions in the list. The
`fdw_xact_exists()` function expects the database id as the first argument and
the xid as the second. But everywhere it is called, the arguments are specified
in the opposite order (xid first, then dbid). Also, the function declaration in
the header doesn't match its definition.

Will fix.

There are some other things I found.
* In `FdwXactResolveAllDanglingTransactions()` variable `n_resolved` is
declared as bool but used as integer.
* In fdwxact.c's module comment there are `FdwXactRegisterForeignTransaction()`
and `FdwXactMarkForeignTransactionModified()` functions mentioned that are
not there anymore.
* In documentation (storage.sgml) there is no mention of `pg_fdw_xact`
directory.

Couple of stylistic notes.
* In `FdwXactCtlData struct` there are both camel case and snake case naming
used.
* In `get_fdw_xacts()` `xid != InvalidTransactionId` can be replaced with
`TransactionIdIsValid(xid)`.
* In `generate_fdw_xact_identifier()` the `fx` prefix could be a part of format
string instead of being processed by `sprintf` as an extra argument.

I'll incorporate them in the next patch set.

I'll continue looking into the patch. Thanks!

Thanks. Actually I'm updating the patch set, changing the API as
I proposed before and improving the documentation and README. I'll submit
the latest patch next week.

Sorry for the very late reply. Attached are the updated patches.

The basic mechanism has not changed since the previous version.
But the updated patch uses a single wait queue instead of the
two queues (active and retry) that were used in the previous version.

Every backend process has a timestamp in PGPROC
(fdwXactNextResolutionTs), which is the time at which it expects to be
processed by a foreign transaction resolver process. Entries in the wait
queue are ordered by their timestamps. The wait queue and timestamp are
used after a backend process has prepared all transactions on the foreign
servers and is waiting for all of them to be resolved.

A backend process that is committing/aborting a distributed
transaction inserts itself into the wait queue
(FdwXactRslvCtl->fdwxact_queue) with the current timestamp, and then
requests the launch of a new resolver process if one is not running yet.
If there is already a resolver connected to the same database, the backend
just sets its latch. The wait queue is protected by the LWLock
FdwXactResolutionLock. Then the backend sleeps until either the user
requests cancellation (presses ctrl-c) or it is woken up by the resolver
process.

A foreign transaction resolver process keeps polling the wait queue,
checking whether there is any waiter on the database that the resolver is
connected to. If there is a waiter, the resolver fetches it and checks its
timestamp. If the current time is past that timestamp, the resolver
process starts to resolve all of its foreign transactions. Usually a
backend process inserts itself into the wait queue first and then wakes up
the resolver, and both use the same wall clock, so the resolver can fetch
the waiter that was just inserted. Once all foreign transactions are
resolved, the resolver process deletes the backend's entry from the wait
queue and then wakes up the waiting backend.

On failure during foreign transaction resolution, while the backend is
still sleeping, the resolver process removes the backend and reinserts it
with a new timestamp (its timestamp +
foreign_transaction_resolution_interval) at the appropriate position in
the wait queue. This mechanism ensures that a distributed transaction
is resolved as soon as the waiter is inserted, while also ensuring that the
resolver can retry resolving the failed foreign transactions at an
interval of foreign_transaction_resolution_interval.
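
The single-queue mechanism described above can be sketched with a linked
list kept in timestamp order. This is only an illustration under simplifying
assumptions: the Waiter struct and the queue_* functions are hypothetical,
the per-database filtering is left out, and the locking that
FdwXactResolutionLock provides is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t TimestampTz;    /* microseconds, as in PostgreSQL */

/* Hypothetical queue entry standing in for a waiting backend. */
typedef struct Waiter
{
    int            backend_id;
    TimestampTz    next_resolution_ts; /* role of fdwXactNextResolutionTs */
    struct Waiter *next;
} Waiter;

/* Insert w keeping ascending timestamp order; equal stamps stay FIFO. */
void
queue_insert(Waiter **head, Waiter *w)
{
    Waiter **p = head;

    assert(head != NULL && w != NULL);

    while (*p != NULL && (*p)->next_resolution_ts <= w->next_resolution_ts)
        p = &(*p)->next;

    w->next = *p;
    *p = w;
}

/* Return the first waiter if its time has come, else NULL. */
Waiter *
queue_peek_due(Waiter *head, TimestampTz now)
{
    if (head != NULL && head->next_resolution_ts <= now)
        return head;
    return NULL;
}

/* Unlink w from the queue if it is present. */
void
queue_remove(Waiter **head, Waiter *w)
{
    Waiter **p = head;

    assert(head != NULL && w != NULL);

    while (*p != NULL && *p != w)
        p = &(*p)->next;

    if (*p == w)
    {
        *p = w->next;
        w->next = NULL;
    }
}

/* On failure: push the waiter back by the retry interval. */
void
queue_reschedule(Waiter **head, Waiter *w, TimestampTz retry_interval)
{
    queue_remove(head, w);
    w->next_resolution_ts += retry_interval;
    queue_insert(head, w);
}
```

Because a failed waiter moves behind entries with earlier timestamps, newly
inserted backends are served immediately while failed ones wait out the
retry interval, which is what lets one queue replace the earlier active and
retry queues.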

For handling in-doubt transactions, I've removed the automatic
foreign transaction resolution code from the first version of the patch
since it's not an essential feature and we can add it later. Therefore the
user needs to resolve unresolved foreign transactions manually using the
pg_resolve_fdwxacts() function in three cases: where the foreign
server crashed or we lost connectivity to it while preparing the
foreign transaction, where the coordinator node crashed while
preparing/resolving the foreign transaction, and where the user canceled
the resolution of the foreign transaction.

Foreign transaction resolver processes exit if they have had no
foreign transaction to resolve for longer than
foreign_transaction_resolver_timeout. Since a database cannot be dropped
while a resolver process is connected to it, the resolver can be stopped by
calling the pg_stop_fdwxact_resolver() function.

The comment at the top of fdwxact.c describes the locking mechanism
and recovery, and src/backend/fdwxact/README describes the status
transitions of FdwXact.

Also, the wiki page[1] describes how to use this feature with some examples.

[1]: https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

v23-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From cdfdf20e785fbf44c1b12277862e41a43efbde14 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 8 Feb 2019 10:44:54 +0900
Subject: [PATCH v23 1/4] Keep track of writing on non-temporary relation

---
 src/backend/executor/nodeModifyTable.c | 12 ++++++++++++
 src/include/access/xact.h              |  6 ++++++
 2 files changed, 18 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 8c0a2c4..e0c4e0a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -582,6 +582,10 @@ ExecInsert(ModifyTableState *mtstate,
 						 estate->es_output_cid,
 						 0, NULL);
 
+			/* Make note that we've wrote on non-temprary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
@@ -932,6 +936,10 @@ ldelete:;
 	if (tupleDeleted)
 		*tupleDeleted = true;
 
+	/* Make note that we've wrote on non-temprary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/*
 	 * If this delete is the result of a partition key update that moved the
 	 * tuple to a new partition, put this row into the transition OLD TABLE,
@@ -1441,6 +1449,10 @@ lreplace:;
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
 	}
 
+	/* Make note that we've wrote on non-temprary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	if (canSetTag)
 		(estate->es_processed)++;
 
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index d787f92..d6803d6 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -103,6 +103,12 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
 /*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
-- 
2.10.5

v23-0004-Add-regression-tests-for-atomic-commit.patch (application/octet-stream)
From 31b5cd4f4e7d9e7ff84721fdd0aab183de1189bb Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:48:08 +0900
Subject: [PATCH v23 4/4] Add regression tests for atomic commit.

---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index 648dd7e..b17429f 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000..9af9bb8
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit transaction involving foreign servers and shutdown the master node
+# immediately before checkpoint. Check that WAL replay cleans up
+# its shared memory state release locks while replaying transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 189abbb..0ab9f17 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2307,9 +2307,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transaction
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2324,7 +2327,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.10.5

v23-0003-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From 28979f2a4a6887ef687d3d95b615f62a4a42a2b4 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:46:01 +0900
Subject: [PATCH v23 3/4] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/Makefile                  |   7 +-
 contrib/postgres_fdw/connection.c              | 608 ++++++++++++++++---------
 contrib/postgres_fdw/expected/postgres_fdw.out | 261 ++++++++++-
 contrib/postgres_fdw/fdwxact.conf              |   3 +
 contrib/postgres_fdw/postgres_fdw.c            |  21 +-
 contrib/postgres_fdw/postgres_fdw.h            |   7 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      | 119 +++++
 doc/src/sgml/postgres-fdw.sgml                 |  46 ++
 8 files changed, 833 insertions(+), 239 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index 85394b4..5198f40 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -10,7 +10,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -23,3 +23,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 239d220..40ddaa1 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2019, PostgreSQL Global Development Group
  *
@@ -14,9 +14,12 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
-#include "catalog/pg_user_mapping.h"
 #include "access/xact.h"
+#include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -56,6 +59,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -69,17 +73,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 					   SubTransactionId mySubid,
 					   SubTransactionId parentSubid,
@@ -89,26 +89,28 @@ static void pgfdw_reject_incomplete_xact_state_change(ConnCacheEntry *entry);
 static bool pgfdw_cancel_query(PGconn *conn);
 static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 						 bool ignore_errors);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 						 PGresult **result);
-
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
+ * Get connection cache entry. Unlike GetConenctionState function, this function
+ * doesn't establish new connection even if not yet.
  */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -128,7 +130,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -136,12 +137,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -155,6 +150,21 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * This function gets the connection cache entry and establishes connection
+ * to the foreign server if there is no connection and starts a new transaction
+ * if 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -182,6 +192,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping	*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +201,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +212,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connection to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,11 +228,39 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
@@ -414,7 +463,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -645,193 +694,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 }
 
 /*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow remote transactions that modified anything,
-					 * since it's not very reasonable to hold them open until
-					 * the prepared transaction is committed.  For the moment,
-					 * throw error unconditionally; later we might allow
-					 * read-only cases.  Note that the error will cause us to
-					 * come right back here with event == XACT_EVENT_ABORT, so
-					 * we'll clean up the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot prepare a transaction that modified remote tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
-/*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
 static void
@@ -847,10 +709,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -861,6 +719,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Quick exit if no connections were touched in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1195,3 +1057,309 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   state->server->servername, state->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 state->server->servername, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on foreign server. If
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can commit the
+ * foreign transaction without preparation, otherwise commit the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In simple commit case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   state->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Roll back a transaction on the foreign server. As in the commit case, if
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can roll back the
+ * foreign transaction without preparation; otherwise it rolls back the prepared
+ * transaction. This function must tolerate being called recursively, as an
+ * error can happen while aborting.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before
+	 * establishing a connection or starting a remote transaction.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction or is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 3ed5270..4c787a0 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -179,15 +198,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8591,3 +8612,225 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     0
+(1 row)
+
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000..3fdbf93
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index db62caf..648aeb8 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -499,7 +500,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 				  const PgFdwRelationInfo *fpinfo_o,
 				  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -553,6 +553,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1441,7 +1446,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2323,7 +2328,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2624,7 +2629,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3425,7 +3430,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4298,7 +4303,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4388,7 +4393,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4611,7 +4616,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index b382945..4b945d1 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "nodes/pathnodes.h"
@@ -121,7 +122,7 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -129,6 +130,9 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 				   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *state);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *state);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *state);
 
 /* in option.c */
 extern int ExtractConnectionOptions(List *defelems,
@@ -192,6 +196,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 						bool is_subquery,
 						List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 3bfcdab..f7954ce 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2419,3 +2442,99 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index a46fd75..5f8fe27 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -436,6 +436,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers, the
+    transaction on each remote server is committed or aborted independently.
+    Some of these transactions may fail to commit while others commit
+    successfully. This behavior can be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> may
+       use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to a
+       nonzero value on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to a nonzero value on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
@@ -464,6 +501,14 @@
   </para>
 
   <para>
+   <filename>postgres_fdw</filename> uses the two-phase commit protocol when a
+   transaction commits or aborts if atomic commit of the distributed
+   transaction (see <xref linkend="atomic-commit"/>) is required. The remote
+   server should therefore set <xref linkend="guc-max-prepared-transactions"/>
+   to a nonzero value so that it can prepare the remote transaction.
+  </para>
+
+  <para>
    The remote transaction uses <literal>SERIALIZABLE</literal>
    isolation level when the local transaction has <literal>SERIALIZABLE</literal>
    isolation level; otherwise it uses <literal>REPEATABLE READ</literal>
@@ -478,6 +523,7 @@
    COMMITTED</literal> local transaction.  A future
    <productname>PostgreSQL</productname> release might modify these rules.
   </para>
+
  </sect2>
 
  <sect2>
-- 
2.10.5

v23-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From de1a1c3530c6f79be1ded2cbc773517afecbd5dd Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:44:42 +0900
Subject: [PATCH v23 2/4] Support atomic commit among multiple foreign servers.

---
 doc/src/sgml/catalogs.sgml                    |  145 ++
 doc/src/sgml/config.sgml                      |  146 +-
 doc/src/sgml/distributed-transaction.sgml     |  158 ++
 doc/src/sgml/fdwhandler.sgml                  |  236 ++
 doc/src/sgml/filelist.sgml                    |    1 +
 doc/src/sgml/func.sgml                        |   89 +
 doc/src/sgml/monitoring.sgml                  |   60 +
 doc/src/sgml/postgres.sgml                    |    1 +
 doc/src/sgml/storage.sgml                     |    6 +
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  130 ++
 src/backend/access/fdwxact/fdwxact.c          | 2833 +++++++++++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  644 ++++++
 src/backend/access/fdwxact/resolver.c         |  344 +++
 src/backend/access/rmgrdesc/Makefile          |    8 +-
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/Makefile           |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/copy.c                   |    7 +
 src/backend/commands/foreigncmds.c            |   30 +
 src/backend/executor/execPartition.c          |    9 +
 src/backend/executor/nodeForeignscan.c        |   25 +
 src/backend/executor/nodeModifyTable.c        |   18 +
 src/backend/foreign/foreign.c                 |   57 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   20 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   82 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  165 ++
 src/include/access/fdwxact_launcher.h         |   29 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/resolver_internal.h        |   66 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   29 +
 src/include/foreign/fdwapi.h                  |   13 +-
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    9 +-
 src/include/storage/proc.h                    |   11 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    3 +
 src/test/regress/expected/rules.out           |   13 +
 64 files changed, 5782 insertions(+), 27 deletions(-)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100755 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h
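
For anyone trying the patch, the settings described in the documentation
below can be summarized in a minimal configuration sketch. The parameter
names are the ones this patch set documents; the numeric values are
illustrative assumptions only (here, roughly 10 concurrent local
transactions, each writing to 2 foreign servers), not recommendations.

```
# --- coordinator (local server): postgresql.conf ---
# N local transactions * K foreign servers each = 10 * 2 (assumed workload)
max_prepared_foreign_transactions = 20
# at least one resolver worker is needed for 'required' or 'prefer'
max_foreign_transaction_resolvers = 1
# must leave room for the resolvers plus one more worker slot
max_worker_processes = 8
# can also be changed per session with SET
foreign_twophase_commit = required

# --- each remote server: postgresql.conf ---
# the default of 0 disables PREPARE TRANSACTION on the remote side
max_prepared_transactions = 10
```

max_prepared_foreign_transactions and max_prepared_transactions can only be
set at server start, so both sides need a restart after changing them.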

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 1701863..4d6aa16 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8201,6 +8201,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
      </row>
 
      <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
+     <row>
       <entry><link linkend="view-pg-file-settings"><structname>pg_file_settings</structname></link></entry>
       <entry>summary of configuration file contents</entry>
      </row>
@@ -9640,6 +9645,146 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric identifier of the local transaction with which this foreign
+       transaction is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.usesysid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>initial</literal> : Initial status.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>resolved</literal> : This foreign transaction has been resolved.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in doubt and
+       needs to be resolved by calling the <function>pg_resolve_fdwxact</function>
+       function.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ade44b5..867814f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4281,7 +4281,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
 
      </variablelist>
     </sect2>
-
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -8609,6 +8608,151 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Setting</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether transaction commit waits for all foreign
+         transactions involved to be resolved before the command returns a
+         "success" indication to the client. Valid values are
+         <literal>required</literal>, <literal>prefer</literal> and
+         <literal>disabled</literal>. The default setting is
+         <literal>disabled</literal>, in which case the two-phase commit
+         protocol is not used to commit or roll back distributed transactions.
+         When set to <literal>required</literal>, the distributed transaction
+         strictly requires that every server written to can use the two-phase
+         commit protocol. That is, the distributed transaction cannot commit if
+         even one server does not support the transaction management callback
+         routines (described in <xref linkend="fdw-callbacks-transaction-managements"/>).
+         When set to <literal>prefer</literal>, the distributed transaction uses
+         the two-phase commit protocol only on servers where it is available and
+         commits normally on the others. Note that with <literal>disabled</literal>
+         or <literal>prefer</literal>, the servers involved in the distributed
+         transaction can be left inconsistent with each other if a foreign server
+         crashes while the distributed transaction is being committed.
+        </para>
+
+        <para>
+         Both <varname>max_prepared_foreign_transactions</varname> and
+         <varname>max_foreign_transaction_resolvers</varname> must be set to
+         nonzero values before this parameter can be set to either
+         <literal>required</literal> or <literal>prefer</literal>.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If <literal>N</literal> local transactions each
+         span <literal>K</literal> foreign servers, this value needs to be set
+         to <literal>N * K</literal>, not just <literal>N</literal>.
+         This parameter can only be set at server start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>int</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver waits after a failed resolution
+         attempt before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning a resolver stays connected
+         to its database until it is stopped manually. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000..350b1af
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers end in either commit or rollback, using the
+   transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers automatically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).
+    The <productname>PostgreSQL</productname> server that receives the SQL is called
+    the <firstterm>coordinator node</firstterm> and is responsible for coordinating
+    all the participating transactions. Using the two-phase commit protocol, the commit
+    sequence of a distributed transaction proceeds in the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    In the first step, the <productname>PostgreSQL</productname> distributed
+    transaction manager prepares all transactions on the foreign servers if
+    two-phase commit is required. Two-phase commit is required when the
+    transaction modifies data on two or more servers, including the local server
+    itself, and <xref linkend="guc-foreign-twophase-commit"/> is
+    <literal>required</literal> or <literal>prefer</literal>. If all preparations
+    on the foreign servers succeed, the commit proceeds to the next step. If any
+    failure happens in this step, <productname>PostgreSQL</productname> switches
+    to rollback and rolls back all transactions on both local and foreign servers.
+   </para>
+
+   <para>
+    In the local commit step, <productname>PostgreSQL</productname> commits the
+    transaction locally. If any failure happens in this step,
+    <productname>PostgreSQL</productname> switches to rollback and rolls back all
+    transactions on both local and foreign servers.
+   </para>
+
+   <para>
+    In the final step, prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolution">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for foreign transaction resolution. They commit or roll back all
+    prepared transactions on foreign servers if the coordinator received agreement
+    messages from all foreign servers during the first step.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on one database on the coordinator side. On failure during resolution, it
+    retries at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped. To drop the database, call the
+     <function>pg_stop_foreign_xact_resolver</function> function before dropping
+     the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>Manual Resolution of In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back using the two-phase commit protocol. However, distributed transactions
+    become <firstterm>in-doubt</firstterm> in three cases: when a foreign
+    server crashes or the connection to it is lost while preparing the foreign
+    transaction, when the coordinator node crashes while either preparing or
+    resolving the distributed transaction, and when the user cancels the query. You can
+    check for in-doubt transactions in the <xref linkend="pg-stat-foreign-xact-view"/>
+    view. These foreign transactions need to be resolved using the
+    <function>pg_resolve_foreign_xact</function> function.
+    <productname>PostgreSQL</productname> doesn't have facilities to automatically
+    resolve in-doubt transactions. This behavior might change in a future release.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-monitoring">
+   <title>Monitoring</title>
+   <para>
+    The monitoring information about foreign transaction resolvers is visible in
+    <link linkend="pg-stat-foreign-xact-view"><literal>pg_stat_foreign_xact</literal></link>
+    view. This view contains one row for every foreign transaction resolver worker.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to nonzero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be adjusted to
+    accommodate foreign transaction resolver workers; it must be at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 2c07a86..118e2a7 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1396,6 +1396,127 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back and
+     prepare the foreign transaction. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase, or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flags</literal> has
+    the flag <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction
+    can be committed in one phase; otherwise this function must commit the
+    prepared transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except when the <function>pg_resolve_foreign_xact</function>
+     SQL function is called, this callback function is executed by foreign
+     transaction resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally. The
+    foreign transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent infinite
+    recursion. If <literal>frstate-&gt;flags</literal> has the flag
+    <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except when the <function>pg_resolve_foreign_xact</function>
+     SQL function is called, this callback function is executed by foreign
+     transaction resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction
+    identifier, and store its length in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal
+    less than <symbol>NAMEDATALEN</symbol> bytes long and should not be the same
+    as any other concurrent prepared transaction id. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Therefore, you cannot
+      use functions that access the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1875,4 +1996,119 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the workings of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name
+    and the OIDs of the server, user and user mapping. The <literal>flags</literal>
+    field contains flag bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback of a Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <literal>CommitForeignTransaction</literal> function
+     in the pre-commit phase; during rollback, it calls the
+     <literal>RollbackForeignTransaction</literal> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback of a Distributed Transaction</title>
+    <para>
+     In addition to simply committing and rolling back foreign transactions as described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit and roll back atomically across all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    Here ft1 and ft2 are foreign tables on different foreign servers, which may be
+    using different Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, the <literal>CommitForeignTransaction</literal> and
+     <literal>RollbackForeignTransaction</literal> functions should commit and
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed; it completes only after all of them have been
+     resolved.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve the transaction, the resolver process exits with an error message. The
+     foreign transaction launcher will launch the resolver process again after the
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 7e37042..c4300a1 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6211ff3..7d39e7f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21895,6 +21895,95 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction that is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before dropping
+    a database to which a foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 547fe4c..9c71ea6 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -368,6 +368,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_foreign_xact</structname><indexterm><primary>pg_stat_foreign_xact</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1236,6 +1244,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
+         <entry><literal>LogicalLauncherMain</literal></entry>
+         <entry>Waiting in main loop of logical launcher process.</entry>
+        </row>
+        <row>
          <entry><literal>LogicalApplyMain</literal></entry>
          <entry>Waiting in main loop of logical apply process.</entry>
         </row>
@@ -1459,6 +1479,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
+        <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
          <entry>Waiting during base backup when throttling activity.</entry>
@@ -2338,6 +2362,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-foreign-xact-view" xreflabel="pg_stat_foreign_xact">
+   <title><structname>pg_stat_foreign_xact</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_foreign_xact</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 3e115f1..5ae3807 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -163,6 +163,7 @@
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index e0915b6..89da268 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -84,6 +84,12 @@ Item
 </row>
 
 <row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
+<row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
 </row>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a..49480dd 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000..0207a66
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000..a6a46ad
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,130 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back a transaction on
+either all of the foreign servers or none of them. This ensures that the data
+is always left in a consistent state in terms of the federated database.
+
+
+Commit Sequence of Global Transactions
+--------------------------------------
+
+We employ the two-phase commit protocol to commit atomically among all foreign
+servers. The sequence of a distributed transaction commit consists
+of the following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+to the list FdwXactAtomicCommitParticipants, which is maintained by
+PostgreSQL's global transaction manager (GTM), as distributed transaction
+participants. The registered foreign transactions are tracked until the end of
+the transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We write a WAL record indicating that the foreign server is involved
+with the current transaction before doing PREPARE on all foreign transactions.
+Thus in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers including the local node. Otherwise, we can commit them at this step
+by calling CommitForeignTransaction(), and no further operation is needed.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If any of them fails, we switch to
+rollback; at that point some participants might be prepared whereas
+others are not. The former need to be resolved manually using
+pg_resolve_foreign_xact(), and the latter are ended
+in one phase by calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, we commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction, but
+this resolution step (commit or rollback) is done by the foreign transaction
+resolver process. The backend process inserts itself into the wait queue, and
+then wakes up the resolver process (or requests to launch a new one if necessary).
+The resolver process dequeues the waiter and fetches the distributed transaction
+information that the backend is waiting for. Once all foreign transactions are
+committed or rolled back, the resolver process wakes up the waiter.
+
+
+API Contract With Transaction Management Callback Functions
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+transaction management callback functions according to their status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for PREPARE, COMMIT and
+ROLLBACK of the transaction on the foreign server, respectively.
+FdwXactRslvState->flags could contain FDWXACT_FLAG_ONEPHASE, meaning the FDW can
+commit or roll back the foreign transaction in one phase. On failure while
+processing a foreign transaction, the FDW needs to raise an error. However, the FDW
+must accept the ERRCODE_UNDEFINED_OBJECT error while committing or rolling back a
+foreign transaction, because there is a race condition in which the coordinator
+could crash between the completion of the resolution and the writing of the WAL
+record removing the FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts from FDWXACT_STATUS_INITIAL
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and it changes to
+FDWXACT_STATUS_PREPARING, FDWXACT_STATUS_COMMITTING and FDWXACT_STATUS_ABORTING
+before the foreign transaction is prepared, committed and aborted by FDW
+callback functions, respectively (*1). The status then changes to
+FDWXACT_STATUS_RESOLVED once the foreign transaction is resolved, and then
+the corresponding FdwXact entry is removed with WAL logging. If processing
+of the foreign transaction (i.e. preparing, committing or aborting) fails, the
+status changes back to the previous status. Therefore the
+FDWXACT_STATUS_xxxING statuses appear only while the foreign transaction is being
+processed by an FDW callback function.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. Their initial
+status is FDWXACT_STATUS_PREPARED (*2). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                      INVALID                       |
+ +----------------------------------------------------+
+    |                      |                       |
+    |                      v                       |
+    |           +---------------------+            |
+    |           |       INITIAL       |            |
+    |           +---------------------+            |
+   (*2)                    |                      (*2)
+    |                      v                       |
+    |           +---------------------+            |
+    |           |    PREPARING(*1)    |            |
+    |           +---------------------+            |
+    |                      |                       |
+    v                      v                       v
+ +----------------------------------------------------+
+ |                      PREPARED                      |
+ +----------------------------------------------------+
+           |                               |
+           v                               v
+ +--------------------+          +--------------------+
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |
+ +--------------------+          +--------------------+
+           |                               |
+           v                               v
+ +----------------------------------------------------+
+ |                      RESOLVED                      |
+ +----------------------------------------------------+
+
+(*1) Statuses that appear only while being processed by the FDW
+(*2) Paths for recovered FdwXact entries
\ No newline at end of file
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100755
index 0000000..6a63663
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2833 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * During executor node initialization, the foreign server can be registered
+ * by calling either RegisterFdwXactByRelId() or RegisterFdwXactByServerId()
+ * so that it participates in a group for global commit. The foreign servers are
+ * registered if the FDW has both the CommitForeignTransaction API and the
+ * RollbackForeignTransaction API. Registered participant servers are identified
+ * by the OIDs of the foreign server and user.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every foreign server. After committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so
+ * that we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request. We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed. Before waiting for foreign transactions to be resolved, the
+ * backend enqueues itself with the timestamp at which it expects to be processed.
+ * Similarly, if resolution fails, it enqueues itself again with a new timestamp
+ * (its timestamp + foreign_xact_resolution_interval).
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in the in-doubt state (aka. in-doubt
+ * transactions). Foreign transactions in the in-doubt state are not resolved
+ * automatically, so they must be processed manually using the
+ * pg_resolve_foreign_xact() function.
+ *
+ * The two-phase commit protocol is required if the transaction modified two or
+ * more servers including the local server itself. Otherwise, all foreign
+ * transactions are committed or rolled back during pre-commit.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed by an FDW, the corresponding
+ * FdwXact entry is updated. In order to protect the entry from concurrent
+ * removal we need to hold a lock on the entry or a lock on the entire global
+ * array. However, we don't want to hold the lock while the FDW is processing the
+ * foreign transaction, which may take an unpredictable time. To avoid this, the
+ * in-memory data of a foreign transaction follows a locking model based on
+ * four linked concepts:
+ *
+ * * A foreign transaction's status variable is switched using the LWLock
+ *   FdwXactLock, which needs to be held in exclusive mode when updating the
+ *   status, while readers need to hold it in shared mode when looking at the
+ *   status.
+ * * A process that is going to update an FdwXact entry cannot process a foreign
+ *   transaction that is being resolved.
+ * * So setting the status to FDWXACT_STATUS_PREPARING,
+ *   FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING, which are the foreign
+ *   transaction's in-progress states, means owning the FdwXact entry, which
+ *   protects it from being updated or removed by concurrent writers.
+ * * Individual fields are protected by a mutex, where only the backend owning
+ *   the foreign transaction is authorized to update its own
+ *   fields.
+ *
+ * Therefore, before doing PREPARE, COMMIT PREPARED or ROLLBACK PREPARED, a
+ * process that is going to call transaction callback functions needs to change
+ * the status to the corresponding status above while holding FdwXactLock in
+ * exclusive mode, and call the callback function after releasing the lock.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxacts is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk. FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon. We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts. If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Atomic commit is enabled by configuration */
+#define IsForeignTwophaseCommitEnabled() \
+	(max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	(IsForeignTwophaseCommitEnabled() && \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED))
+
+/* Check whether the FdwXactParticipant is capable of two-phase commit */
+#define IsSeverCapableOfTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Check whether the FdwXact is being resolved */
+#define FdwXactIsBeingResolved(fx) \
+	(((((FdwXact)(fx))->status) == FDWXACT_STATUS_PREPARING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_COMMITTING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_ABORTING))
+
+/*
+ * Structure to bundle the foreign transaction participant. This struct
+ * is created at the beginning of execution for each foreign server and
+ * is used until the end of the transaction, where we cannot look at syscaches.
+ * Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry
+	 * is not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char			*fdwxact_id;
+
+	/* true if modified the data on the server */
+	bool			modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function	prepare_foreign_xact_fn;
+	CommitForeignTransaction_function	commit_foreign_xact_fn;
+	RollbackForeignTransaction_function	rollback_foreign_xact_fn;
+	GetPrepareId_function				get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit. This list
+ * has only foreign servers that provide transaction management callbacks,
+ * that is, CommitForeignTransaction and RollbackForeignTransaction.
+ */
+static List *FdwXactParticipants = NIL;
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file is the database oid, xid,
+ * foreign server oid and user oid, 8 hex digits each, separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is enough to ensure
+ * uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* GUC parameters */
+int	max_prepared_foreign_xacts = 0;
+int	max_foreign_xact_resolvers = 0;
+int foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Keep track of whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static void FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+												 bool for_commit);
+static void FdwXactResolveForeignTransaction(FdwXact fdwxact,
+											 FdwXactRslvState *state,
+											 FdwXactStatus fallback_status);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid,	void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static void FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock);
+static bool is_foreign_twophase_commit_required(void);
+static void register_fdwxact(Oid serverid, Oid userid, bool modified);
+static List *get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						  bool including_indoubts, bool include_in_progress,
+						  bool need_lock);
+static FdwXact get_all_fdwxacts(int *num_p);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXact get_fdwxact_to_resolve(Oid dbid, TransactionId xid);
+static FdwXactRslvState *create_fdwxact_state(void);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Remember accessed foreign transaction. Both RegisterFdwXactByRelId and
+ * RegisterFdwXactByServerId are called by executor during initialization.
+ */
+void
+RegisterFdwXactByRelId(Oid relid, bool modified)
+{
+	Relation		rel;
+	Oid				serverid;
+	Oid				userid;
+
+	rel = relation_open(relid, NoLock);
+	serverid = GetForeignServerIdByRelId(relid);
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	relation_close(rel, NoLock);
+
+	register_fdwxact(serverid, userid, modified);
+}
+
+void
+RegisterFdwXactByServerId(Oid serverid, bool modified)
+{
+	register_fdwxact(serverid, GetUserId(), modified);
+}
+
+/*
+ * Register given foreign transaction identified by given arguments as
+ * a participant of the transaction.
+ *
+ * The foreign transaction is identified by the given server id and user id.
+ * Registered foreign transactions are managed by the global transaction
+ * manager until the end of the transaction.
+ */
+static void
+register_fdwxact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant	*fdw_part;
+	ForeignServer 		*foreign_server;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	MemoryContext		old_ctx;
+	FdwRoutine			*routine;
+	ListCell	   		*lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	/*
+	 * Participant's information is also needed at the end of a transaction,
+	 * where system caches are not available. Save it in TopTransactionContext
+	 * so that it can live until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Don't register foreign server if it doesn't provide both commit and
+	 * rollback transaction management callbacks.
+	 */
+	if (!routine->CommitForeignTransaction ||
+		!routine->RollbackForeignTransaction)
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		pfree(routine);
+		/* Restore the memory context before the early return */
+		MemoryContextSwitchTo(old_ctx);
+		return;
+	}
+
+	/*
+	 * Remember that we touched a foreign server that is not capable of
+	 * two-phase commit.
+	 */
+	if (!routine->PrepareForeignTransaction)
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+
+	foreign_server = GetForeignServer(serverid);
+	fdw = GetForeignDataWrapper(foreign_server->fdwid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact = NULL;
+	fdw_part->modified = modified;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Restore the memory context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&(fdwxacts[cnt].mutex));
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Prepare all foreign transactions if foreign twophase commit is required.
+ * If foreign twophase commit is required, the behavior depends on the value
+ * of foreign_twophase_commit: when 'required', we strictly require all
+ * foreign servers' FDWs to support the two-phase commit protocol and ask them
+ * to prepare foreign transactions; when 'prefer', we ask only foreign servers
+ * that are capable of two-phase commit to prepare foreign transactions and ask
+ * the other servers to commit; and when 'disabled', we ask all foreign servers
+ * to commit foreign transactions in one phase. If we fail to commit any of
+ * them, we change to aborting.
+ *
+ * Note that non-modified foreign servers can always be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	bool		need_twophase_commit;
+	ListCell	*lc = NULL;
+	ListCell	*next = NULL;
+	ListCell	*prev = NULL;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * We require all modified servers to be capable of the two-phase
+	 * commit protocol.
+	 */
+	if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));
+
+	/*
+	 * Check if we need to use foreign twophase commit. It's always false
+	 * if foreign twophase commit is disabled.
+	 */
+	need_twophase_commit = is_foreign_twophase_commit_required();
+
+	/*
+	 * First, consider committing foreign transactions in one phase.
+	 */
+	for (lc = list_head(FdwXactParticipants); lc != NULL; lc = next)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		bool	commit = false;
+
+		next = lnext(lc);
+
+		/* Can commit in one phase if two-phase commit is not required */
+		if (!need_twophase_commit)
+			commit = true;
+
+		/* Non-modified foreign transaction always can be committed in one-phase */
+		if (!fdw_part->modified)
+			commit = true;
+
+		/*
+		 * In 'prefer' case, non-twophase-commit capable server can be
+		 * committed in one-phase.
+		 */
+		if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER &&
+			!IsSeverCapableOfTwophaseCommit(fdw_part))
+			commit = true;
+
+		if (commit)
+		{
+			/* Commit the foreign transaction in one-phase */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, true);
+
+			/* Delete it from the participant list */
+			FdwXactParticipants = list_delete_cell(FdwXactParticipants,
+												   lc, prev);
+			continue;
+		}
+
+		prev = lc;
+	}
+
+	/* All done if we committed all foreign transactions */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Secondly, if only one transaction remains in the participant list
+	 * and we didn't modify the local data, we can commit it without
+	 * preparation.
+	 */
+	if (list_length(FdwXactParticipants) == 1 &&
+		(MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) == 0)
+	{
+		/* Commit the foreign transaction in one-phase */
+		FdwXactOnePhaseEndForeignTransaction(linitial(FdwXactParticipants),
+											 true);
+
+		/* All foreign transactions are committed; reset the list */
+		list_free(FdwXactParticipants);
+		FdwXactParticipants = NIL;
+		return;
+	}
+
+	/*
+	 * Finally, prepare foreign transactions. Note that we keep
+	 * FdwXactParticipants until the end of transaction.
+	 */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * an FdwXact entry we call the GetPrepareId callback to get a transaction
+ * identifier from the FDW.
+ *
+ * We can still change to rollback here. If any error occurs, we roll back
+ * non-prepared foreign transactions and leave the others to the resolver.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	ListCell		*lcell;
+	TransactionId	xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	xid = GetTopTransactionId();
+
+	/* Loop over the foreign connections */
+	foreach(lcell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+		FdwXactRslvState 	*state;
+		FdwXact		fdwxact;
+
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the FDWXACT_STATUS_PREPARING
+		 * status. Registration persists this information to disk and WAL
+		 * (thereby relaying it to standbys). Thus, in case we lose connectivity
+		 * to the foreign server or crash ourselves, we will remember that we
+		 * might have prepared transaction on the foreign server and try to
+		 * resolve it when connectivity is restored or after crash recovery.
+		 *
+		 * If we prepare the transaction on the foreign server before persisting
+		 * the information to the disk and crash in-between these two steps,
+		 * we will forget that we prepared the transaction on the foreign server
+		 * and will not be able to resolve it after the crash. Hence persist
+		 * first then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		state = create_fdwxact_state();
+		state->server = fdw_part->server;
+		state->usermapping = fdw_part->usermapping;
+		state->fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+
+		/* Update the status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		Assert(fdwxact->status == FDWXACT_STATUS_INITIAL);
+		fdwxact->status = FDWXACT_STATUS_PREPARING;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the moment this
+		 * backend receives an acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 *
+		 * During abort processing, we might try to resolve a never-prepared
+		 * transaction, and get an error. This is fine as long as the FDW
+		 * provides us unique prepared transaction identifiers.
+		 */
+		PG_TRY();
+		{
+			fdw_part->prepare_foreign_xact_fn(state);
+		}
+		PG_CATCH();
+		{
+			/* failed, back to the initial state */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fdwxact->status = FDWXACT_STATUS_INITIAL;
+			LWLockRelease(FdwXactLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		/* succeeded, update status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * One-phase commit or rollback the given foreign transaction participant.
+ */
+static void
+FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+									 bool for_commit)
+{
+	FdwXactRslvState *state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state = create_fdwxact_state();
+	state->server = fdw_part->server;
+	state->usermapping = fdw_part->usermapping;
+	state->flags = FDWXACT_FLAG_ONEPHASE;
+
+	/*
+	 * Commit or roll back the foreign transaction in one phase. Since we
+	 * didn't insert an FdwXact entry for this transaction, we don't need to
+	 * track failures here. On failure we change to rollback.
+	 */
+	if (for_commit)
+		fdw_part->commit_foreign_xact_fn(state);
+	else
+		fdw_part->rollback_foreign_xact_fn(state);
+}
+
+/*
+ * This function is used to create a new foreign transaction entry before an
+ * FDW prepares and commits/rolls back. The function adds the entry to WAL, and
+ * it is persisted to disk under the pg_fdwxact directory at checkpoint time.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact				fdwxact;
+	FdwXactOnDiskData	*fdwxact_file_data;
+	MemoryContext		old_context;
+	int					data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							fdw_part->usermapping->userid,
+							fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->status = FDWXACT_STATUS_INITIAL;
+	fdwxact->held_by = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				Oid umid, char *fdwxact_id)
+{
+	int i;
+	FdwXact fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
+								   xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there
+	 * are none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->dbid = dbid;
+	fdwxact->local_xid = xid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	fdwxact->indoubt = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (FdwXactIsBeingResolved(fdwxact))
+		elog(ERROR, "cannot remove fdwxact entry that is being resolved");
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->indoubt = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the commit critical section: a
+		 * checkpoint starting after this will certainly see the gxact as a
+		 * candidate for fsyncing.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true and set ForeignTwophaseCommitIsRequired to true if the current
+ * transaction modified data on two or more servers, counting the servers in
+ * FdwXactParticipants and the local server itself.
+ */
+static bool
+is_foreign_twophase_commit_required(void)
+{
+	ListCell*	lc;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->modified)
+			nserverswritten++;
+	}
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	/*
+	 * Atomic commit is required if we modified data on two or more
+	 * participants.
+	 */
+	if (nserverswritten <= 1)
+		return false;
+
+	ForeignTwophaseCommitIsRequired = true;
+	return true;
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int	i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * Mark my foreign transaction participants as in-doubt and clear
+ * the FdwXactParticipants list.
+ *
+ * If we leave any foreign transactions, update the oldest xmin of unresolved
+ * transactions so that the local transaction id of an in-doubt transaction is
+ * not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell *cell;
+	int		n_lefts = 0;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if didn't register FdwXact entry yet */
+		if (!fdw_part->fdwxact)
+			continue;
+
+		/*
+		 * There is a race condition: an FdwXact entry in FdwXactParticipants
+		 * could be reused by another backend before we forget it, if the
+		 * resolver process removes the entry and another backend reuses the
+		 * slot first. So we need to check that the entries are still
+		 * associated with the transaction.
+		 */
+		SpinLockAcquire(&fdwxact->mutex);
+		if (fdwxact->held_by == MyBackendId)
+		{
+			fdwxact->held_by = InvalidBackendId;
+			fdwxact->indoubt = true;
+			n_lefts++;
+		}
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction of
+	 * unresolved distributed transactions and hand them over to the foreign
+	 * transaction resolver.
+	 */
+	if (n_lefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions in in-doubt status", n_lefts);
+		FdwXactComputeRequiredXmin();
+	}
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * Wait for the foreign transaction to be resolved.
+ *
+ * Initially backends start in state FDWXACT_NOT_WAITING and then change
+ * that state to FDWXACT_WAITING before adding ourselves to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * This backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char		*new_status = NULL;
+	const char	*old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsForeignTwophaseCommitRequested())
+		return;
+
+	/*
+	 * Also, exit if the transaction itself has no foreign transaction
+	 * participants.
+	 */
+	if (FdwXactParticipants == NIL && wait_xid == MyPgXact->xid)
+		return;
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if none is running yet, otherwise wake it up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 34 + 1);	/* 24-char label + 10-digit xid + NUL */
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0';	/* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDWXACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The latter
+		 * would lead the client to believe that the distributed transaction
+		 * aborted, which is not true: it's already committed locally. The
+		 * former is no good either: the client has requested committing a
+		 * distributed transaction, and is entitled to assume that an
+		 * acknowledged commit is also committed on all foreign servers, which
+		 * might not be true. So in this case we issue a WARNING (which some
+		 * clients may be able to interpret) and shut off further output. We
+		 * do NOT reset ProcDiePending, so that the process will die after the
+		 * commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions may be left unresolved,
+		 * but the foreign transaction resolver can pick them up and try to
+		 * resolve them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Return true if there is at least one backend in the wait queue that is
+ * connected to the given database. The caller must hold FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * AtEOXact_FdwXacts: on abort, roll back unprepared participants; then forget all.
+ */
+void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		foreach (lcell, FdwXactParticipants)
+		{
+			FdwXactParticipant	*fdw_part = lfirst(lcell);
+
+			/*
+			 * If the foreign transaction has an FdwXact entry, we might have
+			 * prepared it. Skip foreign transactions that are already
+			 * prepared, because preparing closed their transactions. For an
+			 * entry with status == FDWXACT_STATUS_PREPARING, however, we
+			 * cannot tell whether it has actually been prepared, so we call
+			 * the rollback API to close its transaction for safety. Any
+			 * prepared foreign transaction left behind will be resolved by
+			 * the foreign transaction resolver.
+			 */
+			if (fdw_part->fdwxact)
+			{
+				bool is_prepared;
+
+				LWLockAcquire(FdwXactLock, LW_SHARED);
+				is_prepared = fdw_part->fdwxact &&
+					fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED;
+				LWLockRelease(FdwXactLock);
+
+				if (is_prepared)
+					continue;
+			}
+
+			/* One-phase rollback foreign transaction */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, false);
+		}
+	}
+
+	/*
+	 * In commit cases, we have already prepared foreign transactions during
+	 * pre-commit phase. And these prepared transactions will be resolved by
+	 * the resolver process.
+	 */
+
+	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Prepare foreign transactions.
+ *
+ * Note that it's possible that the transaction aborts after we have prepared
+ * some of the participants. In that case we switch to rollback and roll back
+ * all foreign transactions.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	/*
+	 * We cannot prepare if any foreign server of participants isn't capable
+	 * of two-phase commit.
+	 */
+	if (is_foreign_twophase_commit_required() &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in transaction can not prepare the transaction")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Return one backend that connects to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p)
+{
+	PGPROC *proc;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+			break;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+	{
+		*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+		*waitXid_p = proc->fdwXactWaitXid;
+	}
+	else
+	{
+		*nextResolutionTs_p = -1;
+		*waitXid_p = InvalidTransactionId;
+	}
+
+	LWLockRelease(FdwXactResolutionLock);
+
+	return proc;
+}
+
+/*
+ * Get one FdwXact entry to resolve. This function is intended to be used by
+ * a resolver process to fetch the FdwXact entries to resolve, so the search
+ * excludes in-doubt transactions and in-progress transactions.
+ */
+static FdwXact
+get_fdwxact_to_resolve(Oid dbid, TransactionId xid)
+{
+	List *fdwxacts = NIL;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Exclude both in-doubt transactions and in-progress transactions */
+	fdwxacts = get_fdwxacts(dbid, xid, InvalidOid, InvalidOid,
+							false, false, false);
+
+	return fdwxacts == NIL ? NULL : (FdwXact) linitial(fdwxacts);
+}
+
+/*
+ * Resolve one distributed transaction on the given database. The target
+ * distributed transaction is fetched from the waiting queue and its transaction
+ * participants are fetched from the global array.
+ *
+ * Release the waiter after we have resolved all of the foreign transaction
+ * participants. On failure, we re-enqueue the waiting backend after advancing
+ * its next resolution time, and re-throw the error.
+ */
+void
+FdwXactResolveTransactionAndReleaseWaiter(Oid dbid, TransactionId xid,
+										  PGPROC *waiter)
+{
+	FdwXact	fdwxact;
+
+	Assert(TransactionIdIsValid(xid));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	while ((fdwxact = get_fdwxact_to_resolve(MyDatabaseId, xid)) != NULL)
+	{
+		FdwXactRslvState *state;
+		ForeignServer *server;
+		UserMapping	*usermapping;
+
+		CHECK_FOR_INTERRUPTS();
+
+		server = GetForeignServer(fdwxact->serverid);
+		usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+
+		state = create_fdwxact_state();
+		SpinLockAcquire(&fdwxact->mutex);
+		state->server = server;
+		state->usermapping = usermapping;
+		state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+		SpinLockRelease(&fdwxact->mutex);
+
+		FdwXactDetermineTransactionFate(fdwxact, false);
+
+		/* Do not hold during foreign transaction resolution */
+		LWLockRelease(FdwXactLock);
+
+		PG_TRY();
+		{
+			/*
+			 * Resolve the foreign transaction. When committing or aborting
+			 * prepared foreign transactions the previous status is always
+			 * FDWXACT_STATUS_PREPARED.
+			 */
+			FdwXactResolveForeignTransaction(fdwxact, state,
+											 FDWXACT_STATUS_PREPARED);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. Re-insert the waiter to the tail of retry
+			 * queue if the waiter is still waiting.
+			 */
+			LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+			if (waiter->fdwXactState == FDWXACT_WAITING)
+			{
+				SHMQueueDelete(&(waiter->fdwXactLinks));
+				pg_write_barrier();
+				waiter->fdwXactNextResolutionTs =
+					TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+												foreign_xact_resolution_retry_interval);
+				FdwXactQueueInsert(waiter);
+			}
+			LWLockRelease(FdwXactResolutionLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		elog(DEBUG2, "resolved one foreign transaction xid %u, serverid %u, userid %u",
+			 fdwxact->local_xid, fdwxact->serverid, fdwxact->userid);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+
+	/*
+	 * Remove waiter from shmem queue, if not detached yet. The waiter
+	 * could already be detached if the user cancelled the wait before
+	 * resolution.
+	 */
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId	wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/* Wake up the waiter only when we have set state and removed from queue */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had been already detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Determine whether the given foreign transaction should be committed or
+ * rolled back according to the result of the local transaction. This function
+ * changes fdwxact->status, so the caller must either hold FdwXactLock in
+ * exclusive mode or pass need_lock as true.
+ */
+static void
+FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock)
+{
+	bool			is_commit = false;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/*
+	 * The transaction being resolved must either have been cancelled and
+	 * marked as in-doubt, or have been prepared.
+	 */
+	Assert(fdwxact->indoubt ||
+		   fdwxact->status == FDWXACT_STATUS_PREPARED);
+
+	/*
+	 * If the local transaction is already committed, commit prepared
+	 * foreign transaction.
+	 */
+	if (TransactionIdDidCommit(fdwxact->local_xid))
+	{
+		fdwxact->status = FDWXACT_STATUS_COMMITTING;
+		is_commit = true;
+	}
+
+	/*
+	 * If the local transaction is already aborted, abort prepared
+	 * foreign transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+	{
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is not in progress but the foreign
+	 * transaction is not prepared on the foreign server. This
+	 * can happen when the transaction failed after registering this
+	 * entry but before actually preparing on the foreign server.
+	 * So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+	{
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted. This should not happen except for one
+	 * case where the local transaction is prepared and this foreign transaction
+	 * is being resolved manually using pg_resolve_foreign_xact(). Raise an
+	 * error anyway since we cannot determine the fate of this foreign
+	 * transaction according to the local transaction whose fate is also not
+	 * determined.
+	 */
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve the foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid),
+				 errhint("The local transaction with xid %u might be prepared",
+						 fdwxact->local_xid)));
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * callback function. The 'state' is passed to the callback function. The fate of
+ * the foreign transaction must already be determined. If the foreign transaction
+ * is resolved successfully, remove the FdwXact entry from shared memory and also
+ * remove the corresponding on-disk file. On failure, the status of the FdwXact
+ * entry is set back to 'fallback_status' before erroring out.
+ */
+static void
+FdwXactResolveForeignTransaction(FdwXact fdwxact, FdwXactRslvState *state,
+								 FdwXactStatus fallback_status)
+{
+	ForeignServer		*server;
+	ForeignDataWrapper	*fdw;
+	FdwRoutine			*fdw_routine;
+	bool				is_commit;
+
+	Assert(state != NULL);
+	Assert(state->server && state->usermapping && state->fdwxact_id);
+	Assert(fdwxact != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+		elog(ERROR, "cannot resolve foreign transaction whose fate is not determined");
+
+	is_commit = fdwxact->status == FDWXACT_STATUS_COMMITTING;
+	LWLockRelease(FdwXactLock);
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+
+	PG_TRY();
+	{
+		if (is_commit)
+			fdw_routine->CommitForeignTransaction(state);
+		else
+			fdw_routine->RollbackForeignTransaction(state);
+	}
+	PG_CATCH();
+	{
+		/* Back to the fallback status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = fallback_status;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	/* Resolution was a success, remove the entry */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	elog(DEBUG1, "successfully %s the foreign transaction with xid %u db %u server %u user %u",
+		 is_commit ? "committed" : "rolled back",
+		 fdwxact->local_xid, fdwxact->dbid, fdwxact->serverid,
+		 fdwxact->userid);
+
+	fdwxact->status = FDWXACT_STATUS_RESOLVED;
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdwxact(fdwxact);
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Return palloc'd and initialized FdwXactRslvState.
+ */
+static FdwXactRslvState *
+create_fdwxact_state(void)
+{
+	FdwXactRslvState *state;
+
+	state = palloc(sizeof(FdwXactRslvState));
+	state->server = NULL;
+	state->usermapping = NULL;
+	state->fdwxact_id = NULL;
+	state->flags = 0;
+
+	return state;
+}
+
+/*
+ * Return the FdwXact entry that matches the given arguments, or NULL if
+ * there is none. All arguments must be valid values so that the search
+ * matches exactly one (or no) entry. Note that this function is intended to
+ * be used for modifying the returned FdwXact entry, so the caller must hold
+ * FdwXactLock in exclusive mode, and in-progress FdwXact entries are not
+ * included.
+ */
+static FdwXact
+get_one_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	*fdwxact_list;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	/* Include in-doubt transactions but don't include in-progress ones */
+	fdwxact_list = get_fdwxacts(dbid, xid, serverid, userid,
+								true, false, false);
+
+	/* Must be at most one entry since we search by the unique key */
+	Assert(list_length(fdwxact_list) <= 1);
+
+	/* Could not find entry */
+	if (fdwxact_list == NIL)
+		return NULL;
+
+	return (FdwXact) linitial(fdwxact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+fdwxact_exists(Oid dbid, Oid serverid, Oid userid)
+{
+	List	*fdwxact_list;
+
+	/* Find entries from all FdwXact entries */
+	fdwxact_list = get_fdwxacts(dbid, InvalidTransactionId, serverid,
+								userid, true, true, true);
+
+	return fdwxact_list != NIL;
+}
+
+/*
+ * Returns an array of all foreign prepared transactions for the user-level
+ * function pg_foreign_xacts, and the number of entries to num_p.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if it doesn't
+ * want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdwxacts(int *num_p)
+{
+	List		*all_fdwxacts;
+	ListCell	*lc;
+	FdwXact		fdwxacts;
+	int			num_fdwxacts = 0;
+
+	Assert(num_p != NULL);
+
+	/* Get all entries */
+	all_fdwxacts = get_fdwxacts(InvalidOid, InvalidTransactionId,
+								InvalidOid, InvalidOid, true,
+								true, true);
+
+	if (all_fdwxacts == NIL)
+	{
+		*num_p = 0;
+		return NULL;
+	}
+
+	fdwxacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdwxacts));
+	*num_p = list_length(all_fdwxacts);
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdwxacts)
+	{
+		FdwXact fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdwxacts + num_fdwxacts, fx,
+			   sizeof(FdwXactData));
+		num_fdwxacts++;
+	}
+
+	list_free(all_fdwxacts);
+
+	return fdwxacts;
+}
+
+/*
+ * Return a list of FdwXact entries matching the given arguments, or NIL if
+ * none match. Only arguments with valid values for their respective datatypes
+ * are used as search conditions. 'include_indoubt' and 'include_in_progress'
+ * control whether the result includes in-doubt transactions and in-progress
+ * transactions, respectively. If 'need_lock' is true, FdwXactLock is taken here.
+ */
+static List*
+get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			 bool include_indoubt, bool include_in_progress, bool need_lock)
+{
+	int i;
+	List	*fdwxact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact	fdwxact = FdwXactCtl->fdwxacts[i];
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* include in-doubt transaction? */
+		if (!include_indoubt && fdwxact->indoubt)
+			continue;
+
+		/* include in-progress transaction? */
+		if (!include_in_progress && FdwXactIsBeingResolved(fdwxact))
+			continue;
+
+		/* Append it if matched */
+		fdwxact_list = lappend(fdwxact_list, fdwxact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdwxact_list;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record
+		 * in FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides the getPrepareId callback we return the
+ * identifier returned from it. Otherwise we generate a unique identifier in
+ * the form "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transaction identifier unique, we should ideally use
+ * something like a UUID, which gives unique ids with high probability, but
+ * that may be expensive here and the UUID extension which provides the
+ * function to generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	*id;
+	int		id_len = 0;
+
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		/*
+		 * FDW doesn't provide the callback function, generate a unique
+		 * identifier.
+		 */
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN - 1] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;						/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, if somebody
+	 * spots that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+			  (errmsg_plural("%u foreign transaction state file was written "
+							 "for long-running prepared transactions",
+							 "%u foreign transaction state files were written "
+							 "for long-running prepared transactions",
+							 serialized_fdwxacts,
+							 serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, &read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+		   errdetail("Failed while allocating an XLog reading processor.")));
+
+	record = XLogReadRecord(xlogreader, lsn, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read foreign transaction state from xlog at %X/%X",
+			   (uint32) (lsn >> 32),
+			   (uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not recreate foreign transaction state file \"%s\": %m",
+			   path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, serverid and userid, read the foreign transaction
+ * state either from disk or directly from WAL, using the record pointer
+ * "insert_start_lsn" stored in shared memory.
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId	origNextXid =
+		XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	char	*buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid size and CRC and matches the given identifiers),
+ * return the palloc'd contents of the file; otherwise raise an error.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			   errmsg("could not open FDW transaction state file \"%s\": %m",
+					  path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the expected identifiers */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextFullXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextFullXid);
+	TransactionId result = origNextXid;
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+		char *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char		*buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. We do not
+	 * know the final status of the transaction right now, so the entry
+	 * is marked as prepared and in-doubt below. The resolver will settle
+	 * it later based on the status of the local transaction which
+	 * prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							  fdwxact_data->serverid, fdwxact_data->userid,
+							  fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED and as in-doubt, since we do not know
+	 * the xact status right now. Resolver will set it later based on
+	 * the status of local transaction that prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;	/* added in redo */
+	fdwxact->indoubt = true;
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove
+ * FdwXact file if a foreign transaction was saved via an earlier checkpoint.
+ * We might not find the FdwXact entry if crash recovery starts from a
+ * point after the entry was added but before it was removed.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact	fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdwxact(dbid, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+		return;
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and mark them as valid.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+		char	*buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+
+		/*
+		 * If the foreign transaction is part of a prepared local
+		 * transaction, it is not in-doubt. A future COMMIT/ROLLBACK
+		 * PREPARED will determine the fate of this foreign transaction.
+		 */
+		if (TwoPhaseExists(fdwxact->local_xid))
+		{
+			ereport(DEBUG2,
+					(errmsg("clearing in-doubt flag of foreign transaction %u, server %u, user %u because the corresponding local prepared transaction was found",
+							fdwxact->local_xid, fdwxact->serverid,
+							fdwxact->userid)));
+			fdwxact->indoubt = false;
+		}
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(int *newval, void **extra, GucSource source)
+{
+	ForeignTwophaseCommitLevel newForeignTwophaseCommitLevel = *newval;
+
+	/* Parameter check */
+	if (newForeignTwophaseCommitLevel > FOREIGN_TWOPHASE_COMMIT_DISABLED &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_transactions\" or \"max_foreign_transaction_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}	WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	7
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdwxacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdwxacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(PG_PREPARED_FDWXACTS_COLS);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "indoubt",
+						   BOOLOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdwxacts = get_all_fdwxacts(&num_fdwxacts);
+		status->num_xacts = num_fdwxacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdwxact = &status->fdwxacts[status->cur_xact++];
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdwxact->dbid);
+		values[1] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[2] = ObjectIdGetDatum(fdwxact->serverid);
+		values[3] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (fdwxact->status)
+		{
+			case FDWXACT_STATUS_INITIAL:
+				xact_status = "initial";
+				break;
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			case FDWXACT_STATUS_RESOLVED:
+				xact_status = "resolved";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		values[5] = BoolGetDatum(fdwxact->indoubt);
+		values[6] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+	FdwXact			fdwxact;
+	FdwXactRslvState	*state;
+	FdwXactStatus		prev_status;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	server = GetForeignServer(serverid);
+	usermapping = GetUserMapping(userid, serverid);
+	state = create_fdwxact_state();
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+	{
+		LWLockRelease(FdwXactLock);
+		PG_RETURN_BOOL(false);
+	}
+
+	state->server = server;
+	state->usermapping = usermapping;
+	state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+
+	SpinLockAcquire(&fdwxact->mutex);
+	prev_status = fdwxact->status;
+	SpinLockRelease(&fdwxact->mutex);
+
+	FdwXactDetermineTransactionFate(fdwxact, false);
+
+	LWLockRelease(FdwXactLock);
+
+	FdwXactResolveForeignTransaction(fdwxact, state, prev_status);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. The function gives a way to forget about such a prepared
+ * transaction when, for example, the foreign server where it is prepared is
+ * no longer available or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
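+	if (fdwxact != NULL)
+		remove_fdwxact(fdwxact);
+
+	/* Release the lock whether or not a matching entry was found */
+	LWLockRelease(FdwXactLock);
+
+	/* fdwxact is only compared against NULL here, never dereferenced */
+	PG_RETURN_BOOL(fdwxact != NULL);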
+}
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000..45fb530
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,644 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to retry resolution.
+ */
+void
+FdwXactLauncherRequestToLaunchForRetry(void)
+{
+	if (FdwXactRslvCtl->launcher_pid > 0)
+		SetLatch(FdwXactRslvCtl->launcher_latch);
+}
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid > 0)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int	slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			resolver->last_resolved_time = 0;
+			resolver->latch = NULL;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz	last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == 0 || FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz	now;
+		long	wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int		rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit launch retries to once per foreign_xact_resolution_retry_interval,
+		 * but always launch when a backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested
+			 * but not running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in previous cycle was interrupted in less than
+			 * foreign_xact_resolution_retry_interval since last resolver
+			 * started, this usually means crash of the resolver, so we
+			 * should retry in foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool	found = false;
+	int		i;
+
+	/*
+	 * Looking for a resolver process that is running and working on the
+	 * same database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and has not attached to its slot yet. Since it will soon find the
+		 * FdwXact entry we inserted, we don't need to do anything in that case.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int unused_slot = -1;
+	int i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (unused_slot < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to wait
+	 * until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on databases that have
+ * at least one FdwXact entry but no resolver running on them.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	*resolver_dbs;	/* DBs that resolvers are running on */
+	HTAB	*fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL	ctl;
+	HASH_SEQ_STATUS status;
+	Oid		*entry;
+	bool	launched = false;
+	int		i;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one non-in-doubt FdwXact entry */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->indoubt)
+			continue;
+
+		hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no FdwXact entry, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+		return false;
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolvers are running and launch new one on them */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	return launched;
+}
+
+/*
+ * FdwXactLauncherRegister
+ *		Register a background worker running the foreign transaction
+ *      launcher.
+ */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int i;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+						WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Returns activity of all foreign transaction resolvers.
+ */
+Datum
+pg_stat_get_foreign_xact(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver	*resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t	pid;
+		Oid		dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000..9298877
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,344 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. The foreign
+ * transaction launcher starts one resolver process for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database for which a backend process is waiting for resolution.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if no foreign transactions on the database are waiting to be
+ * resolved.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int foreign_xact_resolution_retry_interval;
+int foreign_xact_resolver_timeout = 60 * 1000;
+bool foreign_xact_resolve_indoubt_xacts;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and clean up the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	MyFdwXactResolver->last_resolved_time = 0;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sanish value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		PGPROC			*waiter = NULL;
+		TransactionId	waitXid = InvalidTransactionId;
+		TimestampTz		resolutionTs = -1;
+		int			rc;
+		TimestampTz	now;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiters until either the queue becomes empty or we reach a
+		 * waiter whose resolution time is still in the future.
+		 */
+		while ((waiter = FdwXactGetWaiter(&resolutionTs, &waitXid)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+			Assert(TransactionIdIsValid(waitXid));
+
+			if (resolutionTs > now)
+				break;
+
+			elog(DEBUG2, "resolver got one waiter with xid %u", waitXid);
+
+			/* Resolve the waiting distributed transaction */
+			StartTransactionCommand();
+			FdwXactResolveTransactionAndReleaseWaiter(MyDatabaseId, waitXid,
+													  waiter);
+			CommitTransactionCommand();
+
+			/* Update my stats */
+			SpinLockAcquire(&(MyFdwXactResolver->mutex));
+			MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+			SpinLockRelease(&(MyFdwXactResolver->mutex));
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved for a backend within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz last_resolved_time;
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	last_resolved_time = MyFdwXactResolver->last_resolved_time;
+	timeout = TimestampTzPlusMilliseconds(last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until we have detached from the
+		 * slot. This prevents a race condition in which a waiter enqueues
+		 * itself after we checked FdwXactWaiterExists.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but does not exit because the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long	sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long	sec_to_timeout;
+		int		microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long	sec_to_timeout;
+		int		microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index 5514db1..742e825 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -8,9 +8,9 @@ subdir = src/backend/access/rmgrdesc
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o genericdesc.o \
-	   gindesc.o gistdesc.o hashdesc.o heapdesc.o logicalmsgdesc.o \
-	   mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o seqdesc.o \
-	   smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
+OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o fdwxactdesc.o \
+	genericdesc.o gindesc.o gistdesc.o hashdesc.o heapdesc.o \
+	logicalmsgdesc.o mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o \
+	seqdesc.o smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000..fe0cef9
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 33060f3..1d4e1c8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 16fbe47..f15c83a 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -12,9 +12,9 @@ subdir = src/backend/access/transam
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = clog.o commit_ts.o generic_xlog.o multixact.o parallel.o rmgr.o slru.o \
-	subtrans.o timeline.o transam.o twophase.o twophase_rmgr.o varsup.o \
-	xact.o xlog.o xlogarchive.o xlogfuncs.o \
+OBJS = clog.o commit_ts.o generic_xlog.o multixact.o \
+	parallel.o rmgr.o slru.o subtrans.o timeline.o transam.o twophase.o \
+	twophase_rmgr.o varsup.o xact.o xlog.o xlogarchive.o xlogfuncs.o \
 	xloginsert.o xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 9368b56..8b360b1 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -9,6 +9,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
 #include "access/generic_xlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index f9a4960..c625b45 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -852,6 +853,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 }
 
 /*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
+/*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
  *
@@ -2332,6 +2362,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2391,6 +2427,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8522a2f..0b35cdb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1217,6 +1218,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1225,6 +1227,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1263,12 +1266,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1426,6 +1430,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transactions to be resolved, if required.
+	 * We only want to wait if we prepared foreign transactions in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2085,6 +2097,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2244,6 +2259,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2331,6 +2347,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2525,6 +2543,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2736,6 +2755,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c00b63c..61d4f3d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
 #include "access/subtrans.h"
@@ -5250,6 +5251,7 @@ BootStrapXLOG(void)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6168,6 +6170,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6690,14 +6695,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -6889,7 +6895,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7385,6 +7394,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7703,6 +7713,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -8978,6 +8991,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9411,8 +9425,10 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9430,6 +9446,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9446,6 +9463,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9651,6 +9669,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -9850,6 +9869,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 161bad6..b285fb0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -291,6 +291,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
@@ -774,6 +777,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_foreign_xact AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_foreign_xact() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c39218f..d77d35a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2855,9 +2855,16 @@ CopyFrom(CopyState cstate)
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc),
+							   true);
+
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
 
+	}
+
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index 413ce3f..5588681 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1105,6 +1107,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdwxact_exists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1422,6 +1436,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
 	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdwxact_exists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
+	/*
 	 * Do the deletion
 	 */
 	object.classId = UserMappingRelationId;
@@ -1574,6 +1597,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
 				 errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA",
 						fdw->fdwname)));
 
+	/*
+	 * Remember that the transaction accesses a foreign server. Normally we
+	 * don't modify data on foreign servers during ImportForeignSchema, so
+	 * register the server as not modified.
+	 */
+	RegisterFdwXactByServerId(server->serverid, false);
+
 	/* Call FDW to get a list of commands */
 	cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 70709e5..b8b76d9 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,7 +13,9 @@
  */
 #include "postgres.h"
 
+
+#include "access/fdwxact.h"
 #include "access/table.h"
 #include "access/tableam.h"
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
@@ -944,7 +946,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		Relation		child = partRelInfo->ri_RelationDesc;
+
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(child), true);
+
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1da..eb7450c 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,10 +226,33 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		RangeTblEntry	*rte;
+
+		rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex,
+							estate);
+
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(rte->relid, true);
+
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
+	{
+		RangeTblEntry	*rte;
+		int rtindex = (scanrelid > 0) ?
+			scanrelid :
+			bms_next_member(node->fs_relids, -1);
+
+		rte = exec_rt_fetch(rtindex, estate);
+
+		/* Remember that the transaction accesses a foreign server */
+		RegisterFdwXactByRelId(rte->relid, false);
+
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
+	}
+
 	return scanstate;
 }
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index e0c4e0a..e9773a3 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -38,6 +38,7 @@
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -47,6 +48,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "rewrite/rewriteHandler.h"
@@ -544,6 +546,10 @@ ExecInsert(ModifyTableState *mtstate,
 									 NULL,
 									 specToken);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
 												   &specConflict,
@@ -772,6 +778,10 @@ ldelete:;
 							  &tmfd,
 							  changingPart);
 
+		/* Make note that we've written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case TM_SelfModified:
@@ -1317,6 +1327,10 @@ lreplace:;
 							  true /* wait for commit */ ,
 							  &tmfd, &lockmode, &update_indexes);
 
+		/* Make note that we've written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case TM_SelfModified:
@@ -2375,6 +2389,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+			/* Remember the transaction modifies data on a foreign server */
+			RegisterFdwXactByRelId(relid, true);
 
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 															 resultRelInfo,
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index c917ec4..2780ed5 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,20 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction &&
+		 !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction &&
+		 routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		(!routine->CommitForeignTransaction ||
+		 !routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f5db5a8..10393d3 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -15,6 +15,8 @@
 #include <unistd.h>
 
 #include "libpq/pqsignal.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -129,6 +131,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index cdf87ba..66cc834 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3574,6 +3574,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3777,6 +3783,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDWXACT:
+			event_name = "FdwXact";
+			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -3992,6 +4004,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 60d29a2..4d370f6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -900,6 +902,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -975,12 +981,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and the foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index eec3a22..dc02e42 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -154,6 +154,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index d7d7335..1491bc6 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(int port)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(int port)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 010cc06..67ffcdb 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -91,6 +91,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -246,6 +248,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1313,6 +1316,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1378,6 +1382,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1428,6 +1433,15 @@ GetOldestXmin(Relation rel, int flags)
 		result = replication_slot_xmin;
 
 	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDWXACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
+	/*
 	 * After locks have been released and defer_cleanup_age has been applied,
 	 * check whether we need to back up further to make logical decoding
 	 * possible. We need to do so if we're computing the global limit (rel =
@@ -3016,6 +3030,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits to future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions that are still
+ * needed for resolving distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843..0b8a487 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,6 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+FdwXactLock					45
+FdwXactResolverLock			46
+FdwXactResolutionLock			47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 0da5b19..d57daf1 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -422,6 +423,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* initialize fields for fdw xact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -823,6 +828,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1..8b8330b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3000,6 +3002,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f7f726b..ace89d0 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -397,6 +398,25 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 };
 
 /*
+ * Although only "required", "prefer", and "disabled" are documented, we
+ * accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
+/*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
  */
@@ -720,6 +740,12 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
+	/* FDWXACT_RESOLVER */
+	gettext_noop("Foreign Transaction Management / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2352,6 +2378,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FDWXACT_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FDWXACT_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4374,6 +4446,16 @@ static struct config_enum ConfigureNamesEnum[] =
 	},
 
 	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
+	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
 			NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 77bb7c2..ae212c7 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -125,6 +125,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -341,6 +343,20 @@
 
 
 #------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#foreign_twophase_commit = disabled	# disabled, prefer, or required
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
+#------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
 
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 33ac627..328b857 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 09b59c8..99d4d39 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -210,6 +210,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index a674f52..3f666cc 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -310,6 +310,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 82a8ec9..4a3e524 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -717,6 +717,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -933,6 +934,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca..b616cea 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000..147d41c
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,165 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* fdwXactState */
+#define	FDWXACT_NOT_WAITING		0
+#define	FDWXACT_WAITING			1
+#define	FDWXACT_WAIT_COMPLETE	2
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											   without preparation */
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_PREFER,		/* use twophase commit where available */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										   twophase commit */
+} ForeignTwophaseCommitLevel;
+
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID,
+	FDWXACT_STATUS_INITIAL,
+	FDWXACT_STATUS_PREPARING,		/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,		/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,		/* foreign prepared transaction is to
+									 * be committed */
+	FDWXACT_STATUS_ABORTING,		/* foreign prepared transaction is to be
+									 * aborted */
+	FDWXACT_STATUS_RESOLVED
+} FdwXactStatus;
+
+typedef struct FdwXactData *FdwXact;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData
+{
+	FdwXact			fdwxact_free_next;	/* Next free FdwXact entry */
+
+	Oid				dbid;			/* database oid where to find foreign server
+									 * and user mapping */
+	TransactionId	local_xid;		/* XID of local transaction */
+	Oid				serverid;		/* foreign server where transaction takes
+									 * place */
+	Oid				userid;			/* user who initiated the foreign
+									 * transaction */
+	Oid				umid;
+	bool			indoubt;		/* Is an in-doubt transaction? */
+	slock_t			mutex;			/* Protect the above fields */
+
+	/* The status of the foreign transaction, protected by FdwXactLock */
+	FdwXactStatus 	status;
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;		/* start LSN of the WAL record that inserts this entry */
+	XLogRecPtr	insert_end_lsn;		/* end LSN of the WAL record that inserts this entry */
+
+	bool		valid;			/* has the entry been completed and written to file? */
+	BackendId	held_by;		/* backend that is holding this entry */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN];		/* prepared transaction identifier */
+} FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing FdwXact entry needs to hold FdwXactLock in exclusive mode,
+ * and iterating fdwXacts needs that in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];		/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	/* Foreign transaction information */
+	char	*fdwxact_id;
+
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	int		flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+/* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern bool fdwxact_exists(Oid dboid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwTwoPhaseNeeded(void);
+extern void PreCommit_FdwXacts(void);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon);
+extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void FdwXactResolveTransactionAndReleaseWaiter(Oid dbid, TransactionId xid,
+													  PGPROC *waiter);
+extern bool FdwXactResolveInDoubtTransactions(Oid dbid);
+extern PGPROC *FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p);
+extern void FdwXactCleanupAtProcExit(void);
+extern void RegisterFdwXactByRelId(Oid relid, bool modified);
+extern void RegisterFdwXactByServerId(Oid serverid, bool modified);
+extern void FdwXactMarkForeignServerAccessed(Oid relid, bool modified);
+extern bool check_foreign_twophase_commit(int *newval, void **extra,
+										  GucSource source);
+extern bool FdwXactWaiterExists(Oid dbid);
+
+#endif   /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000..dd0f5d1
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,29 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLauncherRequestToLaunchForRetry(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif	/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000..2607654
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int foreign_xact_resolver_timeout;
+
+#endif		/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000..39ca66b
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif	/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000..55fc970
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,66 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLaunchLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t	pid;	/* this resolver's PID, or 0 if not active */
+	Oid		dbid;	/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool	in_use;
+
+	/* Stats */
+	TimestampTz	last_resolved_time;
+
+	/* Protect shared variables shown above */
+	slock_t	mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	*latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch		*launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif	/* RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 3c0db2c..5798b4c 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Foreign Transactions", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index fcd1913..34dfa81 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 				TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index d6803d6..4e88b74 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -109,6 +109,13 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
+ * XACT_FLAGS_FDWNOPREPARE - set when this transaction wrote data to a
+ * foreign table whose foreign server is not capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 3)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 8b1348c..f29b1b1 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -228,6 +228,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index ff98d9e..773846d 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 73ebfdf..01fc53b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5153,6 +5153,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '6053', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_foreign_xact', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,timestamptz}',
+  proargmodes => '{o,o,o}',
+  proargnames => '{pid,dbid,last_resolved_time}',
+  prosrc => 'pg_stat_get_foreign_xact' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5866,6 +5873,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '6050', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o,o}',
+  proargnames => '{dbid,xid,serverid,userid,status,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '6051', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '6052', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -5984,6 +6009,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '6054',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index fd91eb6..a69b700 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
@@ -248,7 +260,6 @@ typedef struct FdwRoutine
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
 } FdwRoutine;
 
-
 /* Functions in foreign/foreign.c */
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern Oid	GetForeignServerIdByRelId(Oid relid);
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 64919c9..b82cb45 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -80,6 +80,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 						 bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 							  bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fa5dca3..9aa84cb 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -775,6 +775,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -852,7 +854,9 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDWXACT,
+	WAIT_EVENT_FDWXACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -932,6 +936,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 1cee7db..f9c4eb7 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -21,6 +21,7 @@
 #include "storage/lock.h"
 #include "storage/pg_sema.h"
 #include "storage/proclist_types.h"
+#include "datatype/timestamp.h"
 
 /*
  * Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
@@ -153,6 +154,16 @@ struct PGPROC
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
 	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
+	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
 	 * their lock.
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index bd24850..d569fc0 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDWXACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -124,4 +126,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 								TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 2a74b30..c530634 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,9 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
+	FDWXACT_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 3097390..492b254 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1341,6 +1341,14 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.dbid,
+    f.xid,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(dbid, xid, serverid, userid, status, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
@@ -1841,6 +1849,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_foreign_xact| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_foreign_xact() r(pid, dbid, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
-- 
2.10.5
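
For readers skimming the header changes above: the three transaction callbacks added to fdwapi.h (PrepareForeignTransaction, CommitForeignTransaction, RollbackForeignTransaction) are meant to be driven in prepare-all-then-resolve order. The following is only a minimal, self-contained sketch of that order, not code from the patch. Every Mock* type and mock_* function is a hypothetical stand-in. In the patch the callbacks return void and report failure by raising an error; here the prepare callback returns bool so that the sketch compiles without PostgreSQL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for FdwXactRslvState (no server or user mapping). */
typedef struct MockRslvState
{
	const char *fdwxact_id;		/* prepared transaction identifier */
	bool		fail_prepare;	/* test knob: make the prepare step fail */
	bool		prepared;
	bool		committed;
	bool		rolled_back;
} MockRslvState;

/* Same shape as the fdwapi.h typedefs, except that prepare returns bool. */
typedef bool (*MockPrepare_function) (MockRslvState *state);
typedef void (*MockCommit_function) (MockRslvState *state);
typedef void (*MockRollback_function) (MockRslvState *state);

typedef struct MockFdwRoutine
{
	MockPrepare_function PrepareForeignTransaction;
	MockCommit_function CommitForeignTransaction;
	MockRollback_function RollbackForeignTransaction;
} MockFdwRoutine;

bool
mock_prepare(MockRslvState *state)
{
	if (state->fail_prepare)
		return false;
	state->prepared = true;
	return true;
}

void
mock_commit(MockRslvState *state)
{
	state->prepared = false;
	state->committed = true;
}

void
mock_rollback(MockRslvState *state)
{
	state->prepared = false;
	state->rolled_back = true;
}

const MockFdwRoutine mock_routine = {mock_prepare, mock_commit, mock_rollback};

/* Prepare every participant, then commit all; roll back all on failure. */
bool
mock_atomic_commit(const MockFdwRoutine *routine,
				   MockRslvState *parts, size_t nparts)
{
	size_t		i;

	for (i = 0; i < nparts; i++)
	{
		if (!routine->PrepareForeignTransaction(&parts[i]))
		{
			size_t		j;

			for (j = 0; j < nparts; j++)
				routine->RollbackForeignTransaction(&parts[j]);
			return false;
		}
	}

	/* The local commit would happen here in the real sequence. */

	for (i = 0; i < nparts; i++)
		routine->CommitForeignTransaction(&parts[i]);
	return true;
}
```

Calling mock_atomic_commit(&mock_routine, parts, n) on an array whose entries all prepare successfully leaves every entry committed; if any entry has fail_prepare set, every entry ends up rolled back and none is committed. The sketch leaves out the WAL logging and the resolver process.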

#21 Thomas Munro
thomas.munro@gmail.com
In reply to: Masahiko Sawada (#20)

On Wed, Apr 17, 2019 at 10:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Sorry for the very late reply. Attached are the updated patches.

Hello Sawada-san,

Can we please have a fresh rebase?

Thanks,

--
Thomas Munro
https://enterprisedb.com

#22 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Thomas Munro (#21)
5 attachment(s)

On Mon, Jul 1, 2019 at 8:32 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Wed, Apr 17, 2019 at 10:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Sorry for the very late reply. Attached are the updated patches.

Hello Sawada-san,

Can we please have a fresh rebase?

Thank you for the notice. Attached are the rebased patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

v24-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From 28bb614ac2d7a82a080e178d7f37de47142b299d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 8 Feb 2019 10:44:54 +0900
Subject: [PATCH v24 1/5] Keep track of writing on non-temporary relation

---
 src/backend/executor/nodeModifyTable.c | 12 ++++++++++++
 src/include/access/xact.h              |  6 ++++++
 2 files changed, 18 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d8b695d..a70549d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -588,6 +588,10 @@ ExecInsert(ModifyTableState *mtstate,
 							   estate->es_output_cid,
 							   0, NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
@@ -940,6 +944,10 @@ ldelete:;
 	if (tupleDeleted)
 		*tupleDeleted = true;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/*
 	 * If this delete is the result of a partition key update that moved the
 	 * tuple to a new partition, put this row into the transition OLD TABLE,
@@ -1451,6 +1459,10 @@ lreplace:;
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
 	}
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	if (canSetTag)
 		(estate->es_processed)++;
 
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index a20726a..c9d6b47 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -103,6 +103,12 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
 /*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
-- 
2.10.5
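
The 0001 patch above only sets one bit in MyXactFlags at each write site in nodeModifyTable.c. As a minimal sketch of that pattern outside the backend, the snippet below reuses the two flag definitions that appear in the diff. MockRelation, mock_note_write() and mock_wrote_non_temp_rel() are hypothetical names introduced for illustration, and the needs_wal field stands in for the RelationNeedsWAL() check.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag bits as they appear in the xact.h hunk of the patch. */
#define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)

/* Per-transaction flag word; in the backend this is a global in xact.c. */
int			MyXactFlags = 0;

/* Hypothetical stand-in for a relation descriptor. */
typedef struct MockRelation
{
	bool		needs_wal;		/* stands in for RelationNeedsWAL(rel) */
} MockRelation;

/* Called after an insert, update or delete on the given relation. */
void
mock_note_write(const MockRelation *rel)
{
	if (rel->needs_wal)
		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
}

/* Lets commit-time code ask whether a WAL-logged relation was written. */
bool
mock_wrote_non_temp_rel(void)
{
	return (MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0;
}
```

A write to a relation with needs_wal set to false leaves the flag clear, and a later write to a WAL-logged relation sets it without touching the other bits. That is what lets later code tell whether the local node itself took part in the transaction as a writer.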

v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From 283f0a009ca26a12f0c6738ad4d6325d895f3b55 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 2 Jul 2019 09:32:16 +0900
Subject: [PATCH v24 2/5] Support atomic commit among multiple foreign servers.

---
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  130 ++
 src/backend/access/fdwxact/fdwxact.c          | 2833 +++++++++++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  644 ++++++
 src/backend/access/fdwxact/resolver.c         |  344 +++
 src/backend/access/rmgrdesc/Makefile          |    8 +-
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/Makefile           |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/copy.c                   |    7 +
 src/backend/commands/foreigncmds.c            |   30 +
 src/backend/executor/execPartition.c          |    9 +
 src/backend/executor/nodeForeignscan.c        |   25 +
 src/backend/executor/nodeModifyTable.c        |   18 +
 src/backend/foreign/foreign.c                 |   57 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   20 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   82 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   29 +
 src/include/foreign/fdwapi.h                  |   13 +-
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    9 +-
 src/include/storage/proc.h                    |   11 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    3 +
 src/test/regress/expected/rules.out           |   13 +
 49 files changed, 4546 insertions(+), 26 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100755 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a..49480dd 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000..0207a66
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000..a6a46ad
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,130 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature ensures that a transaction is either committed on
+all of the involved foreign servers or rolled back on all of them. This leaves
+the data in a consistent state across the federated database.
+
+
+Commit Sequence of Global Transactions
+--------------------------------
+
+We employ the two-phase commit protocol to commit atomically on all foreign
+servers. The commit sequence of a distributed transaction consists of the
+following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+in the list FdwXactAtomicCommitParticipants, which is maintained by
+PostgreSQL's global transaction manager (GTM), as distributed transaction
+participants. The registered foreign transactions are tracked until the end of
+the transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+Before issuing PREPARE for any foreign transaction, we write a WAL record
+indicating that the foreign server is involved with the current transaction.
+Thus, if we lose connectivity to the foreign server or crash ourselves, we
+will remember that we might have a prepared transaction on the foreign
+server, and we will try to resolve it when connectivity is restored or after
+crash recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers, including the local node. Otherwise, we can commit them at this step
+by calling the CommitForeignTransaction() API, and nothing further is needed.
+
+After that, we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If any of them fails, we switch to rollback;
+at that point some participants might be prepared whereas others are not.
+The prepared foreign transactions need to be resolved manually using
+pg_resolve_foreign_xact(), and the unprepared ones are ended in one phase by
+calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction,
+but this resolution step (commit or rollback) is done by the foreign
+transaction resolver process. The backend process inserts itself into the wait
+queue and then wakes up the resolver process (or requests a new one if
+necessary). The resolver process picks the waiter from the queue and fetches
+the distributed transaction information that the backend is waiting for. Once
+all foreign transactions are committed or rolled back, it wakes up the waiter.
+
+
+API Contract With Transaction Management Callback Functions
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+the transaction management callback functions according to that status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for issuing PREPARE, COMMIT and
+ROLLBACK for the transaction on the foreign server, respectively.
+FdwXactRslvState->flags could contain FDWXACT_FLAG_ONEPHASE, meaning the FDW
+can commit or roll back the foreign transaction in one phase. On failure while
+processing a foreign transaction, the FDW needs to raise an error. However, the
+FDW must tolerate an ERRCODE_UNDEFINED_OBJECT error while committing or rolling
+back a foreign transaction, because there is a race condition: the coordinator
+could crash after the resolution is completed but before writing the WAL
+record that removes the FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_INITIAL
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and it changes to
+FDWXACT_STATUS_PREPARING, FDWXACT_STATUS_COMMITTING and FDWXACT_STATUS_ABORTING
+before the foreign transaction is prepared, committed and aborted by the FDW
+callback functions, respectively(*1). The status then changes to
+FDWXACT_STATUS_RESOLVED once the foreign transaction is resolved, and the
+corresponding FdwXact entry is removed with WAL logging. If processing of the
+foreign transaction fails (e.g. while preparing, committing or aborting), the
+status changes back to the previous status. Therefore the FDWXACT_STATUS_xxxING
+statuses appear only while the foreign transaction is being processed by an
+FDW callback function.
+
+FdwXact entries restored during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. Their initial
+status is FDWXACT_STATUS_PREPARED(*2). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                      INVALID                       |
+ +----------------------------------------------------+
+    |                      |                       |
+    |                      v                       |
+    |           +---------------------+            |
+    |           |       INITIAL       |            |
+    |           +---------------------+            |
+   (*2)                    |                      (*2)
+    |                      v                       |
+    |           +---------------------+            |
+    |           |    PREPARING(*1)    |            |
+    |           +---------------------+            |
+    |                      |                       |
+    v                      v                       v
+ +----------------------------------------------------+
+ |                      PREPARED                      |
+ +----------------------------------------------------+
+           |                               |
+           v                               v
+ +--------------------+          +--------------------+
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |
+ +--------------------+          +--------------------+
+           |                               |
+           v                               v
+ +----------------------------------------------------+
+ |                      RESOLVED                      |
+ +----------------------------------------------------+
+
+(*1) Statuses that appear only while being processed by the FDW
+(*2) Paths for recovered FdwXact entries
\ No newline at end of file
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100755
index 0000000..6a63663
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2833 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * During executor node initialization, the executor can register a foreign
+ * server by calling either RegisterFdwXactByRelId() or
+ * RegisterFdwXactByServerId() to make it participate in a group for global
+ * commit. A foreign server is registered only if its FDW provides both the
+ * CommitForeignTransaction and RollbackForeignTransaction APIs. Registered
+ * participant servers are identified by the OIDs of the foreign server and user.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every foreign server. After committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so
+ * that we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request. We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed. Before waiting for foreign transactions to be resolved, the
+ * backend enqueues itself with the timestamp at which it expects to be
+ * processed. Similarly, if resolution fails, it enqueues itself again with a
+ * new timestamp (its timestamp + foreign_xact_resolution_interval).
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in the in-doubt state (a.k.a.
+ * in-doubt transactions). Foreign transactions in the in-doubt state are not
+ * resolved automatically, so they must be processed manually using the
+ * pg_resolve_fdwxact() function.
+ *
+ * The two-phase commit protocol is required if the transaction modified two
+ * or more servers, including the local one. Otherwise, all foreign
+ * transactions are committed or rolled back during pre-commit.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed by the FDW, the corresponding
+ * FdwXact entry is updated. In order to protect the entry from concurrent
+ * removal we need to hold a lock on the entry or a lock on the entire global
+ * array. However, we don't want to hold the lock while the FDW is processing
+ * the foreign transaction, which may take an unpredictable time. To avoid this,
+ * the in-memory data of a foreign transaction follows a locking model based on
+ * four linked concepts:
+ *
+ * * A foreign transaction's status variable is switched using the LWLock
+ *   FdwXactLock, which needs to be held in exclusive mode when updating the
+ *   status, while readers need to hold it in shared mode when looking at the
+ *   status.
+ * * A process that is going to update an FdwXact entry cannot process a
+ *   foreign transaction that is being resolved.
+ * * So setting the status to FDWXACT_STATUS_PREPARING,
+ *   FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING, which puts the
+ *   foreign transaction into an in-progress state, means owning the FdwXact
+ *   entry, which protects it from update/removal by concurrent writers.
+ * * Individual fields are protected by a mutex, and only the backend owning
+ *   the foreign transaction is authorized to update the fields of its own
+ *   entry.
+ *
+ * Therefore, before doing PREPARE, COMMIT PREPARED or ROLLBACK PREPARED, a
+ * process that is going to call the transaction callback functions needs to
+ * change the status to the corresponding status above while holding
+ * FdwXactLock in exclusive mode, and call the callback after releasing the lock.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk. FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon. We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts. If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Atomic commit is enabled by configuration */
+#define IsForeignTwophaseCommitEnabled() \
+	(max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	(IsForeignTwophaseCommitEnabled() && \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED))
+
+/* Check whether the FdwXactParticipant is capable of two-phase commit */
+#define IsSeverCapableOfTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Check whether the FdwXact is being resolved */
+#define FdwXactIsBeingResolved(fx) \
+	(((((FdwXact)(fx))->status) == FDWXACT_STATUS_PREPARING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_COMMITTING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_ABORTING))
+
+/*
+ * Structure to bundle the foreign transaction participant. This struct
+ * is created at the beginning of execution for each foreign server and
+ * is used until the end of the transaction, where we cannot look at syscaches.
+ * Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry
+	 * is not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char			*fdwxact_id;
+
+	/* true if data on the server was modified */
+	bool			modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function	prepare_foreign_xact_fn;
+	CommitForeignTransaction_function	commit_foreign_xact_fn;
+	RollbackForeignTransaction_function	rollback_foreign_xact_fn;
+	GetPrepareId_function				get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit. This list
+ * contains only foreign servers that provide the transaction management
+ * callbacks, that is, CommitForeignTransaction and RollbackForeignTransaction.
+ */
+static List *FdwXactParticipants = NIL;
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * Name of a foreign prepared transaction file: the database OID, xid,
+ * foreign server OID and user OID, each as 8 hex digits, separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure
+ * uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* GUC parameters */
+int	max_prepared_foreign_xacts = 0;
+int	max_foreign_xact_resolvers = 0;
+int foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Track whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static void FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+												 bool for_commit);
+static void FdwXactResolveForeignTransaction(FdwXact fdwxact,
+											 FdwXactRslvState *state,
+											 FdwXactStatus fallback_status);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid,	void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static void FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock);
+static bool is_foreign_twophase_commit_required(void);
+static void register_fdwxact(Oid serverid, Oid userid, bool modified);
+static List *get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						  bool including_indoubts, bool include_in_progress,
+						  bool need_lock);
+static FdwXact get_all_fdwxacts(int *num_p);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXact get_fdwxact_to_resolve(Oid dbid, TransactionId xid);
+static FdwXactRslvState *create_fdwxact_state(void);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Remember accessed foreign transaction. Both RegisterFdwXactByRelId and
+ * RegisterFdwXactByServerId are called by executor during initialization.
+ */
+void
+RegisterFdwXactByRelId(Oid relid, bool modified)
+{
+	Relation		rel;
+	Oid				serverid;
+	Oid				userid;
+
+	rel = relation_open(relid, NoLock);
+	serverid = GetForeignServerIdByRelId(relid);
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	relation_close(rel, NoLock);
+
+	register_fdwxact(serverid, userid, modified);
+}
+
+void
+RegisterFdwXactByServerId(Oid serverid, bool modified)
+{
+	register_fdwxact(serverid, GetUserId(), modified);
+}
+
+/*
+ * Register given foreign transaction identified by given arguments as
+ * a participant of the transaction.
+ *
+ * The foreign transaction is identified by the given server id and user id.
+ * Registered foreign transactions are managed by the global transaction
+ * manager until the end of the transaction.
+ */
+static void
+register_fdwxact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant	*fdw_part;
+	ForeignServer 		*foreign_server;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	MemoryContext		old_ctx;
+	FdwRoutine			*routine;
+	ListCell	   		*lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	/*
+	 * Participant information is also needed at the end of a transaction,
+	 * where system caches are not available. Save it in TopTransactionContext
+	 * so that it can live until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Don't register servers lacking commit or rollback callbacks.
+	 */
+	if (!routine->CommitForeignTransaction ||
+		!routine->RollbackForeignTransaction)
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		pfree(routine);
+		MemoryContextSwitchTo(old_ctx);
+		return;
+	}
+
+	/*
+	 * Remember that we touched a foreign server that is not capable of
+	 * two-phase commit.
+	 */
+	if (!routine->PrepareForeignTransaction)
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+
+	foreign_server = GetForeignServer(serverid);
+	fdw = GetForeignDataWrapper(foreign_server->fdwid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact = NULL;
+	fdw_part->modified = modified;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Restore the previous memory context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&(fdwxacts[cnt].mutex));
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Prepare all foreign transactions if foreign twophase commit is required.
+ * If foreign twophase commit is required, the behavior depends on the value
+ * of foreign_twophase_commit: with 'required' we strictly require all
+ * foreign servers' FDWs to support the two-phase commit protocol and ask them
+ * to prepare foreign transactions; with 'prefer' we ask only the foreign
+ * servers that are capable of two-phase commit to prepare foreign transactions
+ * and ask the other servers to commit; and with 'disabled' we ask all foreign
+ * servers to commit foreign transactions in one phase. If we fail to commit
+ * any of them, we change to aborting.
+ *
+ * Note that non-modified foreign servers can always be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	bool		need_twophase_commit;
+	ListCell	*lc = NULL;
+	ListCell	*next = NULL;
+	ListCell	*prev = NULL;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * We require that all modified servers be capable of the two-phase
+	 * commit protocol.
+	 */
+	if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));
+
+	/*
+	 * Check if we need to use foreign twophase commit. It's always false
+	 * if foreign twophase commit is disabled.
+	 */
+	need_twophase_commit = is_foreign_twophase_commit_required();
+
+	/*
+	 * First, consider committing foreign transactions in one phase.
+	 */
+	for (lc = list_head(FdwXactParticipants); lc != NULL; lc = next)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		bool	commit = false;
+
+		next = lnext(lc);
+
+		/* Can commit in one phase if two-phase commit is not required */
+		if (!need_twophase_commit)
+			commit = true;
+
+		/* A non-modified foreign transaction can always be committed in one phase */
+		if (!fdw_part->modified)
+			commit = true;
+
+		/*
+		 * In 'prefer' case, non-twophase-commit capable server can be
+		 * committed in one-phase.
+		 */
+		if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER &&
+			!IsSeverCapableOfTwophaseCommit(fdw_part))
+			commit = true;
+
+		if (commit)
+		{
+			/* Commit the foreign transaction in one-phase */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, true);
+
+			/* Delete it from the participant list */
+			FdwXactParticipants = list_delete_cell(FdwXactParticipants,
+												   lc, prev);
+			continue;
+		}
+
+		prev = lc;
+	}
+
+	/* All done if we committed all foreign transactions */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Second, if only one transaction remains in the participant list
+	 * and we didn't modify any local data, we can commit it without
+	 * preparation.
+	 */
+	if (list_length(FdwXactParticipants) == 1 &&
+		(MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) == 0)
+	{
+		/* Commit the foreign transaction in one phase */
+		FdwXactOnePhaseEndForeignTransaction(linitial(FdwXactParticipants),
+											 true);
+		/* All foreign transactions are committed; forget the participants */
+		list_free(FdwXactParticipants);
+		FdwXactParticipants = NIL;
+		return;
+	}
+
+	/*
+	 * Finally, prepare foreign transactions. Note that we keep
+	 * FdwXactParticipants until the end of transaction.
+	 */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * an FdwXact entry we call the GetPrepareId callback to get a transaction
+ * identifier from the FDW.
+ *
+ * We can still change to rollback here. If any error occurs, we roll back
+ * non-prepared foreign transactions and leave the others to the resolver.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	ListCell		*lcell;
+	TransactionId	xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	xid = GetTopTransactionId();
+
+	/* Loop over the foreign connections */
+	foreach(lcell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+		FdwXactRslvState 	*state;
+		FdwXact		fdwxact;
+
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the FDWXACT_STATUS_INITIAL
+		 * status. Registration persists this information to the disk and logs
+		 * it (thereby relaying it to the standby). Thus in case we lose
+		 * connectivity to the foreign server or crash ourselves, we will remember
+		 * that we might have prepared a transaction on the foreign server and try
+		 * to resolve it when connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before persisting
+		 * the information to the disk and crashed in between these two steps,
+		 * we would forget that we prepared the transaction on the foreign server
+		 * and would not be able to resolve it after the crash. Hence persist
+		 * first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		state = create_fdwxact_state();
+		state->server = fdw_part->server;
+		state->usermapping = fdw_part->usermapping;
+		state->fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+
+		/* Update the status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		Assert(fdwxact->status == FDWXACT_STATUS_INITIAL);
+		fdwxact->status = FDWXACT_STATUS_PREPARING;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the moment this backend
+		 * receives an acknowledgement from the foreign server, the backend may
+		 * abort the local transaction (say, because of a signal).
+		 *
+		 * During abort processing, we might try to resolve a never-prepared
+		 * transaction, and get an error. This is fine as long as the FDW
+		 * provides us unique prepared transaction identifiers.
+		 */
+		PG_TRY();
+		{
+			fdw_part->prepare_foreign_xact_fn(state);
+		}
+		PG_CATCH();
+		{
+			/* failed, back to the initial state */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fdwxact->status = FDWXACT_STATUS_INITIAL;
+			LWLockRelease(FdwXactLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		/* succeeded, update status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * One-phase commit or rollback the given foreign transaction participant.
+ */
+static void
+FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+									 bool for_commit)
+{
+	FdwXactRslvState *state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state = create_fdwxact_state();
+	state->server = fdw_part->server;
+	state->usermapping = fdw_part->usermapping;
+	state->flags = FDWXACT_FLAG_ONEPHASE;
+
+	/*
+	 * Commit or roll back the foreign transaction in one phase. Since we didn't
+	 * insert an FdwXact entry for this transaction, we don't need to do anything
+	 * special about failures. On failure we change to rollback.
+	 */
+	if (for_commit)
+		fdw_part->commit_foreign_xact_fn(state);
+	else
+		fdw_part->rollback_foreign_xact_fn(state);
+}
+
+/*
+ * This function is used to create a new foreign transaction entry before an
+ * FDW prepares and commits/rolls back. The function adds the entry to WAL, and
+ * it is persisted to disk under the pg_fdwxact directory at the next checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact				fdwxact;
+	FdwXactOnDiskData	*fdwxact_file_data;
+	MemoryContext		old_context;
+	int					data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							fdw_part->usermapping->userid,
+							fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->status = FDWXACT_STATUS_INITIAL;
+	fdwxact->held_by = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				Oid umid, char *fdwxact_id)
+{
+	int i;
+	FdwXact fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
+								   xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->dbid = dbid;
+	fdwxact->local_xid = xid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	fdwxact->indoubt = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (FdwXactIsBeingResolved(fdwxact))
+		elog(ERROR, "cannot remove fdwxact entry that is being resolved");
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->indoubt = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the commit critical section: a
+		 * checkpoint starting after this will certainly see the gxact as a
+		 * candidate for fsyncing.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true and set ForeignTwophaseCommitIsRequired to true if the current
+ * transaction modified data on two or more servers, counting both the foreign
+ * servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+is_foreign_twophase_commit_required(void)
+{
+	ListCell*	lc;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->modified)
+			nserverswritten++;
+	}
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	/*
+	 * Atomic commit is required if we modified data on two or more
+	 * participants.
+	 */
+	if (nserverswritten <= 1)
+		return false;
+
+	ForeignTwophaseCommitIsRequired = true;
+	return true;
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int	i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * Mark my foreign transaction participants as in-doubt and clear
+ * the FdwXactParticipants list.
+ *
+ * If we leave any foreign transactions behind, update the oldest xmin of
+ * unresolved transactions so that the local transaction ids of in-doubt
+ * transactions are not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell *cell;
+	int		n_lefts = 0;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if we haven't registered an FdwXact entry yet */
+		if (!fdw_part->fdwxact)
+			continue;
+
+		/*
+		 * There is a race condition: the resolver process could remove the
+		 * FdwXact entry, and another backend could reuse it, before we forget
+		 * it here. So we need to check that the entry is still associated
+		 * with this backend.
+		 */
+		SpinLockAcquire(&fdwxact->mutex);
+		if (fdwxact->held_by == MyBackendId)
+		{
+			fdwxact->held_by = InvalidBackendId;
+			fdwxact->indoubt = true;
+			n_lefts++;
+		}
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction of
+	 * unresolved distributed transactions and hand the entries over to the
+	 * foreign transaction resolver.
+	 */
+	if (n_lefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions in in-doubt status", n_lefts);
+		FdwXactComputeRequiredXmin();
+	}
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * Wait for the foreign transaction to be resolved.
+ *
+ * Initially backends start in state FDWXACT_NOT_WAITING and then change
+ * that state to FDWXACT_WAITING before adding ourselves to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * This backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char		*new_status = NULL;
+	const char	*old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsForeignTwophaseCommitRequested())
+		return;
+
+	/*
+	 * Also, exit if the transaction itself has no foreign transaction
+	 * participants.
+	 */
+	if (FdwXactParticipants == NIL && wait_xid == MyPgXact->xid)
+		return;
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if one is not running yet, or wake it up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0';	/* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDWXACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The latter
+		 * would lead the client to believe that the distributed transaction
+		 * aborted, which is not true: it's already committed locally. The
+		 * former is no good either: the client has requested committing a
+		 * distributed transaction, and is entitled to assume that an
+		 * acknowledged commit is also committed on all foreign servers, which
+		 * might not be true. So in this case we issue a WARNING (which some
+		 * clients may be able to interpret) and shut off further output. We
+		 * do NOT reset ProcDiePending, so that the process will die after the
+		 * commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign transaction resolver can pick them up and try to
+		 * resolve them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Return true if there is at least one backend in the wait queue that is
+ * connected to the given database. The caller must hold
+ * FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * AtEOXact_FdwXacts
+ */
+void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		foreach (lcell, FdwXactParticipants)
+		{
+			FdwXactParticipant	*fdw_part = lfirst(lcell);
+
+			/*
+			 * If the foreign transaction has an FdwXact entry we might have
+			 * prepared it. Skip an already-prepared foreign transaction
+			 * because it has closed its transaction. But we cannot be sure
+			 * whether a foreign transaction with status ==
+			 * FDWXACT_STATUS_PREPARING has been prepared or not, so we call
+			 * the rollback API to close its transaction for safety. Any
+			 * prepared foreign transaction that we might have left will be
+			 * resolved by the foreign transaction resolver.
+			 */
+			if (fdw_part->fdwxact)
+			{
+				bool is_prepared;
+
+				LWLockAcquire(FdwXactLock, LW_SHARED);
+				is_prepared = fdw_part->fdwxact &&
+					fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED;
+				LWLockRelease(FdwXactLock);
+
+				if (is_prepared)
+					continue;
+			}
+
+			/* One-phase rollback foreign transaction */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, false);
+		}
+	}
+
+	/*
+	 * In commit cases, we have already prepared foreign transactions during
+	 * pre-commit phase. And these prepared transactions will be resolved by
+	 * the resolver process.
+	 */
+
+	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Prepare foreign transactions.
+ *
+ * Note that it's possible that the transaction aborts after we prepared some
+ * of the participants. In this case we switch to rollback and roll back all
+ * foreign transactions.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	/*
+	 * We cannot prepare if any foreign server of participants isn't capable
+	 * of two-phase commit.
+	 */
+	if (is_foreign_twophase_commit_required() &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in the transaction cannot prepare it")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Return one backend that connects to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p)
+{
+	PGPROC *proc;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+			break;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+	{
+		*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+		*waitXid_p = proc->fdwXactWaitXid;
+	}
+	else
+	{
+		*nextResolutionTs_p = -1;
+		*waitXid_p = InvalidTransactionId;
+	}
+
+	LWLockRelease(FdwXactResolutionLock);
+
+	return proc;
+}
+
+/*
+ * Get one FdwXact entry to resolve. This function is intended to be used by
+ * a resolver process to get FdwXact entries to resolve, so the search
+ * excludes in-doubt transactions and in-progress transactions.
+ */
+static FdwXact
+get_fdwxact_to_resolve(Oid dbid, TransactionId xid)
+{
+	List *fdwxacts = NIL;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Exclude both in-doubt transactions and in-progress transactions */
+	fdwxacts = get_fdwxacts(dbid, xid, InvalidOid, InvalidOid,
+							false, false, false);
+
+	return fdwxacts == NIL ? NULL : (FdwXact) linitial(fdwxacts);
+}
+
+/*
+ * Resolve one distributed transaction on the given database. The target
+ * distributed transaction is fetched from the waiting queue and its transaction
+ * participants are fetched from the global array.
+ *
+ * Release the waiter after we have resolved all of the foreign transaction
+ * participants. On failure, we re-enqueue the waiting backend after
+ * advancing its next resolution time.
+ */
+void
+FdwXactResolveTransactionAndReleaseWaiter(Oid dbid, TransactionId xid,
+										  PGPROC *waiter)
+{
+	FdwXact	fdwxact;
+
+	Assert(TransactionIdIsValid(xid));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	while ((fdwxact = get_fdwxact_to_resolve(MyDatabaseId, xid)) != NULL)
+	{
+		FdwXactRslvState *state;
+		ForeignServer *server;
+		UserMapping	*usermapping;
+
+		CHECK_FOR_INTERRUPTS();
+
+		server = GetForeignServer(fdwxact->serverid);
+		usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+
+		state = create_fdwxact_state();
+		SpinLockAcquire(&fdwxact->mutex);
+		state->server = server;
+		state->usermapping = usermapping;
+		state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+		SpinLockRelease(&fdwxact->mutex);
+
+		FdwXactDetermineTransactionFate(fdwxact, false);
+
+		/* Do not hold the lock during foreign transaction resolution */
+		LWLockRelease(FdwXactLock);
+
+		PG_TRY();
+		{
+			/*
+			 * Resolve the foreign transaction. When committing or aborting
+			 * prepared foreign transactions the previous status is always
+			 * FDWXACT_STATUS_PREPARED.
+			 */
+			FdwXactResolveForeignTransaction(fdwxact, state,
+											 FDWXACT_STATUS_PREPARED);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. Re-insert the waiter to the tail of retry
+			 * queue if the waiter is still waiting.
+			 */
+			LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+			if (waiter->fdwXactState == FDWXACT_WAITING)
+			{
+				SHMQueueDelete(&(waiter->fdwXactLinks));
+				pg_write_barrier();
+				waiter->fdwXactNextResolutionTs =
+					TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+												foreign_xact_resolution_retry_interval);
+				FdwXactQueueInsert(waiter);
+			}
+			LWLockRelease(FdwXactResolutionLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		elog(DEBUG2, "resolved one foreign transaction xid %u, serverid %u, userid %u",
+			 fdwxact->local_xid, fdwxact->serverid, fdwxact->userid);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+
+	/*
+	 * Remove the waiter from the shmem queue, if not detached yet. The
+	 * waiter could already be detached if the user cancelled the wait
+	 * before resolution.
+	 */
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId	wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/* Wake up the waiter only when we have set state and removed from queue */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had been already detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Determine whether the given foreign transaction should be committed or
+ * rolled back according to the result of the local transaction. This function
+ * changes fdwxact->status, so the caller must either hold FdwXactLock in
+ * exclusive mode or pass need_lock as true.
+ */
+static void
+FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock)
+{
+	bool			is_commit = false;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/*
+	 * The transaction being resolved must be either one that has been
+	 * cancelled and marked as in-doubt or one that has been prepared.
+	 */
+	Assert(fdwxact->indoubt ||
+		   fdwxact->status == FDWXACT_STATUS_PREPARED);
+
+	/*
+	 * If the local transaction is already committed, commit prepared
+	 * foreign transaction.
+	 */
+	if (TransactionIdDidCommit(fdwxact->local_xid))
+	{
+		fdwxact->status = FDWXACT_STATUS_COMMITTING;
+		is_commit = true;
+	}
+
+	/*
+	 * If the local transaction is already aborted, abort prepared
+	 * foreign transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+	{
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is not in progress but the foreign
+	 * transaction is not prepared on the foreign server. This
+	 * can happen when the transaction failed after registering this
+	 * entry but before actually preparing on the foreign server.
+	 * So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+	{
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted. This should not happen except for one
+	 * case where the local transaction is prepared and this foreign transaction
+	 * is being resolved manually by pg_resolve_foreign_xact(). Raise an
+	 * error anyway since we cannot determine the fate of this foreign
+	 * transaction according to the local transaction whose fate is also not
+	 * determined.
+	 */
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve the foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid),
+				 errhint("The local transaction with xid %u might be prepared",
+						 fdwxact->local_xid)));
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * callback function. The 'state' is passed to the callback function. The fate of
+ * foreign transaction must be determined. If foreign transaction is resolved
+ * successfully, remove the FdwXact entry from the shared memory and also
+ * remove the corresponding on-disk file. If failed, the status of FdwXact
+ * entry changes to 'fallback_status' before erroring out.
+ */
+static void
+FdwXactResolveForeignTransaction(FdwXact fdwxact, FdwXactRslvState *state,
+								 FdwXactStatus fallback_status)
+{
+	ForeignServer		*server;
+	ForeignDataWrapper	*fdw;
+	FdwRoutine			*fdw_routine;
+	bool				is_commit;
+
+	Assert(state != NULL);
+	Assert(state->server && state->usermapping && state->fdwxact_id);
+	Assert(fdwxact != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+		elog(ERROR, "cannot resolve foreign transaction whose fate is not determined");
+
+	is_commit = fdwxact->status == FDWXACT_STATUS_COMMITTING;
+	LWLockRelease(FdwXactLock);
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+
+	PG_TRY();
+	{
+		if (is_commit)
+			fdw_routine->CommitForeignTransaction(state);
+		else
+			fdw_routine->RollbackForeignTransaction(state);
+	}
+	PG_CATCH();
+	{
+		/* Back to the fallback status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = fallback_status;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	/* Resolution was a success, remove the entry */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	elog(DEBUG1, "successfully %s the foreign transaction with xid %u db %u server %u user %u",
+		 is_commit ? "committed" : "rolled back",
+		 fdwxact->local_xid, fdwxact->dbid, fdwxact->serverid,
+		 fdwxact->userid);
+
+	fdwxact->status = FDWXACT_STATUS_RESOLVED;
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdwxact(fdwxact);
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Return palloc'd and initialized FdwXactRslvState.
+ */
+static FdwXactRslvState *
+create_fdwxact_state(void)
+{
+	FdwXactRslvState *state;
+
+	state = palloc(sizeof(FdwXactRslvState));
+	state->server = NULL;
+	state->usermapping = NULL;
+	state->fdwxact_id = NULL;
+	state->flags = 0;
+
+	return state;
+}
+
+/*
+ * Return the FdwXact entry that matches the given arguments, or NULL if there
+ * is none. All arguments must be valid values so that the search matches
+ * exactly one (or no) entry. Note that this function is intended to be used
+ * for modifying the returned FdwXact entry, so the caller must hold
+ * FdwXactLock in exclusive mode and it doesn't include the in-progress
+ * FdwXact entries.
+ */
+static FdwXact
+get_one_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	*fdwxact_list;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	/* Include in-doubt transactions but don't include in-progress ones */
+	fdwxact_list = get_fdwxacts(dbid, xid, serverid, userid,
+								true, false, false);
+
+	/* Must be at most one entry since we search by the unique key */
+	Assert(list_length(fdwxact_list) <= 1);
+
+	/* Could not find entry */
+	if (fdwxact_list == NIL)
+		return NULL;
+
+	return (FdwXact) linitial(fdwxact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+fdwxact_exists(Oid dbid, Oid serverid, Oid userid)
+{
+	List	*fdwxact_list;
+
+	/* Find entries from all FdwXact entries */
+	fdwxact_list = get_fdwxacts(dbid, InvalidTransactionId, serverid,
+								userid, true, true, true);
+
+	return fdwxact_list != NIL;
+}
+
+/*
+ * Return an array of all foreign prepared transactions for the user-level
+ * function pg_foreign_xacts, and store the number of entries in *num_p.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if he doesn't
+ * want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdwxacts(int *num_p)
+{
+	List		*all_fdwxacts;
+	ListCell	*lc;
+	FdwXact		fdwxacts;
+	int			num_fdwxacts = 0;
+
+	Assert(num_p != NULL);
+
+	/* Get all entries */
+	all_fdwxacts = get_fdwxacts(InvalidOid, InvalidTransactionId,
+								InvalidOid, InvalidOid, true,
+								true, true);
+
+	if (all_fdwxacts == NIL)
+	{
+		*num_p = 0;
+		return NULL;
+	}
+
+	fdwxacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdwxacts));
+	*num_p = list_length(all_fdwxacts);
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdwxacts)
+	{
+		FdwXact fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdwxacts + num_fdwxacts, fx,
+			   sizeof(FdwXactData));
+		num_fdwxacts++;
+	}
+
+	list_free(all_fdwxacts);
+
+	return fdwxacts;
+}
+
+/*
+ * Return a list of FdwXact entries matching the given arguments, or NIL if
+ * there are none. Only arguments with valid values for their respective
+ * datatypes are used as search conditions. 'include_indoubt' and
+ * 'include_in_progress' control whether the result includes in-doubt
+ * transactions and in-progress transactions respectively.
+ */
+static List*
+get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			 bool include_indoubt, bool include_in_progress, bool need_lock)
+{
+	int i;
+	List	*fdwxact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact	fdwxact = FdwXactCtl->fdwxacts[i];
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* include in-doubt transaction? */
+		if (!include_indoubt && fdwxact->indoubt)
+			continue;
+
+		/* include in-progress transaction? */
+		if (!include_in_progress && FdwXactIsBeingResolved(fdwxact))
+			continue;
+
+		/* Append it if matched */
+		fdwxact_list = lappend(fdwxact_list, fdwxact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdwxact_list;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record
+		 * in FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides a getPrepareId callback we return the
+ * identifier returned from it. Otherwise we generate a unique identifier in
+ * the form of "fx_<random number>_<xid>_<serverid>_<userid>" whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transactionid unique, we should ideally use something
+ * like UUID, which gives unique ids with high probability, but that may be
+ * expensive here and UUID extension which provides the function to generate
+ * UUID is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	*id;
+	int		id_len = 0;
+
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		/*
+		 * FDW doesn't provide the callback function, generate a unique
+		 * identifier.
+		 */
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN - 1] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an inserted LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;						/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case
+	 * somebody finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be an FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+			  (errmsg_plural("%u foreign transaction state file was written "
+							 "for long-running prepared transactions",
+							 "%u foreign transaction state files were written "
+							 "for long-running prepared transactions",
+							 serialized_fdwxacts,
+							 serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, &read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+		   errdetail("Failed while allocating an XLog reading processor.")));
+
+	record = XLogReadRecord(xlogreader, lsn, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read foreign transaction state from xlog at %X/%X",
+			   (uint32) (lsn >> 32),
+			   (uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not recreate foreign transaction state file \"%s\": %m",
+			   path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a database id, transaction id, serverid and userid, read the foreign
+ * transaction state either from its state file on disk or directly from the
+ * WAL record located at the provided "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId	origNextXid =
+		XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	char	*buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid CRC and matches the given identifiers), return
+ * the palloc'd contents of the file. Corrupted data raises an error.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			   errmsg("could not open FDW transaction state file \"%s\": %m",
+					  path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the requested identifiers */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextFullXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextFullXid);
+	TransactionId result = origNextXid;
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+		char *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char		*buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. We do not
+	 * know the fate of the transaction yet, so the entry is marked as
+	 * prepared and in-doubt below. The resolver will settle it later
+	 * based on the status of the local transaction that prepared this
+	 * foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							  fdwxact_data->serverid, fdwxact_data->userid,
+							  fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED and as in-doubt, since we do not know
+	 * the xact status right now. Resolver will set it later based on
+	 * the status of local transaction that prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;	/* added in redo */
+	fdwxact->indoubt = true;
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXact file if the foreign transaction was saved by an earlier checkpoint.
+ * The FdwXact entry may not be found when crash recovery starts from a point
+ * after the entry was added but before it was removed; that case is ignored.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact	fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdwxact(dbid, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+		return;
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and mark them valid.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+		char	*buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+
+		/*
+		 * If the foreign transaction is part of a prepared local
+		 * transaction, it is not in doubt. A future COMMIT/ROLLBACK
+		 * PREPARED will determine the fate of this foreign transaction.
+		 */
+		if (TwoPhaseExists(fdwxact->local_xid))
+		{
+			ereport(DEBUG2,
+					(errmsg("clearing in-doubt flag of foreign transaction %u, server %u, user %u: found the corresponding local prepared transaction",
+							fdwxact->local_xid, fdwxact->serverid,
+							fdwxact->userid)));
+			fdwxact->indoubt = false;
+		}
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(int *newval, void **extra, GucSource source)
+{
+	ForeignTwophaseCommitLevel newForeignTwophaseCommitLevel = *newval;
+
+	/* Parameter check */
+	if (newForeignTwophaseCommitLevel > FOREIGN_TWOPHASE_COMMIT_DISABLED &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_transactions\" or \"max_foreign_transaction_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built-in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}	WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	7
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdwxacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdwxacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(PG_PREPARED_FDWXACTS_COLS);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "indoubt",
+						   BOOLOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdwxacts = get_all_fdwxacts(&num_fdwxacts);
+		status->num_xacts = num_fdwxacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdwxact = &status->fdwxacts[status->cur_xact++];
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdwxact->dbid);
+		values[1] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[2] = ObjectIdGetDatum(fdwxact->serverid);
+		values[3] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (fdwxact->status)
+		{
+			case FDWXACT_STATUS_INITIAL:
+				xact_status = "initial";
+				break;
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			case FDWXACT_STATUS_RESOLVED:
+				xact_status = "resolved";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		values[5] = BoolGetDatum(fdwxact->indoubt);
+		values[6] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+	FdwXact			fdwxact;
+	FdwXactRslvState	*state;
+	FdwXactStatus		prev_status;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	server = GetForeignServer(serverid);
+	usermapping = GetUserMapping(userid, serverid);
+	state = create_fdwxact_state();
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+	{
+		LWLockRelease(FdwXactLock);
+		PG_RETURN_BOOL(false);
+	}
+
+	state->server = server;
+	state->usermapping = usermapping;
+	state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+
+	SpinLockAcquire(&fdwxact->mutex);
+	prev_status = fdwxact->status;
+	SpinLockRelease(&fdwxact->mutex);
+
+	FdwXactDetermineTransactionFate(fdwxact, false);
+
+	LWLockRelease(FdwXactLock);
+
+	FdwXactResolveForeignTransaction(fdwxact, state, prev_status);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. The function gives a way to forget about such a prepared
+ * transaction, for example when the foreign server where it is prepared is no
+ * longer available or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
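+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	/* Remove the entry if it exists; the lock is released on every path */
+	if (fdwxact != NULL)
+		remove_fdwxact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	/* fdwxact is only compared here, never dereferenced after removal */
+	PG_RETURN_BOOL(fdwxact != NULL);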
+}
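
Note on the state file naming used by restoreFdwXactData() above: the directory scan accepts only names of exactly FDWXACT_FILE_NAME_LEN characters made of uppercase hex digits and underscores, then splits them with "%08x_%08x_%08x_%08x". The standalone sketch below mirrors that filter outside the backend. It is only an illustration under assumptions: the length 35 (four 8-digit fields plus three underscores) and the uppercase "%08X" writer format are guesses, since neither FDWXACT_FILE_NAME_LEN nor FdwXactFilePath() is shown in this part of the patch, and the sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumption: four 8-digit hex fields joined by three underscores. */
#define SKETCH_FILE_NAME_LEN 35

/* Build "dbid_xid_serverid_userid" in uppercase hex (assumed layout). */
static void
sketch_file_name(char *dst, size_t dstlen, unsigned dbid, unsigned xid,
				 unsigned serverid, unsigned userid)
{
	snprintf(dst, dstlen, "%08X_%08X_%08X_%08X", dbid, xid, serverid, userid);
}

/*
 * Mirror the directory-scan filter: exact length, only uppercase hex digits
 * and '_'. Unlike the patch, also require that all four fields were parsed.
 */
static bool
sketch_parse_file_name(const char *name, unsigned *dbid, unsigned *xid,
					   unsigned *serverid, unsigned *userid)
{
	if (strlen(name) != SKETCH_FILE_NAME_LEN ||
		strspn(name, "0123456789ABCDEF_") != SKETCH_FILE_NAME_LEN)
		return false;

	return sscanf(name, "%08x_%08x_%08x_%08x",
				  dbid, xid, serverid, userid) == 4;
}
```

Checking the sscanf() return value is a small addition over the patch: a name such as 35 underscores passes the length and character-set tests but yields no fields, so it may be worth rejecting such entries explicitly.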
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000..45fb530
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,644 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to retry resolution.
+ */
+void
+FdwXactLauncherRequestToLaunchForRetry(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		SetLatch(FdwXactRslvCtl->launcher_latch);
+}
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int	slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			resolver->last_resolved_time = 0;
+			resolver->latch = NULL;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz	last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == 0);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz	now;
+		long	wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int		rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Retry launching at most once per foreign_xact_resolution_retry_interval,
+		 * but always launch immediately when a backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested
+			 * but not running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in previous cycle was interrupted in less than
+			 * foreign_xact_resolution_retry_interval since last resolver
+			 * started, this usually means crash of the resolver, so we
+			 * should retry in foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool	found = false;
+	int		i;
+
+	/*
+	 * Look for a resolver process that is running and working on the
+	 * same database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and has not attached to its slot yet. Since the resolver will soon
+		 * find the FdwXact entry we inserted, we don't need to do anything.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int unused_slot = -1;
+	int i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (unused_slot < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to wait
+	 * until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on each database that has
+ * at least one FdwXact entry but no resolver running on it.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	*resolver_dbs;	/* DBs resolvers are running on */
+	HTAB	*fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL	ctl;
+	HASH_SEQ_STATUS status;
+	Oid		*entry;
+	bool	launched = false;
+	int		i;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one non-in-doubt FdwXact entry */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->indoubt)
+			continue;
+
+		hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* No database has an FdwXact entry, so there is nothing to launch */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+		return false;
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolvers are running and launch new one on them */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	return launched;
+}
+
+/*
+ * FdwXactLauncherRegister
+ *		Register a background worker running the foreign transaction
+ *      launcher.
+ */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int i;
+
+	/* Must be super user */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+						WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Returns activity of all foreign transaction resolvers.
+ */
+Datum
+pg_stat_get_foreign_xact(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver	*resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t	pid;
+		Oid		dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000..9298877
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,344 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. The foreign
+ * transaction launcher starts one resolver process per database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database that backend processes are waiting on.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int foreign_xact_resolution_retry_interval;
+int foreign_xact_resolver_timeout = 60 * 1000;
+bool foreign_xact_resolve_indoubt_xacts;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	MyFdwXactResolver->last_resolved_time = 0;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sanish value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		PGPROC			*waiter = NULL;
+		TransactionId	waitXid = InvalidTransactionId;
+		TimestampTz		resolutionTs = -1;
+		int			rc;
+		TimestampTz	now;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiters until either the queue becomes empty or we reach a
+		 * waiter whose resolution time is in the future.
+		 */
+		while ((waiter = FdwXactGetWaiter(&resolutionTs, &waitXid)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+			Assert(TransactionIdIsValid(waitXid));
+
+			if (resolutionTs > now)
+				break;
+
+			elog(DEBUG2, "resolver got one waiter with xid %u", waitXid);
+
+			/* Resolve the waiting distributed transaction */
+			StartTransactionCommand();
+			FdwXactResolveTransactionAndReleaseWaiter(MyDatabaseId, waitXid,
+													  waiter);
+			CommitTransactionCommand();
+
+			/* Update my stats */
+			SpinLockAcquire(&(MyFdwXactResolver->mutex));
+			MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+			SpinLockRelease(&(MyFdwXactResolver->mutex));
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not and no waiter remains.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz last_resolved_time;
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	last_resolved_time = MyFdwXactResolver->last_resolved_time;
+	timeout = TimestampTzPlusMilliseconds(last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until the slot is detached. This
+		 * is necessary to prevent a race condition in which a waiter enqueues
+		 * itself after we checked FdwXactWaiterExists.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but doesn't exit as the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until the
+ * earlier of the timeout and the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long	sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long	sec_to_timeout;
+		int		microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long	sec_to_timeout;
+		int		microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index 5514db1..742e825 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -8,9 +8,9 @@ subdir = src/backend/access/rmgrdesc
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o genericdesc.o \
-	   gindesc.o gistdesc.o hashdesc.o heapdesc.o logicalmsgdesc.o \
-	   mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o seqdesc.o \
-	   smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
+OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o fdwxactdesc.o \
+	genericdesc.o gindesc.o gistdesc.o hashdesc.o heapdesc.o \
+	logicalmsgdesc.o mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o \
+	seqdesc.o smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 33060f3..1d4e1c8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 16fbe47..f15c83a 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -12,9 +12,9 @@ subdir = src/backend/access/transam
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = clog.o commit_ts.o generic_xlog.o multixact.o parallel.o rmgr.o slru.o \
-	subtrans.o timeline.o transam.o twophase.o twophase_rmgr.o varsup.o \
-	xact.o xlog.o xlogarchive.o xlogfuncs.o \
+OBJS = clog.o commit_ts.o generic_xlog.o multixact.o \
+	parallel.o rmgr.o slru.o subtrans.o timeline.o transam.o twophase.o \
+	twophase_rmgr.o varsup.o xact.o xlog.o xlogarchive.o xlogfuncs.o \
 	xloginsert.o xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 9368b56..8b360b1 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -9,6 +9,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
 #include "access/generic_xlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 5196d61..dfc4d1d 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -852,6 +853,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 }
 
 /*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
+/*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
  *
@@ -2318,6 +2348,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2377,6 +2413,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index d7930c0..8aaf5ae 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1216,6 +1217,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1224,6 +1226,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1262,12 +1265,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1425,6 +1429,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transactions to be resolved, if required.
+	 * We only want to wait if we prepared foreign transactions in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2084,6 +2096,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2243,6 +2258,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2330,6 +2346,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2524,6 +2542,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2729,6 +2748,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 13e0d23..9740855 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
 #include "access/subtrans.h"
@@ -5249,6 +5250,7 @@ BootStrapXLOG(void)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6179,6 +6181,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6701,14 +6706,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -6900,7 +6906,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7396,6 +7405,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7714,6 +7724,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -8989,6 +9002,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9422,8 +9436,10 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9441,6 +9457,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9457,6 +9474,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9662,6 +9680,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -9861,6 +9880,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e..342dd6a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -332,6 +332,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+       SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
@@ -815,6 +818,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_foreign_xact AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_foreign_xact() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f1161f0..ef0078c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2857,9 +2857,16 @@ CopyFrom(CopyState cstate)
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc),
+							   true);
+
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
 
+	}
+
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index d7bc6e3..76f7b78 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1105,6 +1107,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdwxact_exists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1424,6 +1438,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
 	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdwxact_exists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
+	/*
 	 * Do the deletion
 	 */
 	object.classId = UserMappingRelationId;
@@ -1576,6 +1599,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
 				 errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA",
 						fdw->fdwname)));
 
+	/*
+	 * Remember that the transaction accesses a foreign server. Normally
+	 * ImportForeignSchema doesn't modify data on foreign servers, so remember
+	 * it as a not-modified server.
+	 */
+	RegisterFdwXactByServerId(server->serverid, false);
+
 	/* Call FDW to get a list of commands */
 	cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 6f2b4d6..16fab44 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,7 +13,9 @@
  */
 #include "postgres.h"
 
+
 #include "access/table.h"
+#include "access/fdwxact.h"
 #include "access/tableam.h"
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
@@ -949,7 +951,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		Relation		child = partRelInfo->ri_RelationDesc;
+
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(child), true);
+
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1da..eb7450c 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,10 +226,33 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		RangeTblEntry	*rte;
+
+		rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex,
+							estate);
+
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(rte->relid, true);
+
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
+	{
+		RangeTblEntry	*rte;
+		int rtindex = (scanrelid > 0) ?
+			scanrelid :
+			bms_next_member(node->fs_relids, -1);
+
+		rte = exec_rt_fetch(rtindex, estate);
+
+		/* Remember that the transaction accesses a foreign server */
+		RegisterFdwXactByRelId(rte->relid, false);
+
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
+	}
+
 	return scanstate;
 }
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index a70549d..d942ea5 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -38,6 +38,7 @@
 #include "postgres.h"
 
 #include "access/heapam.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -47,6 +48,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "rewrite/rewriteHandler.h"
@@ -550,6 +552,10 @@ ExecInsert(ModifyTableState *mtstate,
 										   NULL,
 										   specToken);
 
+			/* Note that we wrote to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
 												   &specConflict,
@@ -778,6 +784,10 @@ ldelete:;
 									&tmfd,
 									changingPart);
 
+		/* Note that we wrote to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case TM_SelfModified:
@@ -1325,6 +1335,10 @@ lreplace:;
 									true /* wait for commit */ ,
 									&tmfd, &lockmode, &update_indexes);
 
+		/* Note that we wrote to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case TM_SelfModified:
@@ -2387,6 +2401,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+			/* Remember the transaction modifies data on a foreign server */
+			RegisterFdwXactByRelId(relid, true);
 
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 															 resultRelInfo,
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index c917ec4..2780ed5 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,20 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction &&
+		 !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction &&
+		 routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		(!routine->CommitForeignTransaction ||
+		 !routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index b66b517..517169b 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -15,6 +15,8 @@
 #include <unistd.h>
 
 #include "libpq/pqsignal.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -129,6 +131,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b4f2b28..44b8ebd 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3652,6 +3652,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3855,6 +3861,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDWXACT:
+			event_name = "FdwXact";
+			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -4070,6 +4082,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 688ad43..b840eff 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -900,6 +902,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -975,12 +981,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef..325cf9d 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -154,6 +154,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index d7d7335..1491bc6 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(int port)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(int port)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 18a0f62..59dfc5f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -91,6 +91,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -246,6 +248,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1313,6 +1316,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1378,6 +1382,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1428,6 +1433,15 @@ GetOldestXmin(Relation rel, int flags)
 		result = replication_slot_xmin;
 
 	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDWXACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
+	/*
 	 * After locks have been released and defer_cleanup_age has been applied,
 	 * check whether we need to back up further to make logical decoding
 	 * possible. We need to do so if we're computing the global limit (rel =
@@ -3016,6 +3030,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits to future computations of the xmin horizon to prevent
+ * vacuum from truncating clog for transactions that are still needed to
+ * resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843..0b8a487 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,6 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+FdwXactLock					45
+FdwXactResolverLock			46
+FdwXactResolutionLock			47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 498373f..dc77509 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -422,6 +423,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* initialize fields for fdw xact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -823,6 +828,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1..8b8330b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3000,6 +3002,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 631f16f..ec3a45d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -397,6 +398,25 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 };
 
 /*
+ * Although only "required", "prefer", and "disabled" are documented,
+ * we accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
+/*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
  */
@@ -719,6 +739,12 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
+	/* FDWXACT_RESOLVER */
+	gettext_noop("Foreign Transaction Management / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2352,6 +2378,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FDWXACT_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FDWXACT_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4374,6 +4446,16 @@ static struct config_enum ConfigureNamesEnum[] =
 	},
 
 	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
+	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
 			NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 5ee5e09..da31b2b 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -125,6 +125,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -341,6 +343,20 @@
 
 
 #------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#foreign_twophase_commit = disabled	# required, prefer, or disabled
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
+#------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
 
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 33ac627..328b857 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2ef1791..3659501 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -210,6 +210,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index d955b97..20678e4 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -310,6 +310,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_xacts setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 2734f87..840df85 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -717,6 +717,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -933,6 +934,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca..b616cea 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 3c0db2c..5798b4c 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Foreign Transactions", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index b9a531c..8238723 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index c9d6b47..2e6da26 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -109,6 +109,13 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
+ * XACT_FLAGS_FDWNOPREPARE - set when we wrote data on a foreign table
+ * whose foreign server isn't capable of
+ * two-phase commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 3)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 3cc9c3d..1410a11 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -228,6 +228,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index ff98d9e..773846d 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8733524..abed76f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5150,6 +5150,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '6053', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_foreign_xact', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,timestamptz}',
+  proargmodes => '{o,o,o}',
+  proargnames => '{pid,dbid,last_resolved_time}',
+  prosrc => 'pg_stat_get_foreign_xact' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5863,6 +5870,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '6050', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o,o}',
+  proargnames => '{dbid,xid,serverid,userid,status,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '6051', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '6052', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -5981,6 +6006,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '6054',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 8226860..f6592ee 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
@@ -248,7 +260,6 @@ typedef struct FdwRoutine
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
 } FdwRoutine;
 
-
 /* Functions in foreign/foreign.c */
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern Oid	GetForeignServerIdByRelId(Oid relid);
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 46759e3..4a150e6 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -80,6 +80,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 											   bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 														 bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a..33d34d2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -777,6 +777,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -854,7 +856,9 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDWXACT,
+	WAIT_EVENT_FDWXACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -934,6 +938,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index ac7ee72..04111f9 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -21,6 +21,7 @@
 #include "storage/lock.h"
 #include "storage/pg_sema.h"
 #include "storage/proclist_types.h"
+#include "datatype/timestamp.h"
 
 /*
  * Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
@@ -153,6 +154,16 @@ struct PGPROC
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
 	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
+	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
 	 * their lock.
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index da8b672..04f9c8c 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDWXACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -124,4 +126,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index d68976f..d5fec50 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,9 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
+	FDWXACT_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd..c862e0e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1341,6 +1341,14 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.dbid,
+    f.xid,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(dbid, xid, serverid, userid, status, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
@@ -1841,6 +1849,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_foreign_xact| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_foreign_xact() r(pid, dbid, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
-- 
2.10.5
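
For readers following the FdwRoutine changes in the patch above, here is a
stand-alone sketch of the order in which a coordinator would drive the new
prepare/commit/rollback callbacks: prepare everywhere, commit locally, then
resolve. This is not code from the patch. Participant, TxnCallbacks,
demo_* and commit_distributed are hypothetical names, and the mock prepare
callback returns bool only to simulate failure, whereas the real callbacks
take an FdwXactRslvState and return void.

```c
/*
 * Stand-alone sketch, NOT PostgreSQL source.  All types and functions here
 * are hypothetical stand-ins used to illustrate the callback ordering.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TXN_ACTIVE, TXN_PREPARED, TXN_COMMITTED, TXN_ABORTED } TxnStatus;

typedef struct Participant
{
	const char *server;			/* hypothetical foreign server name */
	bool		fail_prepare;	/* simulate a failure during PREPARE */
	TxnStatus	status;
} Participant;

/* Mirrors the shape of the transaction callbacks added to FdwRoutine. */
typedef struct TxnCallbacks
{
	bool		(*prepare) (Participant *p);
	void		(*commit) (Participant *p);
	void		(*rollback) (Participant *p);
} TxnCallbacks;

bool
demo_prepare(Participant *p)
{
	if (p->fail_prepare)
		return false;
	p->status = TXN_PREPARED;
	return true;
}

void
demo_commit(Participant *p)
{
	p->status = TXN_COMMITTED;
}

void
demo_rollback(Participant *p)
{
	p->status = TXN_ABORTED;
}

const TxnCallbacks demo_callbacks = {demo_prepare, demo_commit, demo_rollback};

/*
 * Returns true if the transaction committed on every participant and
 * locally, false if it was rolled back everywhere.
 */
bool
commit_distributed(const TxnCallbacks *cb, Participant *parts, size_t n,
				   TxnStatus *local)
{
	size_t		i;
	size_t		j;

	assert(cb != NULL && local != NULL);

	/* Step 1: prepare the transaction on every foreign server. */
	for (i = 0; i < n; i++)
	{
		if (!cb->prepare(&parts[i]))
		{
			/* Any failure switches to rollback on all servers. */
			for (j = 0; j < n; j++)
				cb->rollback(&parts[j]);
			*local = TXN_ABORTED;
			return false;
		}
	}

	/* Step 2: commit locally. */
	*local = TXN_COMMITTED;

	/* Step 3: resolve (commit) the prepared foreign transactions. */
	for (i = 0; i < n; i++)
		cb->commit(&parts[i]);

	return true;
}
```

Calling commit_distributed(&demo_callbacks, parts, n, &local) with one
participant whose fail_prepare is set leaves every participant and the
local status aborted. In the patch itself step 3 is handed to a resolver
process rather than done inline, so treat the loop above as a simplification.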

Attachment: v24-0003-Documentation-update.patch (application/octet-stream)
From 45bbe2360c19e415e004a6aa512d968ee26649e3 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 2 Jul 2019 09:31:05 +0900
Subject: [PATCH v24 3/5] Documentation update.

---
 doc/src/sgml/catalogs.sgml                | 145 ++++++++++++++++++
 doc/src/sgml/config.sgml                  | 146 +++++++++++++++++-
 doc/src/sgml/distributed-transaction.sgml | 158 ++++++++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 236 ++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  89 +++++++++++
 doc/src/sgml/monitoring.sgml              |  60 ++++++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 841 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 09690b6..4fe4b1c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8274,6 +8274,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
      </row>
 
      <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
+     <row>
       <entry><link linkend="view-pg-file-settings"><structname>pg_file_settings</structname></link></entry>
       <entry>summary of configuration file contents</entry>
      </row>
@@ -9718,6 +9723,146 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database in which the foreign transaction resides
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-authid"><structname>pg_authid</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>initial</literal> : Initial status.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>resolved</literal> : This foreign transaction has been resolved.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in in-doubt status and
+       needs to be resolved by calling the <function>pg_resolve_foreign_xact</function>
+       function.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 84341a3..3e988e3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4281,7 +4281,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
 
      </variablelist>
     </sect2>
-
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -8615,6 +8614,151 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether transaction commit waits for all foreign
+         transactions involved to be resolved before the command returns a
+         "success" indication to the client. Valid values are
+         <literal>required</literal>, <literal>prefer</literal> and
+         <literal>disabled</literal>. The default setting is
+         <literal>disabled</literal>, which does not use the two-phase commit
+         protocol to commit or roll back distributed transactions. When set to
+         <literal>required</literal>, the distributed transaction strictly
+         requires that all written servers can use the two-phase commit protocol.
+         That is, the distributed transaction cannot commit if even one server
+         does not support the transaction management callback routines
+         (described in <xref linkend="fdw-callbacks-transaction-managements"/>).
+         When set to <literal>prefer</literal>, the distributed transaction uses
+         the two-phase commit protocol only on servers where it is available and
+         commits normally on the others. Note that with <literal>disabled</literal>
+         or <literal>prefer</literal>, the servers involved in the distributed
+         transaction can become inconsistent with each other if a foreign
+         server crashes while the distributed transaction is being committed.
+        </para>
+
+        <para>
+         Both <varname>max_prepared_foreign_transactions</varname> and
+         <varname>max_foreign_transaction_resolvers</varname> must be set to
+         non-zero values before this parameter can be set to either
+         <literal>required</literal> or <literal>prefer</literal>.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If <literal>N</literal> local transactions each
+         span <literal>K</literal> foreign servers, this value needs to be set
+         to <literal>N * K</literal>, not just <literal>N</literal>.
+         This parameter can only be set at server start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. Each foreign
+         transaction resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver waits after a failed
+         resolution before retrying. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout, so the resolver stays connected to its
+         database until stopped manually. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000..350b1af
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers end in either commit or rollback using the
+   transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers automatically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).
+    The <productname>PostgreSQL</productname> server that received the SQL is called
+    the <firstterm>coordinator node</firstterm> and is responsible for coordinating
+    all the participating transactions. Using the two-phase commit protocol, the commit
+    sequence of a distributed transaction proceeds through the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    In the first step, the <productname>PostgreSQL</productname> distributed
+    transaction manager prepares all transactions on the foreign servers if
+    two-phase commit is required. Two-phase commit is required when the
+    transaction modifies data on two or more servers, including the local server
+    itself, and <xref linkend="guc-foreign-twophase-commit"/> is
+    <literal>required</literal> or <literal>prefer</literal>. If all preparations
+    on foreign servers succeed, the commit proceeds to the next step. If any failure
+    occurs in this step, <productname>PostgreSQL</productname> switches to rollback
+    and rolls back all transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    In the local commit step, <productname>PostgreSQL</productname> commits the
+    transaction locally. If any failure occurs in this step,
+    <productname>PostgreSQL</productname> switches to rollback and rolls back all
+    transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    In the final step, prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolution">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for foreign transaction resolution. They commit or roll back all
+    prepared transactions on foreign servers if the coordinator received agreement
+    messages from all foreign servers during the first step.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on one database on the coordinator side. On failure during resolution, it
+    retries at intervals of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped. To drop the database, first call the
+     <function>pg_stop_foreign_xact_resolver</function> function, then drop
+     the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>Manual Resolution of In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back, using the two-phase commit protocol. However, distributed
+    transactions become <firstterm>in-doubt</firstterm> in three cases: when a
+    foreign server crashes or the connection to it is lost while preparing the
+    foreign transaction, when the coordinator node crashes while either preparing
+    or resolving a distributed transaction, and when the user cancels the query.
+    You can check in-doubt transactions in the <xref linkend="pg-stat-foreign-xact-view"/>
+    view. These foreign transactions need to be resolved by using the
+    <function>pg_resolve_foreign_xact</function> function.
+    <productname>PostgreSQL</productname> doesn't have facilities to automatically
+    resolve in-doubt transactions. This behavior might change in a future release.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-monitoring">
+   <title>Monitoring</title>
+   <para>
+    Monitoring information about foreign transaction resolvers is visible in the
+    <link linkend="pg-stat-foreign-xact-view"><literal>pg_stat_foreign_xact</literal></link>
+    view. This view contains one row for every foreign transaction resolver worker.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be adjusted to
+    accommodate foreign transaction resolver workers; it must be at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 27b94fb..1b292e4 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1413,6 +1413,127 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back and
+     prepare foreign transactions. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase, or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flags</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal> set, the transaction can be
+    committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when it is invoked through the <function>pg_resolve_foreign_xact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally.
+    Foreign transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flags</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal> set, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when it is invoked through the <function>pg_resolve_foreign_xact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction
+    identifier, storing its length in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal
+    less than <symbol>NAMEDATALEN</symbol> bytes long and should not be the same
+    as any other concurrent prepared transaction identifier. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      The functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Note, therefore, that
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1892,4 +2013,119 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name
+    and the OIDs of the server, user and user mapping. The <literal>flags</literal> field
+    contains flag bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit And Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <literal>CommitForeignTransaction</literal> function
+     in the pre-commit phase; during rollback, it calls the
+     <literal>RollbackForeignTransaction</literal> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit And Rollback Distributed Transaction</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to atomically commit and roll back among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    Here ft1 and ft2 are foreign tables on different foreign servers, which may be using
+    different Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, the <literal>CommitForeignTransaction</literal> and
+     <literal>RollbackForeignTransaction</literal> functions should commit and
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve, the resolver process exits with an error message. The foreign
+     transaction launcher launches the resolver process again at the
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365..80a87fa 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3a8581d..f03c4cd 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21996,6 +21996,95 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction that is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolution.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before dropping
+    the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c..d587164 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -368,6 +368,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_foreign_xact</structname><indexterm><primary>pg_stat_foreign_xact</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1236,6 +1244,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
+         <entry><literal>LogicalLauncherMain</literal></entry>
+         <entry>Waiting in main loop of logical launcher process.</entry>
+        </row>
+        <row>
          <entry><literal>LogicalApplyMain</literal></entry>
          <entry>Waiting in main loop of logical apply process.</entry>
         </row>
@@ -1459,6 +1479,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
+        <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
          <entry>Waiting during base backup when throttling activity.</entry>
@@ -2338,6 +2362,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-foreign-xact-view" xreflabel="pg_stat_foreign_xact">
+   <title><structname>pg_stat_foreign_xact</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_foreign_xact</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 3e115f1..5ae3807 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -163,6 +163,7 @@
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 1047c77..f69bc72 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -84,6 +84,12 @@ Item
 </row>
 
 <row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
+<row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
 </row>
-- 
2.10.5
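
The one-phase versus two-phase contract that the fdwhandler.sgml text in the patch above describes for CommitForeignTransaction and RollbackForeignTransaction might be sketched as below. This is only an illustration, not code from the patch set: the flag value SKETCH_FLAG_ONEPHASE and the helper sketch_build_end_xact_sql are hypothetical names, and the choice of "COMMIT TRANSACTION" / "ABORT TRANSACTION" for the one-phase case is an assumption. COMMIT PREPARED and ROLLBACK PREPARED are the standard PostgreSQL commands for ending a prepared transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bit; the real name and value in the patch may differ. */
#define SKETCH_FLAG_ONEPHASE 0x01

/*
 * Build the SQL command a resolution callback would send to the remote
 * server.  With the one-phase flag set, the open remote transaction is
 * ended directly; otherwise the prepared transaction named by fdwxact_id
 * is committed or rolled back.
 *
 * Returns false if the command does not fit into buf, or if the identifier
 * is missing or contains a single quote (a real FDW would escape the
 * identifier instead of rejecting it).
 */
bool
sketch_build_end_xact_sql(char *buf, size_t buflen, int flags,
                          const char *fdwxact_id, bool is_commit)
{
    int n;

    assert(buf != NULL && buflen > 0);

    if (flags & SKETCH_FLAG_ONEPHASE)
    {
        /* One-phase: no prepared transaction exists on the remote side. */
        n = snprintf(buf, buflen, "%s",
                     is_commit ? "COMMIT TRANSACTION" : "ABORT TRANSACTION");
    }
    else
    {
        /* Two-phase: resolve the previously prepared transaction. */
        if (fdwxact_id == NULL || strchr(fdwxact_id, '\'') != NULL)
            return false;
        n = snprintf(buf, buflen, "%s PREPARED '%s'",
                     is_commit ? "COMMIT" : "ROLLBACK", fdwxact_id);
    }

    /* snprintf reports truncation by returning a length >= buflen. */
    return n >= 0 && (size_t) n < buflen;
}
```

A real callback would additionally have to treat an "undefined object" error from the remote server as success, since the prepared transaction may already have been resolved or may never have been created.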

v24-0004-postgres_fdw-supports-atomic-commit-APIs.patchapplication/octet-stream; name=v24-0004-postgres_fdw-supports-atomic-commit-APIs.patchDownload
From 9f7cbe1f78c550bf62f435ff97845f7fde5f46cd Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:46:01 +0900
Subject: [PATCH v24 4/5] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/Makefile                  |   7 +-
 contrib/postgres_fdw/connection.c              | 604 ++++++++++++++++---------
 contrib/postgres_fdw/expected/postgres_fdw.out | 261 ++++++++++-
 contrib/postgres_fdw/fdwxact.conf              |   3 +
 contrib/postgres_fdw/postgres_fdw.c            |  21 +-
 contrib/postgres_fdw/postgres_fdw.h            |   7 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      | 119 +++++
 doc/src/sgml/postgres-fdw.sgml                 |  46 ++
 8 files changed, 830 insertions(+), 238 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index 85394b4..5198f40 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -10,7 +10,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -23,3 +23,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 57ed5f4..093eda6 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2019, PostgreSQL Global Development Group
  *
@@ -14,6 +14,7 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
@@ -56,6 +57,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -69,17 +71,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -91,24 +89,26 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 									 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
-
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
+ * Get connection cache entry. Unlike the GetConnectionState function, this
+ * function doesn't establish a new connection if there isn't one yet.
  */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -128,7 +128,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -136,12 +135,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -155,6 +148,21 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * This function gets the connection cache entry and establishes connection
+ * to the foreign server if there is no connection and starts a new transaction
+ * if 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -182,6 +190,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping	*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +199,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +210,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connection to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,11 +226,39 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
@@ -414,7 +461,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -645,193 +692,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 }
 
 /*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow remote transactions that modified anything,
-					 * since it's not very reasonable to hold them open until
-					 * the prepared transaction is committed.  For the moment,
-					 * throw error unconditionally; later we might allow
-					 * read-only cases.  Note that the error will cause us to
-					 * come right back here with event == XACT_EVENT_ABORT, so
-					 * we'll clean up the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot prepare a transaction that modified remote tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
-/*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
 static void
@@ -847,10 +707,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -861,6 +717,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Skip entries that got no connection in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1195,3 +1055,309 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   state->server->servername, state->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 state->server->servername, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on foreign server. If
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can commit the
+ * foreign transaction without preparation, otherwise commit the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has been
+		 * prepared and closed, so we might not have a connection to it. We get
+		 * a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In the simple commit case, we must have a connection to the foreign
+	 * server because the foreign transaction is not closed yet. We get the
+	 * connection entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   state->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Rollback a transaction on foreign server. As with the commit case, if
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can roll back the
+ * foreign transaction without preparation; otherwise it rolls back the prepared
+ * transaction. This function must tolerate being called recursively, as an
+ * error can happen during abort.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has been
+		 * prepared and closed, so we might not have a connection to it. We get
+		 * a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In the simple rollback case, we must have a connection to the foreign
+	 * server because the foreign transaction is not closed yet. We get the
+	 * connection entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before a
+	 * connection was established or a remote transaction was started.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the remote transaction was never started or the connection is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager notes, the given foreign
+		 * transaction might not exist on the foreign server, so we accept
+		 * an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
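A note on the error handling in pgfdw_end_prepared_xact() above: the function packs the five-character SQLSTATE reported by the remote server into an int and ignores the failure only when it equals ERRCODE_UNDEFINED_OBJECT (42704), falling back to ERRCODE_CONNECTION_FAILURE (08006) when no SQLSTATE is available. The following standalone sketch illustrates that decision outside the backend. The two macros are local copies of the packing scheme in utils/elog.h; the helper names pack_sqlstate and end_prepared_error_is_ignorable are hypothetical, and the length check is an added assumption that the patch itself does not make.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the SQLSTATE packing macros from utils/elog.h. */
#define PGSIXBIT(ch)	(((ch) - '0') & 0x3F)
#define MAKE_SQLSTATE(c1, c2, c3, c4, c5) \
	(PGSIXBIT(c1) + (PGSIXBIT(c2) << 6) + (PGSIXBIT(c3) << 12) + \
	 (PGSIXBIT(c4) << 18) + (PGSIXBIT(c5) << 24))

#define ERRCODE_CONNECTION_FAILURE	MAKE_SQLSTATE('0', '8', '0', '0', '6')
#define ERRCODE_UNDEFINED_OBJECT	MAKE_SQLSTATE('4', '2', '7', '0', '4')

/*
 * Hypothetical helper: pack a SQLSTATE string into an int.  A missing
 * SQLSTATE is treated as a connection failure, as in the patch; rejecting
 * strings that are not exactly five characters long is an extra assumption.
 */
static int
pack_sqlstate(const char *diag_sqlstate)
{
	if (diag_sqlstate == NULL || strlen(diag_sqlstate) != 5)
		return ERRCODE_CONNECTION_FAILURE;

	return MAKE_SQLSTATE(diag_sqlstate[0], diag_sqlstate[1],
						 diag_sqlstate[2], diag_sqlstate[3],
						 diag_sqlstate[4]);
}

/*
 * Hypothetical helper: true if a failed COMMIT PREPARED / ROLLBACK PREPARED
 * can be ignored because the prepared transaction no longer exists remotely.
 */
static bool
end_prepared_error_is_ignorable(const char *diag_sqlstate)
{
	return pack_sqlstate(diag_sqlstate) == ERRCODE_UNDEFINED_OBJECT;
}
```

With these helpers, "42704" would be ignored, while any other SQLSTATE (or none at all) would still be reported as an error.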
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f0c842a..067c98f 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -179,15 +198,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8781,3 +8802,225 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     0
+(1 row)
+
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000..3fdbf93
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 1759b9e..b672786 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -503,7 +504,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -557,6 +557,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1446,7 +1451,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2384,7 +2389,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2684,7 +2689,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3512,7 +3517,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4387,7 +4392,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4477,7 +4482,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4700,7 +4705,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
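For readers following the callbacks registered in postgres_fdw_handler() above, here is a minimal standalone sketch of which remote command the commit and rollback callbacks end up sending, depending on whether FDWXACT_FLAG_ONEPHASE is set. It is only an illustration: DemoRslvState is a hypothetical, trimmed-down stand-in for FdwXactRslvState, the flag value 0x01 is assumed (the real definition lives in access/fdwxact.h), and build_end_command is not a function in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed value for illustration only; see access/fdwxact.h in the patch. */
#define FDWXACT_FLAG_ONEPHASE	0x01

/* Hypothetical stand-in for the fields of FdwXactRslvState used here. */
typedef struct DemoRslvState
{
	int			flags;			/* may contain FDWXACT_FLAG_ONEPHASE */
	const char *fdwxact_id;		/* identifier of the prepared transaction */
} DemoRslvState;

/*
 * Hypothetical helper: write into buf the command that ends the remote
 * transaction.  One-phase uses COMMIT TRANSACTION / ABORT TRANSACTION;
 * two-phase uses COMMIT PREPARED / ROLLBACK PREPARED with the identifier.
 * Returns the snprintf() result.  Like the patch, this does not escape
 * fdwxact_id, so it assumes the identifier contains no single quotes.
 */
static int
build_end_command(const DemoRslvState *state, bool is_commit,
				  char *buf, size_t buflen)
{
	if ((state->flags & FDWXACT_FLAG_ONEPHASE) != 0)
		return snprintf(buf, buflen, "%s TRANSACTION",
						is_commit ? "COMMIT" : "ABORT");

	return snprintf(buf, buflen, "%s PREPARED '%s'",
					is_commit ? "COMMIT" : "ROLLBACK",
					state->fdwxact_id);
}
```

In the two-phase path the callbacks may need to open a fresh connection first, which is why the patch calls GetConnectionState() with start_transaction set to false there.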
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 6acb7dc..f9162bb 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "nodes/pathnodes.h"
@@ -127,7 +128,7 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -135,6 +136,9 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *state);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *state);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *state);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
@@ -201,6 +205,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 									bool is_subquery,
 									List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 630b803..572077c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2479,3 +2502,99 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index e9ce39a..0427922 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -441,6 +441,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted independently.
+    Some of these transactions may fail to commit on their remote server while
+    others commit successfully. This may be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is allowed
+       to use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
@@ -469,6 +506,14 @@
   </para>
 
   <para>
+   <filename>postgres_fdw</filename> uses the two-phase commit protocol during
+   transaction commit or abort when atomic commit of the distributed
+   transaction (see <xref linkend="atomic-commit"/>) is required. The remote
+   server should therefore set <xref linkend="guc-max-prepared-transactions"/> to more
+   than one so that it can prepare the remote transaction.
+  </para>
+
+  <para>
    The remote transaction uses <literal>SERIALIZABLE</literal>
    isolation level when the local transaction has <literal>SERIALIZABLE</literal>
    isolation level; otherwise it uses <literal>REPEATABLE READ</literal>
@@ -483,6 +528,7 @@
    COMMITTED</literal> local transaction.  A future
    <productname>PostgreSQL</productname> release might modify these rules.
   </para>
+
  </sect2>
 
  <sect2>
-- 
2.10.5

v24-0005-Add-regression-tests-for-atomic-commit.patch (application/octet-stream)
From e6450e9f000dfe922bd71e3c4adc9b16e1c45a6e Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:48:08 +0900
Subject: [PATCH v24 5/5] Add regression tests for atomic commit.

---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index e66e695..b17429f 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000..9af9bb8
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 117a954..dc8150a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2319,9 +2319,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2336,7 +2339,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.10.5

#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Masahiko Sawada (#22)

Hello Sawada-san,

On 2019-Jul-02, Masahiko Sawada wrote:

On Mon, Jul 1, 2019 at 8:32 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Can we please have a fresh rebase?

Thank you for the notice. Attached rebased patches.

... and again?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#24Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Alvaro Herrera (#23)
5 attachment(s)

On Wed, Sep 4, 2019 at 7:36 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Hello Sawada-san,

On 2019-Jul-02, Masahiko Sawada wrote:

On Mon, Jul 1, 2019 at 8:32 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Can we please have a fresh rebase?

Thank you for the notice. Attached rebased patches.

... and again?

Thank you for the notice. I've attached rebased patch set.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

v25-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From 13d2a332b84c38a1ccdd3705977e3a0859028bf5 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 8 Feb 2019 10:44:54 +0900
Subject: [PATCH v25 1/5] Keep track of writing on non-temporary relation

---
 src/backend/executor/nodeModifyTable.c | 12 ++++++++++++
 src/include/access/xact.h              |  6 ++++++
 2 files changed, 18 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 01fe11a..778ff27 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -588,6 +588,10 @@ ExecInsert(ModifyTableState *mtstate,
 							   estate->es_output_cid,
 							   0, NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
@@ -940,6 +944,10 @@ ldelete:;
 	if (tupleDeleted)
 		*tupleDeleted = true;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/*
 	 * If this delete is the result of a partition key update that moved the
 	 * tuple to a new partition, put this row into the transition OLD TABLE,
@@ -1451,6 +1459,10 @@ lreplace:;
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
 	}
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	if (canSetTag)
 		(estate->es_processed)++;
 
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index d714551..6f4013e 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -103,6 +103,12 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
 /*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
-- 
2.10.5
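
For readers skimming the v25-0001 patch above, the flag-tracking idea can be sketched in isolation. This is a minimal standalone illustration, not PostgreSQL code: `my_xact_flags`, `note_relation_write`, `wrote_non_temp_relation` and `reset_xact_flags` are hypothetical names, and the `relation_needs_wal` argument merely stands in for the `RelationNeedsWAL()` check used in the patch. Only the two bit definitions are taken from the `xact.h` hunk.

```c
#include <stdbool.h>

/* Bit positions copied from the xact.h hunk in the patch. */
#define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)

/* Hypothetical stand-in for the backend-local MyXactFlags variable. */
static unsigned int my_xact_flags = 0;

/*
 * Record a write.  The flag is set only when the target relation needs WAL;
 * the boolean stands in for RelationNeedsWAL(rel) in the patch.
 */
static void
note_relation_write(bool relation_needs_wal)
{
	if (relation_needs_wal)
		my_xact_flags |= XACT_FLAGS_WROTENONTEMPREL;
}

/* Report whether the current transaction wrote to such a relation. */
static bool
wrote_non_temp_relation(void)
{
	return (my_xact_flags & XACT_FLAGS_WROTENONTEMPREL) != 0;
}

/* Clear all flags, as would be expected at transaction end (assumption). */
static void
reset_xact_flags(void)
{
	my_xact_flags = 0;
}
```

How the flag is consumed is not shown in this part of the thread. Judging from the regression test comments, it presumably lets commit processing tell that local data was modified alongside foreign servers, but that reading is an assumption.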

v25-0005-Add-regression-tests-for-atomic-commit.patch (application/octet-stream)
From 9a9893d9bdb65fdc76405100dd712174b00e4e5c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:48:08 +0900
Subject: [PATCH v25 5/5] Add regression tests for atomic commit.

---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index e66e695..b17429f 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000..9af9bb8
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index b4045ab..022ba1b 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2323,9 +2323,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2340,7 +2343,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.10.5

v25-0004-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From eba9ba70c673a6e0716c6b1566255bd3097add32 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:46:01 +0900
Subject: [PATCH v25 4/5] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/Makefile                  |   7 +-
 contrib/postgres_fdw/connection.c              | 604 ++++++++++++++++---------
 contrib/postgres_fdw/expected/postgres_fdw.out | 261 ++++++++++-
 contrib/postgres_fdw/fdwxact.conf              |   3 +
 contrib/postgres_fdw/postgres_fdw.c            |  21 +-
 contrib/postgres_fdw/postgres_fdw.h            |   7 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      | 119 +++++
 doc/src/sgml/postgres-fdw.sgml                 |  46 ++
 8 files changed, 830 insertions(+), 238 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index 85394b4..5198f40 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -10,7 +10,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -23,3 +23,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 57ed5f4..093eda6 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2019, PostgreSQL Global Development Group
  *
@@ -14,6 +14,7 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
@@ -56,6 +57,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -69,17 +71,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -91,24 +89,26 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 									 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
-
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
+ * Get the connection cache entry. Unlike GetConnectionState(), this function
+ * doesn't establish a new connection even if none exists yet.
  */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -128,7 +128,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -136,12 +135,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -155,6 +148,21 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * This function gets the connection cache entry and establishes connection
+ * to the foreign server if there is no connection and starts a new transaction
+ * if 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -182,6 +190,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping	*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +199,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +210,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,11 +226,39 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
@@ -414,7 +461,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -645,193 +692,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 }
 
 /*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow remote transactions that modified anything,
-					 * since it's not very reasonable to hold them open until
-					 * the prepared transaction is committed.  For the moment,
-					 * throw error unconditionally; later we might allow
-					 * read-only cases.  Note that the error will cause us to
-					 * come right back here with event == XACT_EVENT_ABORT, so
-					 * we'll clean up the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot prepare a transaction that modified remote tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
-/*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
 static void
@@ -847,10 +707,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -861,6 +717,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Skip connections that were not used in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1195,3 +1055,309 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   state->server->servername, state->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 state->server->servername, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on foreign server. If
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can commit the
+ * foreign transaction without preparation, otherwise commit the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has been
+		 * prepared and closed, so we might not have a connection to it. We get
+		 * a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In simple commit case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   state->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Rollback a transaction on foreign server. As with the commit case, if
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can rollback the
+ * foreign transaction without preparation, otherwise rollback the prepared
+ * transaction. This function must tolerate being called recursively, as an
+ * error can happen during abort.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has been
+		 * prepared and closed, so we might not have a connection to it. We get
+		 * a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before
+	 * establishing a connection or starting a remote transaction.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction or is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
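Review note on the connection.c changes above: postgresPrepareForeignTransaction() and pgfdw_end_prepared_xact() both interpolate fdwxact_id into the command text with '%s'. How fdwxact_id is generated is not shown in this part of the patch, so whether it can ever contain a single quote is an assumption here; if it can, the resulting command would be malformed. The standalone sketch below only illustrates quote doubling. build_end_prepared_command is a hypothetical name, not something the patch defines, and it does not handle backslashes for servers running with standard_conforming_strings off. In-tree code would more likely rely on an existing helper such as libpq's PQescapeLiteral() or postgres_fdw's deparseStringLiteral().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): write
 *   COMMIT PREPARED '<id>'   or   ROLLBACK PREPARED '<id>'
 * into buf, doubling any single quote found in id.
 * Returns 0 on success, -1 if buf is too small.
 */
static int
build_end_prepared_command(char *buf, size_t buflen,
                           const char *id, int is_commit)
{
    const char *verb = is_commit ? "COMMIT" : "ROLLBACK";
    size_t      pos;
    int         n;

    assert(buf != NULL && id != NULL);

    n = snprintf(buf, buflen, "%s PREPARED '", verb);
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    pos = (size_t) n;

    for (; *id != '\0'; id++)
    {
        /* Reserve room for up to two chars now, plus closing quote and NUL. */
        if (pos + 4 > buflen)
            return -1;
        if (*id == '\'')
            buf[pos++] = '\'';      /* double the embedded quote */
        buf[pos++] = *id;
    }

    /* Room for the closing quote and the terminating NUL. */
    if (pos + 2 > buflen)
        return -1;
    buf[pos++] = '\'';
    buf[pos] = '\0';
    return 0;
}
```

A caller would pass a fixed-size buffer and check the return value before sending the command; the StringInfo-based code in the patch would not need the explicit length checks.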
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f0c842a..067c98f 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -179,15 +198,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8781,3 +8802,225 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     0
+(1 row)
+
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000..3fdbf93
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 82d8140..80de315 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -503,7 +504,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -557,6 +557,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1446,7 +1451,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2384,7 +2389,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2684,7 +2689,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3512,7 +3517,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4387,7 +4392,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4477,7 +4482,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4705,7 +4710,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 6acb7dc..f9162bb 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "nodes/pathnodes.h"
@@ -127,7 +128,7 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -135,6 +136,9 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *state);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *state);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *state);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
@@ -201,6 +205,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 									bool is_subquery,
 									List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 630b803..572077c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2479,3 +2502,99 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index e9ce39a..0427922 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -441,6 +441,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted
+    independently, so some remote transactions may fail to commit while
+    others commit successfully. This behavior can be changed using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is
+       allowed to use two-phase commit when a transaction commits. This option
+       can only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to
+       at least 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to at least 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
@@ -469,6 +506,14 @@
   </para>
 
   <para>
+   <filename>postgres_fdw</filename> uses the two-phase commit protocol when a
+   transaction commits or aborts if atomic commit of the distributed
+   transaction (see <xref linkend="atomic-commit"/>) is required. The remote
+   server should therefore set <xref linkend="guc-max-prepared-transactions"/>
+   to at least one so that it can prepare the remote transaction.
+  </para>
+
+  <para>
    The remote transaction uses <literal>SERIALIZABLE</literal>
    isolation level when the local transaction has <literal>SERIALIZABLE</literal>
    isolation level; otherwise it uses <literal>REPEATABLE READ</literal>
@@ -483,6 +528,7 @@
    COMMITTED</literal> local transaction.  A future
    <productname>PostgreSQL</productname> release might modify these rules.
   </para>
+
  </sect2>
 
  <sect2>
-- 
2.10.5
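
For readers following the regression tests in the patch above: the comments there say that writing to a single foreign server does not need 2PC, while writing to two foreign servers, or to one foreign server plus local data, does. A minimal sketch of that decision rule in C is below. It is only an illustration and not code from the patch; the enum and function names are hypothetical, and it ignores the per-server capability check that the 'prefer' setting implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three foreign_twophase_commit values. */
typedef enum
{
	SKETCH_2PC_DISABLED,
	SKETCH_2PC_PREFER,
	SKETCH_2PC_REQUIRED
} Sketch2PCMode;

/*
 * Hypothetical helper, not part of the patch: returns true when a commit
 * would have to prepare the foreign transactions first.
 *
 * n_foreign_written is the number of foreign servers modified by the
 * transaction; local_written says whether local data was modified too.
 */
bool
sketch_needs_two_phase(Sketch2PCMode mode, int n_foreign_written,
					   bool local_written)
{
	int			n_written;

	assert(n_foreign_written >= 0);

	/* With the setting disabled, 2PC is never used. */
	if (mode == SKETCH_2PC_DISABLED)
		return false;

	/* The local server counts as one more participant if it was modified. */
	n_written = n_foreign_written + (local_written ? 1 : 0);

	/* 2PC is needed once two or more servers have been modified. */
	return n_written >= 2;
}
```

Under these assumptions, sketch_needs_two_phase(SKETCH_2PC_REQUIRED, 1, false) is false, matching the single-server test case, and both (2, false) and (1, true) return true, matching the two-server and local-plus-foreign cases.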

Attachment: v25-0003-Documentation-update.patch (application/octet-stream)
From dcbe971ca77cd536d4a4e78492ad03d36f0edb6b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 2 Jul 2019 09:31:05 +0900
Subject: [PATCH v25 3/5] Documentation update.

---
 doc/src/sgml/catalogs.sgml                | 145 ++++++++++++++++++
 doc/src/sgml/config.sgml                  | 146 +++++++++++++++++-
 doc/src/sgml/distributed-transaction.sgml | 158 ++++++++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 236 ++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  89 +++++++++++
 doc/src/sgml/monitoring.sgml              |  60 ++++++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 841 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 5e71a2e..1dbc45e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8274,6 +8274,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
      </row>
 
      <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
+     <row>
       <entry><link linkend="view-pg-file-settings"><structname>pg_file_settings</structname></link></entry>
       <entry>summary of configuration file contents</entry>
      </row>
@@ -9718,6 +9723,146 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>initial</literal> : Initial status.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>resolved</literal> : This foreign transaction has been resolved.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in doubt and
+       needs to be resolved by calling the <function>pg_resolve_fdwxact</function>
+       function.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 89284dc..c1ef5d8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4294,7 +4294,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
 
      </variablelist>
     </sect2>
-
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -8611,6 +8610,151 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Setting</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether transaction commit will wait for all involved foreign
+         transactions to be resolved before the command returns a "success"
+         indication to the client. Valid values are <literal>required</literal>,
+         <literal>prefer</literal> and <literal>disabled</literal>. The default
+         setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used
+         to commit or roll back distributed transactions. When set to
+         <literal>required</literal>, the distributed transaction strictly
+         requires that all written servers can use the two-phase commit protocol.
+         That is, the distributed transaction cannot commit if even one server
+         does not support the transaction management callback routines
+         (described in <xref linkend="fdw-callbacks-transaction-managements"/>).
+         When set to <literal>prefer</literal>, the distributed transaction uses
+         the two-phase commit protocol only on servers where it is available and
+         commits normally on the others. Note that with <literal>disabled</literal>
+         or <literal>prefer</literal>, the servers involved in the distributed
+         transaction can become inconsistent with each other if a foreign server
+         crashes while the distributed transaction is being committed.
+        </para>
+
+        <para>
+         Both <varname>max_prepared_foreign_transactions</varname> and
+         <varname>max_foreign_transaction_resolvers</varname> must be set to
+         nonzero values before this parameter can be set to either
+         <literal>required</literal> or <literal>prefer</literal>.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If <literal>N</literal> local transactions each
+         span <literal>K</literal> foreign servers, this value needs to be set
+         to <literal>N * K</literal>, not just <literal>N</literal>.
+         This parameter can only be set at server start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>int</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign
+         transaction resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning a resolver stays connected
+         to its database until it is stopped manually. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000..350b1af
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers end in either commit or rollback using the
+   transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers automatically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).
+    The <productname>PostgreSQL</productname> server that received the SQL is called
+    the <firstterm>coordinator node</firstterm> and is responsible for coordinating
+    all the participating transactions. Using the two-phase commit protocol, the commit
+    sequence of a distributed transaction proceeds in the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    In the first step, the <productname>PostgreSQL</productname> distributed
+    transaction manager prepares all transactions on the foreign servers if
+    two-phase commit is required. Two-phase commit is required when the
+    transaction modifies data on two or more servers, including the local server
+    itself, and <xref linkend="guc-foreign-twophase-commit"/> is
+    <literal>required</literal> or <literal>prefer</literal>. If all preparations
+    on foreign servers succeed, processing goes on to the next step. If any
+    failure happens in this step, <productname>PostgreSQL</productname> switches
+    to rollback and rolls back all transactions on both local and foreign servers.
+   </para>
+
+   <para>
+    In the local commit step, <productname>PostgreSQL</productname> commits the
+    transaction locally. If any failure happens in this step,
+    <productname>PostgreSQL</productname> switches to rollback and rolls back all
+    transactions on both local and foreign servers.
+   </para>
+
+   <para>
+    At the final step, prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolution">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for foreign transaction resolution. They commit or roll back all
+    prepared transactions on foreign servers if the coordinator received agreement
+    messages from all foreign servers during the first step.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on one database on the coordinator side. On failure during resolution, it
+    retries at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped. To drop the database, call the
+     <function>pg_stop_foreign_xact_resolver</function> function before dropping
+     the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>Manual Resolution of In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back using the two-phase commit protocol. However, distributed transactions
+    become <firstterm>in-doubt</firstterm> in three cases: when a foreign
+    server crashed or the connection to it was lost while preparing the foreign
+    transaction, when the coordinator node crashed while either preparing or
+    resolving the distributed transaction, and when the user canceled the query. You can
+    check in-doubt transactions in the <xref linkend="pg-stat-foreign-xact-view"/>
+    view. These foreign transactions need to be resolved by using the
+    <function>pg_resolve_foreign_xact</function> function.
+    <productname>PostgreSQL</productname> doesn't have facilities to automatically
+    resolve in-doubt transactions. This behavior might change in a future release.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-monitoring">
+   <title>Monitoring</title>
+   <para>
+    The monitoring information about foreign transaction resolvers is visible in
+    <link linkend="pg-stat-foreign-xact-view"><literal>pg_stat_foreign_xact</literal></link>
+    view. This view contains one row for every foreign transaction resolver worker.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to nonzero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be adjusted to
+    accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 27b94fb..1b292e4 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1413,6 +1413,127 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back and
+     prepare the foreign transaction. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flag</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when it is invoked through the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flag</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance, the
+     <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except when it is invoked through the <function>pg_resolve_foreign_xact</function>
+     SQL function, this callback function is executed by a foreign transaction
+     resolver process.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction
+    identifier, and store its length in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal
+    less than <symbol>NAMEDATALEN</symbol> bytes long and should not be the same
+    as any other concurrent prepared transaction identifier. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Note therefore that
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1892,4 +2013,119 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the workings of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name and the OIDs of
+    the server, user and user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback of a Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and roll back the foreign transaction. When the local transaction commits, the core
+     transaction manager calls the <literal>CommitForeignTransaction</literal> function
+     in the pre-commit phase; when it rolls back, it calls the
+     <literal>RollbackForeignTransaction</literal> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback of Distributed Transactions</title>
+    <para>
+     In addition to simply committing and rolling back foreign transactions as described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit or roll back atomically across all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be using different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, the <literal>CommitForeignTransaction</literal> and
+     <literal>RollbackForeignTransaction</literal> functions should commit or
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server was not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve the transaction, the resolver process exits with an error message. The foreign
+     transaction launcher launches the resolver process again after the interval given by
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/>.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365..80a87fa 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index c878a0b..4d09fa0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -22013,6 +22013,95 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction which is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before
+    dropping the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c..d587164 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -368,6 +368,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_foreign_xact</structname><indexterm><primary>pg_stat_foreign_xact</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1236,6 +1244,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
+         <entry><literal>LogicalLauncherMain</literal></entry>
+         <entry>Waiting in main loop of logical launcher process.</entry>
+        </row>
+        <row>
          <entry><literal>LogicalApplyMain</literal></entry>
          <entry>Waiting in main loop of logical apply process.</entry>
         </row>
@@ -1459,6 +1479,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
+        <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
          <entry>Waiting during base backup when throttling activity.</entry>
@@ -2338,6 +2362,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-foreign-xact-view" xreflabel="pg_stat_foreign_xact">
+   <title><structname>pg_stat_foreign_xact</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_foreign_xact</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index e59cba7..dee3f72 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -163,6 +163,7 @@
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 342a0ff..840b0f6 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -84,6 +84,12 @@ Item
 </row>
 
 <row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
+<row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
 </row>
-- 
2.10.5
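
The callback contract documented in the patch above (a direct commit when the one-phase flag is set, COMMIT PREPARED otherwise, and tolerance of a missing prepared transaction) can be sketched in standalone C. This is only an illustration under stated assumptions: every name prefixed with sketch_ or SKETCH_, the RemoteResult enum and the RemoteExec callback are hypothetical stand-ins and are not part of the patch, and quoting of the identifier is ignored. Treating a missing prepared transaction as a failure in the one-phase case is a choice made for this sketch, not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for FDWXACT_FLAG_ONEPHASE. */
#define SKETCH_FLAG_ONEPHASE 0x01

/* Hypothetical outcome of sending one command to the foreign server. */
typedef enum
{
    REMOTE_OK,
    REMOTE_UNDEFINED_OBJECT,    /* plays the role of ERRCODE_UNDEFINED_OBJECT */
    REMOTE_OTHER_ERROR
} RemoteResult;

/* Hypothetical, reduced version of the resolver state passed to callbacks. */
typedef struct SketchRslvState
{
    int         flags;          /* bit mask, cf. frstate->flags */
    const char *fdwxact_id;     /* prepared transaction identifier */
} SketchRslvState;

/* Sends one SQL command to the foreign server (mocked below). */
typedef RemoteResult (*RemoteExec) (const char *sql);

/*
 * Build the command a commit callback would send: a plain COMMIT in the
 * one-phase case, COMMIT PREPARED otherwise. Returns false on truncation.
 */
static bool
sketch_build_commit_sql(const SketchRslvState *st, char *buf, size_t buflen)
{
    int         n;

    assert(st != NULL && buf != NULL);
    if (st->flags & SKETCH_FLAG_ONEPHASE)
        n = snprintf(buf, buflen, "COMMIT TRANSACTION");
    else
        n = snprintf(buf, buflen, "COMMIT PREPARED '%s'", st->fdwxact_id);
    return n >= 0 && (size_t) n < buflen;
}

/*
 * Returns true if the foreign transaction can be considered committed.
 * A missing prepared transaction is tolerated, because the coordinator may
 * have crashed after resolving it but before recording that fact.
 */
static bool
sketch_commit_foreign_transaction(const SketchRslvState *st, RemoteExec exec)
{
    char        sql[256];
    RemoteResult res;

    if (!sketch_build_commit_sql(st, sql, sizeof(sql)))
        return false;
    res = exec(sql);
    if (res == REMOTE_OK)
        return true;
    if (res == REMOTE_UNDEFINED_OBJECT && !(st->flags & SKETCH_FLAG_ONEPHASE))
        return true;            /* already resolved earlier */
    return false;
}

/* Mock foreign servers used to exercise the sketch. */
static RemoteResult
sketch_remote_ok(const char *sql)
{
    (void) sql;
    return REMOTE_OK;
}

static RemoteResult
sketch_remote_missing(const char *sql)
{
    (void) sql;
    return REMOTE_UNDEFINED_OBJECT;
}

static RemoteResult
sketch_remote_failing(const char *sql)
{
    (void) sql;
    return REMOTE_OTHER_ERROR;
}
```

A real FDW would raise an error instead of returning false, and would send the command over its own connection rather than through a function pointer; the mock servers exist only so that the three outcomes can be exercised in isolation.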

v25-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From 495152c6204554384f3cc92da4880377cd507213 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 2 Jul 2019 09:32:16 +0900
Subject: [PATCH v25 2/5] Support atomic commit among multiple foreign servers.

---
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  130 ++
 src/backend/access/fdwxact/fdwxact.c          | 2827 +++++++++++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  644 ++++++
 src/backend/access/fdwxact/resolver.c         |  344 +++
 src/backend/access/rmgrdesc/Makefile          |    8 +-
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/Makefile           |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/copy.c                   |    7 +
 src/backend/commands/foreigncmds.c            |   30 +
 src/backend/executor/execPartition.c          |    9 +
 src/backend/executor/nodeForeignscan.c        |   25 +
 src/backend/executor/nodeModifyTable.c        |   18 +
 src/backend/foreign/foreign.c                 |   57 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   20 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   82 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   29 +
 src/include/foreign/fdwapi.h                  |   13 +-
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    9 +-
 src/include/storage/proc.h                    |   11 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    3 +
 src/test/regress/expected/rules.out           |   13 +
 49 files changed, 4540 insertions(+), 26 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100755 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a..49480dd 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000..0207a66
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000..a6a46ad
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,130 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back on either all of the
+foreign servers or none of them. This ensures that the data is always left
+in a consistent state in terms of the federated database.
+
+
+Commit Sequence of Global Transactions
+--------------------------------------
+
+We employ the two-phase commit protocol to commit atomically across all foreign
+servers. The sequence of a distributed transaction commit consists
+of the following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+to the list FdwXactAtomicCommitParticipants, which is maintained by
+PostgreSQL's global transaction manager (GTM), as distributed transaction
+participants. The registered foreign transactions are tracked until the end of
+the transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We record the corresponding WAL indicating that the foreign server is involved
+with the current transaction before doing PREPARE on all foreign transactions.
+Thus in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers including the local node. Otherwise, we can commit them at this
+step by calling the CommitForeignTransaction() API, and no further operation is needed.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If we fail on any of them we switch to
+rollback, so at this time some participants might be prepared whereas
+others are not. The former foreign transactions need to be resolved
+manually using pg_resolve_foreign_xact(), and the latter end their transactions
+in one phase by calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction, but
+this resolution step (commit or rollback) is done by the foreign transaction
+resolver process. The backend process inserts itself into the wait queue, and
+then wakes up the resolver process (or requests the launch of a new one if necessary).
+The resolver process dequeues the waiter and fetches the distributed transaction
+information that the backend is waiting for. Once all foreign transactions are
+committed or rolled back, the resolver process wakes up the waiter.
+
+
+API Contract With Transaction Management Callback Functions
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+transaction management callback functions according to their status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for PREPARE, COMMIT and
+ROLLBACK of the transaction on the foreign server, respectively.
+FdwXactRslvState->flags could contain FDWXACT_FLAG_ONEPHASE, meaning the FDW can
+commit or roll back the foreign transaction in one phase. On failure while
+processing a foreign transaction, the FDW needs to raise an error. However, the FDW
+must accept an ERRCODE_UNDEFINED_OBJECT error while committing or rolling back a
+foreign transaction, because there is a race condition in which the coordinator
+could crash between the completion of the resolution and the writing of the WAL record
+removing the FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_INITIAL
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and it changes to
+FDWXACT_STATUS_PREPARING, FDWXACT_STATUS_COMMITTING and FDWXACT_STATUS_ABORTING
+before the foreign transaction is prepared, committed and aborted by the FDW
+callback functions, respectively (*1). The status then changes to
+FDWXACT_STATUS_RESOLVED once the foreign transaction is resolved, and then
+the corresponding FdwXact entry is removed with WAL logging. If processing
+of a foreign transaction (i.e. preparing, committing or aborting) fails, the
+status changes back to the previous status. Therefore the
+FDWXACT_STATUS_xxxING statuses appear only while the foreign transaction is being
+processed by an FDW callback function.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. Their initial
+status is FDWXACT_STATUS_PREPARED (*2). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                      INVALID                       |
+ +----------------------------------------------------+
+    |                      |                       |
+    |                      v                       |
+    |           +---------------------+            |
+    |           |       INITIAL       |            |
+    |           +---------------------+            |
+   (*2)                    |                      (*2)
+    |                      v                       |
+    |           +---------------------+            |
+    |           |    PREPARING(*1)    |            |
+    |           +---------------------+            |
+    |                      |                       |
+    v                      v                       v
+ +----------------------------------------------------+
+ |                      PREPARED                      |
+ +----------------------------------------------------+
+           |                               |
+           v                               v
+ +--------------------+          +--------------------+
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |
+ +--------------------+          +--------------------+
+           |                               |
+           v                               v
+ +----------------------------------------------------+
+ |                      RESOLVED                      |
+ +----------------------------------------------------+
+
+(*1) Statuses that appear only while being processed by the FDW
+(*2) Paths for recovered FdwXact entries
\ No newline at end of file
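
The status graph in the README above can be restated as a small transition check in standalone C. This is a sketch only: the SketchStatus enum and the sketch_ functions are hypothetical mirrors of the FDWXACT_STATUS_* values described in the README, not code from the patch. The direct INVALID to PREPARED edge corresponds to the (*2) recovery path, and the failure helper follows the README's statement that the status changes back to the previous status when processing fails.

```c
#include <stdbool.h>

/* Hypothetical mirror of the statuses named in the README. */
typedef enum SketchStatus
{
    SK_INVALID,
    SK_INITIAL,
    SK_PREPARING,
    SK_PREPARED,
    SK_COMMITTING,
    SK_ABORTING,
    SK_RESOLVED
} SketchStatus;

/* True for the statuses marked (*1): set only while an FDW callback runs. */
static bool
sketch_in_progress(SketchStatus s)
{
    return s == SK_PREPARING || s == SK_COMMITTING || s == SK_ABORTING;
}

/* True if the graph has an edge from "from" to "to". */
static bool
sketch_transition_ok(SketchStatus from, SketchStatus to)
{
    switch (from)
    {
        case SK_INVALID:
            /* normal creation, or the (*2) path for recovered entries */
            return to == SK_INITIAL || to == SK_PREPARED;
        case SK_INITIAL:
            return to == SK_PREPARING;
        case SK_PREPARING:
            return to == SK_PREPARED;
        case SK_PREPARED:
            return to == SK_COMMITTING || to == SK_ABORTING;
        case SK_COMMITTING:
        case SK_ABORTING:
            return to == SK_RESOLVED;
        case SK_RESOLVED:
            return false;       /* the entry is removed after this */
    }
    return false;
}

/* Status to fall back to when an FDW callback fails. */
static SketchStatus
sketch_status_on_failure(SketchStatus s)
{
    switch (s)
    {
        case SK_PREPARING:
            return SK_INITIAL;
        case SK_COMMITTING:
        case SK_ABORTING:
            return SK_PREPARED;
        default:
            return s;           /* not in progress: nothing to undo */
    }
}
```

Keeping the in-progress statuses distinct from the resting ones is what lets a process claim an entry by setting such a status under the lock and then call the FDW callback after releasing it.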
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100755
index 0000000..e3754e0
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2827 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To commit atomically across all foreign servers, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * During executor node initialization, we can register the foreign server
+ * by calling either RegisterFdwXactByRelId() or RegisterFdwXactByServerId()
+ * to make it participate in a group for global commit. A foreign server is
+ * registered if its FDW has both the CommitForeignTransaction API and the
+ * RollbackForeignTransaction API. Registered participant servers are identified
+ * by the OIDs of the foreign server and user.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every foreign server. And after committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or rollback those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so
+ * that we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request. We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed. Before waiting for foreign transactions to be resolved, the
+ * backend is enqueued with the timestamp at which it expects to be processed.
+ * Similarly, if resolution fails, it is enqueued again with a new timestamp
+ * (its timestamp + foreign_xact_resolution_interval).
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in the in-doubt state (aka in-doubt
+ * transactions). Foreign transactions in the in-doubt state are not resolved
+ * automatically, so they must be processed manually using the
+ * pg_resolve_foreign_xact() function.
+ *
+ * The two-phase commit protocol is required if the transaction modified two or
+ * more servers including the local server. Otherwise, all foreign transactions are
+ * committed or rolled back during pre-commit.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed by FDW, the corresponding
+ * FdwXact entry is update. In order to protect the entry from concurrent
+ * removing we need to hold a lock on the entry or a lock for entire global
+ * array. However, we don't want to hold the lock during FDW is processing the
+ * foreign transaction that may take a unpredictable time. To avoid this, the
+ * in-memory data of foreign transaction follows a locking model based on
+ * four linked concepts:
+ *
+ * * A foreign transaction's status variable is switched using the LWLock
+ *   FdwXactLock, which needs to be held in exclusive mode when updating the
+ *   status, while readers need to hold it in shared mode when looking at the
+ *   status.
+ * * A process that is going to update an FdwXact entry cannot process a
+ *   foreign transaction that is being resolved.
+ * * So setting the status to FDWXACT_STATUS_PREPARING,
+ *   FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING, which are the
+ *   in-progress states of a foreign transaction, means owning the FdwXact
+ *   entry, which protects it from being updated or removed by concurrent
+ *   writers.
+ * * Individual fields are protected by a mutex; only the backend owning
+ *   the foreign transaction is authorized to update its fields.
+ *
+ * Therefore, before doing PREPARE, COMMIT PREPARED or ROLLBACK PREPARED, a
+ * process that is going to call the transaction callback functions must
+ * change the status to the corresponding in-progress status above while
+ * holding FdwXactLock in exclusive mode, and call the callback function
+ * after releasing the lock.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk. FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon. We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts. If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Atomic commit is enabled by configuration */
+#define IsForeignTwophaseCommitEnabled() \
+	(max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	(IsForeignTwophaseCommitEnabled() && \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED))
+
+/* Check the FdwXactParticipant is capable of two-phase commit  */
+#define IsSeverCapableOfTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Check whether the FdwXact is being resolved */
+#define FdwXactIsBeingResolved(fx) \
+	(((((FdwXact)(fx))->status) == FDWXACT_STATUS_PREPARING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_COMMITTING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_ABORTING))
+
+/*
+ * Structure to bundle the foreign transaction participant. This struct
+ * is created at the beginning of execution for each foreign server and
+ * is used until the end of the transaction, where we cannot look at the
+ * syscaches. Therefore, it is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry
+	 * is not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char			*fdwxact_id;
+
+	/* true if modified the data on the server */
+	bool			modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function	prepare_foreign_xact_fn;
+	CommitForeignTransaction_function	commit_foreign_xact_fn;
+	RollbackForeignTransaction_function	rollback_foreign_xact_fn;
+	GetPrepareId_function				get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit. This list
+ * contains only foreign servers that provide the transaction management
+ * callbacks, that is, CommitForeignTransaction and RollbackForeignTransaction.
+ */
+static List *FdwXactParticipants = NIL;
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * oid, xid, foreign server oid and user oid, each written as 8 hexadecimal
+ * digits and separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction, and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure
+ * uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* GUC parameters */
+int	max_prepared_foreign_xacts = 0;
+int	max_foreign_xact_resolvers = 0;
+int foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Tracks whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static void FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+												 bool for_commit);
+static void FdwXactResolveForeignTransaction(FdwXact fdwxact,
+											 FdwXactRslvState *state,
+											 FdwXactStatus fallback_status);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid,	void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static void FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock);
+static bool is_foreign_twophase_commit_required(void);
+static void register_fdwxact(Oid serverid, Oid userid, bool modified);
+static List *get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						  bool including_indoubts, bool include_in_progress,
+						  bool need_lock);
+static FdwXact get_all_fdwxacts(int *num_p);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXact get_fdwxact_to_resolve(Oid dbid, TransactionId xid);
+static FdwXactRslvState *create_fdwxact_state(void);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Remember accessed foreign transaction. Both RegisterFdwXactByRelId and
+ * RegisterFdwXactByServerId are called by executor during initialization.
+ */
+void
+RegisterFdwXactByRelId(Oid relid, bool modified)
+{
+	Relation		rel;
+	Oid				serverid;
+	Oid				userid;
+
+	rel = relation_open(relid, NoLock);
+	serverid = GetForeignServerIdByRelId(relid);
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	relation_close(rel, NoLock);
+
+	register_fdwxact(serverid, userid, modified);
+}
+
+void
+RegisterFdwXactByServerId(Oid serverid, bool modified)
+{
+	register_fdwxact(serverid, GetUserId(), modified);
+}
+
+/*
+ * Register the foreign transaction identified by the given arguments as
+ * a participant of the transaction.
+ *
+ * The foreign transaction is identified by the given server id and user id.
+ * Registered foreign transactions are managed by the global transaction
+ * manager until the end of the transaction.
+ */
+static void
+register_fdwxact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant	*fdw_part;
+	ForeignServer 		*foreign_server;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	MemoryContext		old_ctx;
+	FdwRoutine			*routine;
+	ListCell	   		*lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	/*
+	 * The participant's information is also needed at the end of a
+	 * transaction, where the system caches are not available. Save it in
+	 * TopTransactionContext so that it lives until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Don't register the foreign server if it doesn't provide both the commit
+	 * and rollback transaction management callbacks.
+	 */
+	if (!routine->CommitForeignTransaction ||
+		!routine->RollbackForeignTransaction)
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		pfree(routine);
+
+		/* Restore the caller's memory context before the early return */
+		MemoryContextSwitchTo(old_ctx);
+		return;
+	}
+
+	/*
+	 * Remember that we touched a foreign server that is not capable of
+	 * two-phase commit.
+	 */
+	if (!routine->PrepareForeignTransaction)
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+
+	foreign_server = GetForeignServer(serverid);
+	fdw = GetForeignDataWrapper(foreign_server->fdwid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact = NULL;
+	fdw_part->modified = modified;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Switch back to the previous memory context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&(fdwxacts[cnt].mutex));
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Prepare all foreign transactions if foreign twophase commit is required.
+ * If foreign twophase commit is required, the behavior depends on the value
+ * of foreign_twophase_commit: with 'required' we strictly require the FDWs
+ * of all foreign servers to support the two-phase commit protocol and ask
+ * them to prepare foreign transactions; with 'prefer' we ask only foreign
+ * servers that are capable of two-phase commit to prepare foreign
+ * transactions and ask the other servers to commit; and with 'disabled' we
+ * ask all foreign servers to commit foreign transactions in one phase. If we
+ * fail to commit any of them we change to aborting.
+ *
+ * Note that non-modified foreign servers can always be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	bool		need_twophase_commit;
+	ListCell	*lc = NULL;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * With 'required', all modified servers must be capable of the
+	 * two-phase commit protocol.
+	 */
+	if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on a foreign server that doesn't support atomic commit")));
+
+	/*
+	 * Check if we need to use foreign twophase commit. It's always false
+	 * if foreign twophase commit is disabled.
+	 */
+	need_twophase_commit = is_foreign_twophase_commit_required();
+
+	/*
+	 * First, commit the foreign transactions that can be committed in
+	 * one-phase.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		bool	commit = false;
+
+		/* Can commit in one-phase if two-phase commit is not required */
+		if (!need_twophase_commit)
+			commit = true;
+
+		/* A non-modified foreign transaction can always be committed in one-phase */
+		if (!fdw_part->modified)
+			commit = true;
+
+		/*
+		 * In the 'prefer' case, a server that is not capable of two-phase
+		 * commit can be committed in one-phase.
+		 */
+		if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER &&
+			!IsSeverCapableOfTwophaseCommit(fdw_part))
+			commit = true;
+
+		if (commit)
+		{
+			/* Commit the foreign transaction in one-phase */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, true);
+
+			/* Delete it from the participant list */
+			FdwXactParticipants = foreach_delete_current(FdwXactParticipants,
+														 lc);
+			continue;
+		}
+	}
+
+	/* All done if we committed all foreign transactions */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Second, if only one transaction remains in the participant list and
+	 * we didn't modify local data, we can commit it without preparation.
+	 */
+	if (list_length(FdwXactParticipants) == 1 &&
+		(MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) == 0)
+	{
+		/* Commit the foreign transaction in one-phase */
+		FdwXactOnePhaseEndForeignTransaction(linitial(FdwXactParticipants),
+											 true);
+
+		/* All foreign transactions have been committed */
+		list_free(FdwXactParticipants);
+		FdwXactParticipants = NIL;
+		return;
+	}
+
+	/*
+	 * Finally, prepare foreign transactions. Note that we keep
+	 * FdwXactParticipants until the end of transaction.
+	 */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * an FdwXact entry we call the GetPrepareId callback to get a transaction
+ * identifier from the FDW.
+ *
+ * We can still change to rollback here. If any error occurs, we roll back
+ * non-prepared foreign transactions and leave the others to the resolver.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	ListCell		*lcell;
+	TransactionId	xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	xid = GetTopTransactionId();
+
+	/* Loop over the foreign connections */
+	foreach(lcell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+		FdwXactRslvState 	*state;
+		FdwXact		fdwxact;
+
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry, which is then switched to the
+		 * FDWXACT_STATUS_PREPARING status below. Registration persists this
+		 * information to disk and to WAL (thereby relaying it to standbys).
+		 * Thus in case we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed in between these two
+		 * steps, we would forget that we prepared the transaction on the
+		 * foreign server and would not be able to resolve it after the crash.
+		 * Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		state = create_fdwxact_state();
+		state->server = fdw_part->server;
+		state->usermapping = fdw_part->usermapping;
+		state->fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+
+		/* Update the status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		Assert(fdwxact->status == FDWXACT_STATUS_INITIAL);
+		fdwxact->status = FDWXACT_STATUS_PREPARING;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the point where this
+		 * backend receives an acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 *
+		 * During abort processing, we might try to resolve a never-prepared
+		 * transaction, and get an error. This is fine as long as the FDW
+		 * provides us unique prepared transaction identifiers.
+		 */
+		PG_TRY();
+		{
+			fdw_part->prepare_foreign_xact_fn(state);
+		}
+		PG_CATCH();
+		{
+			/* failed, back to the initial state */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fdwxact->status = FDWXACT_STATUS_INITIAL;
+			LWLockRelease(FdwXactLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		/* succeeded, update status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * One-phase commit or rollback the given foreign transaction participant.
+ */
+static void
+FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+									 bool for_commit)
+{
+	FdwXactRslvState *state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state = create_fdwxact_state();
+	state->server = fdw_part->server;
+	state->usermapping = fdw_part->usermapping;
+	state->flags = FDWXACT_FLAG_ONEPHASE;
+
+	/*
+	 * Commit or rollback foreign transaction in one-phase. Since we didn't
+	 * insert an FdwXact entry for this transaction we don't need to handle
+	 * failures specially. On failure we change to rollback.
+	 */
+	if (for_commit)
+		fdw_part->commit_foreign_xact_fn(state);
+	else
+		fdw_part->rollback_foreign_xact_fn(state);
+}
+
+/*
+ * This function is used to create a new foreign transaction entry before an
+ * FDW prepares and commits or rolls back. The function writes the entry to
+ * WAL, and it is persisted to disk under the pg_fdwxact directory at the
+ * next checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact				fdwxact;
+	FdwXactOnDiskData	*fdwxact_file_data;
+	MemoryContext		old_context;
+	int					data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							fdw_part->usermapping->userid,
+							fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->status = FDWXACT_STATUS_INITIAL;
+	fdwxact->held_by = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				Oid umid, char *fdwxact_id)
+{
+	int i;
+	FdwXact fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
+								   xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->dbid = dbid;
+	fdwxact->local_xid = xid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	fdwxact->indoubt = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (FdwXactIsBeingResolved(fdwxact))
+		elog(ERROR, "cannot remove fdwxact entry that is being resolved");
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->indoubt = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the commit critical section: a
+		 * checkpoint starting after this will certainly see that the FdwXact
+		 * entry has been removed.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true and set ForeignTwophaseCommitIsRequired to true if the current
+ * transaction modified data on two or more servers, counting both the
+ * servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+is_foreign_twophase_commit_required(void)
+{
+	ListCell*	lc;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->modified)
+			nserverswritten++;
+	}
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	/*
+	 * Atomic commit is required if we modified data on two or more
+	 * participants.
+	 */
+	if (nserverswritten <= 1)
+		return false;
+
+	ForeignTwophaseCommitIsRequired = true;
+	return true;
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int	i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * Mark my foreign transaction participants as in-doubt and clear
+ * the FdwXactParticipants list.
+ *
+ * If we leave any foreign transactions behind, update the oldest xmin of
+ * unresolved transactions so that the local transaction id of an in-doubt
+ * transaction is not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell *cell;
+	int		n_lefts = 0;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if we haven't registered an FdwXact entry yet */
+		if (!fdw_part->fdwxact)
+			continue;
+
+		/*
+		 * There is a race condition: if the resolver process removes the
+		 * FdwXact entry, another backend may reuse it before we forget it
+		 * here. So we need to check that the entry is still associated with
+		 * this backend's transaction.
+		 */
+		SpinLockAcquire(&fdwxact->mutex);
+		if (fdwxact->held_by == MyBackendId)
+		{
+			fdwxact->held_by = InvalidBackendId;
+			fdwxact->indoubt = true;
+			n_lefts++;
+		}
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction id
+	 * of unresolved distributed transactions and hand the entries over to the
+	 * foreign transaction resolver.
+	 */
+	if (n_lefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions in in-doubt status", n_lefts);
+		FdwXactComputeRequiredXmin();
+	}
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * Wait for the foreign transaction to be resolved.
+ *
+ * Initially backends start in state FDWXACT_NOT_WAITING and then change
+ * that state to FDWXACT_WAITING before adding ourselves to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * This backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char		*new_status = NULL;
+	const char	*old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsForeignTwophaseCommitRequested())
+		return;
+
+	/*
+	 * Also, exit if the transaction itself has no foreign transaction
+	 * participants.
+	 */
+	if (FdwXactParticipants == NIL && wait_xid == MyPgXact->xid)
+		return;
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if none is running yet, otherwise wake it up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int len;
+
+		old_status = get_ps_display(&len);
+		/* " waiting for resolution " is 24 bytes; a uint32 xid needs up to 10 */
+		new_status = (char *) palloc(len + 24 + 10 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0';	/* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDWXACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The latter
+		 * would lead the client to believe that the distributed transaction
+		 * aborted, which is not true: it's already committed locally. The
+		 * former is no good either: the client has requested committing a
+		 * distributed transaction, and is entitled to assume that an
+		 * acknowledged commit is also committed on all foreign servers, which
+		 * might not be true. So in this case we issue a WARNING (which some
+		 * clients may be able to interpret) and shut off further output. We
+		 * do NOT reset ProcDiePending, so that the process will die after the
+		 * commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign transaction resolver can pick them up and try to
+		 * resolve them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Return true if there is at least one backend in the wait queue that is
+ * connected to the given database. The caller must hold
+ * FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
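+/*
+ * AtEOXact_FdwXacts
+ *
+ * At the end of the local transaction, roll back unprepared foreign
+ * transactions on abort, then forget all participants.
+ */
+void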
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		foreach (lcell, FdwXactParticipants)
+		{
+			FdwXactParticipant	*fdw_part = lfirst(lcell);
+
+			/*
+			 * If the foreign transaction has an FdwXact entry, we might have
+			 * prepared it. Skip foreign transactions that are already
+			 * prepared, because they have closed their transactions. For a
+			 * foreign transaction with status == FDWXACT_STATUS_PREPARING we
+			 * cannot tell whether it has been prepared or not, so we call the
+			 * rollback API to close its transaction for safety. Any prepared
+			 * foreign transaction we might have left will be resolved by the
+			 * foreign transaction resolver.
+			 */
+			if (fdw_part->fdwxact)
+			{
+				bool is_prepared;
+
+				LWLockAcquire(FdwXactLock, LW_SHARED);
+				is_prepared = fdw_part->fdwxact &&
+					fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED;
+				LWLockRelease(FdwXactLock);
+
+				if (is_prepared)
+					continue;
+			}
+
+			/* One-phase rollback foreign transaction */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, false);
+		}
+	}
+
+	/*
+	 * In commit cases, we have already prepared foreign transactions during
+	 * pre-commit phase. And these prepared transactions will be resolved by
+	 * the resolver process.
+	 */
+
+	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Prepare foreign transactions.
+ *
+ * Note that it's possible that the transaction aborts after we have prepared
+ * some of the participants. In this case we switch to rollback and roll back
+ * all foreign transactions.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	/*
+	 * We cannot prepare if any foreign server of participants isn't capable
+	 * of two-phase commit.
+	 */
+	if (is_foreign_twophase_commit_required() &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in transaction can not prepare the transaction")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Return one backend that connects to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p)
+{
+	PGPROC *proc;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+			break;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+	{
+		*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+		*waitXid_p = proc->fdwXactWaitXid;
+	}
+	else
+	{
+		*nextResolutionTs_p = -1;
+		*waitXid_p = InvalidTransactionId;
+	}
+
+	LWLockRelease(FdwXactResolutionLock);
+
+	return proc;
+}
+
+/*
+ * Get one FdwXact entry to resolve. This function is intended to be used by
+ * a resolver process to fetch FdwXact entries to resolve, so the search
+ * excludes in-doubt transactions and in-progress transactions.
+ */
+static FdwXact
+get_fdwxact_to_resolve(Oid dbid, TransactionId xid)
+{
+	List *fdwxacts = NIL;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Exclude both in-doubt transactions and in-progress transactions */
+	fdwxacts = get_fdwxacts(dbid, xid, InvalidOid, InvalidOid,
+							false, false, false);
+
+	return fdwxacts == NIL ? NULL : (FdwXact) linitial(fdwxacts);
+}
+
+/*
+ * Resolve one distributed transaction on the given database. The target
+ * distributed transaction is fetched from the waiting queue and its transaction
+ * participants are fetched from the global array.
+ *
+ * Release the waiter after we have resolved all of the foreign transaction
+ * participants. On failure, we re-enqueue the waiting backend after advancing
+ * its next resolution time.
+ */
+void
+FdwXactResolveTransactionAndReleaseWaiter(Oid dbid, TransactionId xid,
+										  PGPROC *waiter)
+{
+	FdwXact	fdwxact;
+
+	Assert(TransactionIdIsValid(xid));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	while ((fdwxact = get_fdwxact_to_resolve(MyDatabaseId, xid)) != NULL)
+	{
+		FdwXactRslvState *state;
+		ForeignServer *server;
+		UserMapping	*usermapping;
+
+		CHECK_FOR_INTERRUPTS();
+
+		server = GetForeignServer(fdwxact->serverid);
+		usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+
+		state = create_fdwxact_state();
+		SpinLockAcquire(&fdwxact->mutex);
+		state->server = server;
+		state->usermapping = usermapping;
+		state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+		SpinLockRelease(&fdwxact->mutex);
+
+		FdwXactDetermineTransactionFate(fdwxact, false);
+
+		/* Do not hold during foreign transaction resolution */
+		LWLockRelease(FdwXactLock);
+
+		PG_TRY();
+		{
+			/*
+			 * Resolve the foreign transaction. When committing or aborting
+			 * prepared foreign transactions the previous status is always
+			 * FDWXACT_STATUS_PREPARED.
+			 */
+			FdwXactResolveForeignTransaction(fdwxact, state,
+											 FDWXACT_STATUS_PREPARED);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. Re-insert the waiter into the queue at its
+			 * new resolution time if the waiter is still waiting.
+			 */
+			LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+			if (waiter->fdwXactState == FDWXACT_WAITING)
+			{
+				SHMQueueDelete(&(waiter->fdwXactLinks));
+				pg_write_barrier();
+				waiter->fdwXactNextResolutionTs =
+					TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+												foreign_xact_resolution_retry_interval);
+				FdwXactQueueInsert(waiter);
+			}
+			LWLockRelease(FdwXactResolutionLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		elog(DEBUG2, "resolved one foreign transaction xid %u, serverid %u, userid %u",
+			 fdwxact->local_xid, fdwxact->serverid, fdwxact->userid);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+
+	/*
+	 * Remove waiter from shmem queue, if not detached yet. The waiter
+	 * could already be detached if the user cancelled the wait before
+	 * resolution.
+	 */
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId	wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/* Wake up the waiter only when we have set state and removed from queue */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had been already detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Determine whether the given foreign transaction should be committed or
+ * rolled back according to the result of the local transaction. This function
+ * changes fdwxact->status, so the caller must either hold FdwXactLock in
+ * exclusive mode or pass need_lock as true.
+ */
+static void
+FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock)
+{
+	bool			is_commit = false;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/*
+	 * The transaction being resolved must either have been cancelled and
+	 * marked as in-doubt, or have been prepared.
+	 */
+	Assert(fdwxact->indoubt ||
+		   fdwxact->status == FDWXACT_STATUS_PREPARED);
+
+	/*
+	 * If the local transaction is already committed, commit prepared
+	 * foreign transaction.
+	 */
+	if (TransactionIdDidCommit(fdwxact->local_xid))
+	{
+		fdwxact->status = FDWXACT_STATUS_COMMITTING;
+		is_commit = true;
+	}
+
+	/*
+	 * If the local transaction is already aborted, abort prepared
+	 * foreign transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+	{
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is not in progress but the foreign
+	 * transaction is not prepared on the foreign server. This
+	 * can happen when the transaction failed after registering this
+	 * entry but before actually preparing on the foreign server.
+	 * So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+	{
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted. This should not happen except for one
+	 * case where the local transaction is prepared and this foreign transaction
+	 * is being resolved manually using pg_resolve_foreign_xact(). Raise an
+	 * error anyway since we cannot determine the fate of this foreign
+	 * transaction according to the local transaction whose fate is also not
+	 * determined.
+	 */
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve the foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid),
+				 errhint("The local transaction with xid %u might be prepared",
+						 fdwxact->local_xid)));
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * callback function. The 'state' is passed to the callback function. The fate of
+ * the foreign transaction must already be determined. If the foreign
+ * transaction is resolved successfully, remove the FdwXact entry from shared
+ * memory and also remove the corresponding on-disk file. On failure, the
+ * status of the FdwXact entry changes to 'fallback_status' before erroring out.
+ */
+static void
+FdwXactResolveForeignTransaction(FdwXact fdwxact, FdwXactRslvState *state,
+								 FdwXactStatus fallback_status)
+{
+	ForeignServer		*server;
+	ForeignDataWrapper	*fdw;
+	FdwRoutine			*fdw_routine;
+	bool				is_commit;
+
+	Assert(state != NULL);
+	Assert(state->server && state->usermapping && state->fdwxact_id);
+	Assert(fdwxact != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+		elog(ERROR, "cannot resolve foreign transaction whose fate is not determined");
+
+	is_commit = fdwxact->status == FDWXACT_STATUS_COMMITTING;
+	LWLockRelease(FdwXactLock);
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+
+	PG_TRY();
+	{
+		if (is_commit)
+			fdw_routine->CommitForeignTransaction(state);
+		else
+			fdw_routine->RollbackForeignTransaction(state);
+	}
+	PG_CATCH();
+	{
+		/* Back to the fallback status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = fallback_status;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	/* Resolution was a success, remove the entry */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	elog(DEBUG1, "successfully %s the foreign transaction with xid %u db %u server %u user %u",
+		 is_commit ? "committed" : "rolled back",
+		 fdwxact->local_xid, fdwxact->dbid, fdwxact->serverid,
+		 fdwxact->userid);
+
+	fdwxact->status = FDWXACT_STATUS_RESOLVED;
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdwxact(fdwxact);
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Return palloc'd and initialized FdwXactRslvState.
+ */
+static FdwXactRslvState *
+create_fdwxact_state(void)
+{
+	FdwXactRslvState *state;
+
+	state = palloc(sizeof(FdwXactRslvState));
+	state->server = NULL;
+	state->usermapping = NULL;
+	state->fdwxact_id = NULL;
+	state->flags = 0;
+
+	return state;
+}
+
+/*
+ * Return the FdwXact entry that matches the given arguments, or NULL if there
+ * is none. All arguments must be valid values so that the search matches
+ * exactly one (or no) entry. Note that this function is intended to be used
+ * for modifying the returned FdwXact entry, so the caller must hold
+ * FdwXactLock in exclusive mode, and in-progress FdwXact entries are not
+ * included.
+ */
+static FdwXact
+get_one_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	*fdwxact_list;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	/* Include in-doubt transactions but don't include in-progress ones */
+	fdwxact_list = get_fdwxacts(dbid, xid, serverid, userid,
+								true, false, false);
+
+	/* At most one entry can match since we search by the unique key */
+	Assert(list_length(fdwxact_list) <= 1);
+
+	/* Could not find entry */
+	if (fdwxact_list == NIL)
+		return NULL;
+
+	return (FdwXact) linitial(fdwxact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+fdwxact_exists(Oid dbid, Oid serverid, Oid userid)
+{
+	List	*fdwxact_list;
+
+	/* Find entries from all FdwXact entries */
+	fdwxact_list = get_fdwxacts(dbid, InvalidTransactionId, serverid,
+								userid, true, true, true);
+
+	return fdwxact_list != NIL;
+}
+
+/*
+ * Returns an array of all foreign prepared transactions for the user-level
+ * function pg_foreign_xacts, and the number of entries to num_p.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if it doesn't
+ * want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdwxacts(int *num_p)
+{
+	List		*all_fdwxacts;
+	ListCell	*lc;
+	FdwXact		fdwxacts;
+	int			num_fdwxacts = 0;
+
+	Assert(num_p != NULL);
+
+	/* Get all entries */
+	all_fdwxacts = get_fdwxacts(InvalidOid, InvalidTransactionId,
+								InvalidOid, InvalidOid, true,
+								true, true);
+
+	if (all_fdwxacts == NIL)
+	{
+		*num_p = 0;
+		return NULL;
+	}
+
+	fdwxacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdwxacts));
+	*num_p = list_length(all_fdwxacts);
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdwxacts)
+	{
+		FdwXact fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdwxacts + num_fdwxacts, fx,
+			   sizeof(FdwXactData));
+		num_fdwxacts++;
+	}
+
+	list_free(all_fdwxacts);
+
+	return fdwxacts;
+}
+
+/*
+ * Return a list of FdwXact entries matching the given arguments, or NIL if
+ * none match. The search condition is defined by the arguments that have valid
+ * values for their respective datatypes. 'include_indoubt' and
+ * 'include_in_progress' control whether the result includes in-doubt
+ * transactions and in-progress transactions respectively. If 'need_lock' is
+ * true, FdwXactLock is acquired in shared mode during the search.
+ */
+static List*
+get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			 bool include_indoubt, bool include_in_progress, bool need_lock)
+{
+	int i;
+	List	*fdwxact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact	fdwxact = FdwXactCtl->fdwxacts[i];
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* include in-doubt transaction? */
+		if (!include_indoubt && fdwxact->indoubt)
+			continue;
+
+		/* include in-progress transaction? */
+		if (!include_in_progress && FdwXactIsBeingResolved(fdwxact))
+			continue;
+
+		/* Append it if matched */
+		fdwxact_list = lappend(fdwxact_list, fdwxact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdwxact_list;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record
+		 * in FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides the getPrepareId callback we return the
+ * identifier returned from it. Otherwise we generate a unique identifier in
+ * the form of "fx_<random number>_<xid>_<serverid>_<userid>" whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transactionid unique, we should ideally use something
+ * like UUID, which gives unique ids with high probability, but that may be
+ * expensive here and UUID extension which provides the function to generate
+ * UUID is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	*id;
+	int		id_len = 0;
+
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		/*
+		 * FDW doesn't provide the callback function, generate a unique
+		 * identifier.
+		 */
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len > FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;						/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, if somebody
+	 * spots that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+			  (errmsg_plural("%u foreign transaction state file was written "
+							 "for long-running prepared transactions",
+							 "%u foreign transaction state files were written "
+							 "for long-running prepared transactions",
+							 serialized_fdwxacts,
+							 serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, &read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+		   errdetail("Failed while allocating an XLog reading processor.")));
+
+	record = XLogReadRecord(xlogreader, lsn, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read foreign transaction state from xlog at %X/%X",
+			   (uint32) (lsn >> 32),
+			   (uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not recreate foreign transaction state file \"%s\": %m",
+			   path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid, read the foreign transaction
+ * state either from disk or directly from the WAL record that the provided
+ * "insert_start_lsn" points to.
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId	origNextXid =
+		XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	char	*buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (the CRC matches and the contents identify the requested
+ * entry), return the palloc'd contents of the file; otherwise raise an error.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			   errmsg("could not open FDW transaction state file \"%s\": %m",
+					  path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the requested entry */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextFullXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextFullXid);
+	TransactionId result = origNextXid;
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+		char *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char		*buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. We do not
+	 * know the exact status of the transaction right now, so the status
+	 * fields are filled in provisionally below. Resolver will set them
+	 * later based on the status of local transaction which prepared this
+	 * foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							  fdwxact_data->serverid, fdwxact_data->userid,
+							  fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED and as in-doubt, since we do not know
+	 * the xact status right now. Resolver will set it later based on
+	 * the status of local transaction that prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;	/* added in redo */
+	fdwxact->indoubt = true;
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXact file if the foreign transaction was saved via an earlier checkpoint.
+ * The FdwXact entry might not be found if crash recovery started from a point
+ * after the entry was added but before it was removed.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact	fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdwxact(dbid, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+		return;
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and mark them valid.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+		char	*buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+
+		/*
+		 * If the foreign transaction is part of a prepared local
+		 * transaction, it is not in doubt. A future COMMIT/ROLLBACK
+		 * PREPARED will determine the fate of this foreign transaction.
+		 */
+		if (TwoPhaseExists(fdwxact->local_xid))
+		{
+			ereport(DEBUG2,
+					(errmsg("clearing in-doubt flag of foreign transaction %u, server %u, user %u because the corresponding local prepared transaction was found",
+							fdwxact->local_xid, fdwxact->serverid,
+							fdwxact->userid)));
+			fdwxact->indoubt = false;
+		}
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(int *newval, void **extra, GucSource source)
+{
+	ForeignTwophaseCommitLevel newForeignTwophaseCommitLevel = *newval;
+
+	/* Parameter check */
+	if (newForeignTwophaseCommitLevel > FOREIGN_TWOPHASE_COMMIT_DISABLED &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_transactions\" or \"max_foreign_transaction_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}	WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	7
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdwxacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdwxacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(PG_PREPARED_FDWXACTS_COLS);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "indoubt",
+						   BOOLOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdwxacts = get_all_fdwxacts(&num_fdwxacts);
+		status->num_xacts = num_fdwxacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdwxact = &status->fdwxacts[status->cur_xact++];
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdwxact->dbid);
+		values[1] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[2] = ObjectIdGetDatum(fdwxact->serverid);
+		values[3] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (fdwxact->status)
+		{
+			case FDWXACT_STATUS_INITIAL:
+				xact_status = "initial";
+				break;
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			case FDWXACT_STATUS_RESOLVED:
+				xact_status = "resolved";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		values[5] = BoolGetDatum(fdwxact->indoubt);
+		values[6] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+	FdwXact			fdwxact;
+	FdwXactRslvState	*state;
+	FdwXactStatus		prev_status;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	server = GetForeignServer(serverid);
+	usermapping = GetUserMapping(userid, serverid);
+	state = create_fdwxact_state();
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+	{
+		LWLockRelease(FdwXactLock);
+		PG_RETURN_BOOL(false);
+	}
+
+	state->server = server;
+	state->usermapping = usermapping;
+	state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+
+	SpinLockAcquire(&fdwxact->mutex);
+	prev_status = fdwxact->status;
+	SpinLockRelease(&fdwxact->mutex);
+
+	FdwXactDetermineTransactionFate(fdwxact, false);
+
+	LWLockRelease(FdwXactLock);
+
+	FdwXactResolveForeignTransaction(fdwxact, state, prev_status);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it. This gives a way to forget about such a prepared transaction
+ * when, for example, the foreign server where it was prepared is no longer
+ * available or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+	{
+		LWLockRelease(FdwXactLock);
+		PG_RETURN_BOOL(false);
+	}
+	remove_fdwxact(fdwxact);
+	LWLockRelease(FdwXactLock);
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000..45fb530
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,644 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to retry resolution.
+ */
+void
+FdwXactLauncherRequestToLaunchForRetry(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		SetLatch(FdwXactRslvCtl->launcher_latch);
+}
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int	slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			resolver->last_resolved_time = 0;
+			resolver->latch = NULL;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz	last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == 0);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz	now;
+		long	wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int		rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit launch retries to once per foreign_xact_resolution_retry_interval,
+		 * but always launch when a backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested
+			 * but not running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started. This usually means the resolver crashed, so we
+			 * should retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool	found = false;
+	int		i;
+
+	/*
+	 * Looking for a resolver process that is running and working on the
+	 * same database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. Since the resolver will soon find
+		 * the FdwXact entry we inserted, we don't need to do anything then.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int unused_slot = -1;
+	int i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (unused_slot < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to wait
+	 * until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on databases that have
+ * at least one FdwXact entry but no resolver running on them.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	*resolver_dbs;	/* DBs resolvers are running on */
+	HTAB	*fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL	ctl;
+	HASH_SEQ_STATUS status;
+	Oid		*entry;
+	bool	launched = false;
+	int		i;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one non-in-doubt FdwXact entry */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->indoubt)
+			continue;
+
+		hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no FdwXact entry, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+		return false;
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolver is running and launch a new one on each */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	return launched;
+}
+
+/*
+ * FdwXactLauncherRegister
+ *		Register a background worker running the foreign transaction
+ *      launcher.
+ */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int i;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+						WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Returns activity of all foreign transaction resolvers.
+ */
+Datum
+pg_stat_get_foreign_xact(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver	*resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t	pid;
+		Oid		dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000..9298877
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,344 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database that backend processes are waiting on.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int foreign_xact_resolution_retry_interval;
+int foreign_xact_resolver_timeout = 60 * 1000;
+bool foreign_xact_resolve_indoubt_xacts;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	MyFdwXactResolver->last_resolved_time = 0;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sanish value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		PGPROC			*waiter = NULL;
+		TransactionId	waitXid = InvalidTransactionId;
+		TimestampTz		resolutionTs = -1;
+		int			rc;
+		TimestampTz	now;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiters until either the queue is empty or we reach a waiter
+		 * whose resolution time is in the future.
+		 */
+		while ((waiter = FdwXactGetWaiter(&resolutionTs, &waitXid)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+			Assert(TransactionIdIsValid(waitXid));
+
+			if (resolutionTs > now)
+				break;
+
+			elog(DEBUG2, "resolver got one waiter with xid %u", waitXid);
+
+			/* Resolve the waiting distributed transaction */
+			StartTransactionCommand();
+			FdwXactResolveTransactionAndReleaseWaiter(MyDatabaseId, waitXid,
+													  waiter);
+			CommitTransactionCommand();
+
+			/* Update my stats */
+			SpinLockAcquire(&(MyFdwXactResolver->mutex));
+			MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+			SpinLockRelease(&(MyFdwXactResolver->mutex));
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether a foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not and no waiter exists.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz last_resolved_time;
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	last_resolved_time = MyFdwXactResolver->last_resolved_time;
+	timeout = TimestampTzPlusMilliseconds(last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until we have detached from the
+		 * slot. This prevents a race condition in which a waiter enqueues
+		 * itself after we checked FdwXactWaiterExists.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but does not exit as the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the earlier of the timeout and the next resolution time (nextResolutionTs).
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long	sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long	sec_to_timeout;
+		int		microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long	sec_to_timeout;
+		int		microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index 5514db1..742e825 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -8,9 +8,9 @@ subdir = src/backend/access/rmgrdesc
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o genericdesc.o \
-	   gindesc.o gistdesc.o hashdesc.o heapdesc.o logicalmsgdesc.o \
-	   mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o seqdesc.o \
-	   smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
+OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o fdwxactdesc.o \
+	genericdesc.o gindesc.o gistdesc.o hashdesc.o heapdesc.o \
+	logicalmsgdesc.o mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o \
+	seqdesc.o smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 33060f3..1d4e1c8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 16fbe47..f15c83a 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -12,9 +12,9 @@ subdir = src/backend/access/transam
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = clog.o commit_ts.o generic_xlog.o multixact.o parallel.o rmgr.o slru.o \
-	subtrans.o timeline.o transam.o twophase.o twophase_rmgr.o varsup.o \
-	xact.o xlog.o xlogarchive.o xlogfuncs.o \
+OBJS = clog.o commit_ts.o generic_xlog.o multixact.o \
+	parallel.o rmgr.o slru.o subtrans.o timeline.o transam.o twophase.o \
+	twophase_rmgr.o varsup.o xact.o xlog.o xlogarchive.o xlogfuncs.o \
 	xloginsert.o xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 9368b56..8b360b1 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -9,6 +9,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
 #include "access/generic_xlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 477709b..82a0cb3 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -852,6 +853,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 }
 
 /*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
+/*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
  *
@@ -2318,6 +2348,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2377,6 +2413,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index f594d33..0d5f9d2 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1216,6 +1217,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1224,6 +1226,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1262,12 +1265,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1425,6 +1429,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transaction to be resolved, if required.
+	 * We only want to wait if we prepared foreign transaction in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2084,6 +2096,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2243,6 +2258,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2330,6 +2346,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2524,6 +2542,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2729,6 +2748,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e651a84..3716fbf 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
 #include "access/subtrans.h"
@@ -5249,6 +5250,7 @@ BootStrapXLOG(void)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6182,6 +6184,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6704,14 +6709,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -6903,7 +6909,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7399,6 +7408,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7717,6 +7727,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -8992,6 +9005,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9425,8 +9439,10 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9444,6 +9460,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9460,6 +9477,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9665,6 +9683,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -9864,6 +9883,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e..342dd6a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -332,6 +332,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+       SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
@@ -815,6 +818,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_foreign_xact AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_foreign_xact() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3aeef30..43bb9ae 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2860,9 +2860,16 @@ CopyFrom(CopyState cstate)
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc),
+							   true);
+
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
 
+	}
+
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index f96c278..621d70d 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1102,6 +1104,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might leave a dangling prepared transaction.
+	 */
+	if (fdwxact_exists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1421,6 +1435,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
 	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might leave a dangling prepared transaction.
+	 */
+	if (fdwxact_exists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
+	/*
 	 * Do the deletion
 	 */
 	object.classId = UserMappingRelationId;
@@ -1573,6 +1596,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
 				 errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA",
 						fdw->fdwname)));
 
+	/*
+	 * Remember that the transaction accesses a foreign server. Normally
+	 * ImportForeignSchema doesn't modify data on foreign servers, so register
+	 * it as a server that is not modified.
+	 */
+	RegisterFdwXactByServerId(server->serverid, false);
+
 	/* Call FDW to get a list of commands */
 	cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d23f292..7cdf8e1 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,7 +13,9 @@
  */
 #include "postgres.h"
 
+
 #include "access/table.h"
+#include "access/fdwxact.h"
 #include "access/tableam.h"
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
@@ -944,7 +946,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		Relation		child = partRelInfo->ri_RelationDesc;
+
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(child), true);
+
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1da..eb7450c 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,10 +226,33 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		RangeTblEntry	*rte;
+
+		rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex,
+							estate);
+
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(rte->relid, true);
+
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
+	{
+		RangeTblEntry	*rte;
+		int rtindex = (scanrelid > 0) ?
+			scanrelid :
+			bms_next_member(node->fs_relids, -1);
+
+		rte = exec_rt_fetch(rtindex, estate);
+
+		/* Remember that the transaction accesses a foreign server */
+		RegisterFdwXactByRelId(rte->relid, false);
+
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
+	}
+
 	return scanstate;
 }
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 778ff27..ed4af1a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -38,6 +38,7 @@
 #include "postgres.h"
 
 #include "access/heapam.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -47,6 +48,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "rewrite/rewriteHandler.h"
@@ -550,6 +552,10 @@ ExecInsert(ModifyTableState *mtstate,
 										   NULL,
 										   specToken);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
 												   &specConflict,
@@ -778,6 +784,10 @@ ldelete:;
 									&tmfd,
 									changingPart);
 
+		/* Make note that we've written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case TM_SelfModified:
@@ -1325,6 +1335,10 @@ lreplace:;
 									true /* wait for commit */ ,
 									&tmfd, &lockmode, &update_indexes);
 
+		/* Make note that we've written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case TM_SelfModified:
@@ -2386,6 +2400,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+			/* Remember the transaction modifies data on a foreign server */
+			RegisterFdwXactByRelId(relid, true);
 
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 															 resultRelInfo,
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index c917ec4..2780ed5 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,20 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction &&
+		 !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction &&
+		 routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		(!routine->CommitForeignTransaction ||
+		 !routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index b66b517..517169b 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -15,6 +15,8 @@
 #include <unistd.h>
 
 #include "libpq/pqsignal.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -129,6 +131,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d362e7f..51c3789 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3652,6 +3652,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3855,6 +3861,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDWXACT:
+			event_name = "FdwXact";
+			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -4070,6 +4082,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 62dc93d..08eb99f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -900,6 +902,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -975,12 +981,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5315d93..c932167 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -154,6 +154,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index d7d7335..1491bc6 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(int port)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(int port)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 8abcfdf..b1561b2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -91,6 +91,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -246,6 +248,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1310,6 +1313,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1375,6 +1379,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1425,6 +1430,15 @@ GetOldestXmin(Relation rel, int flags)
 		result = replication_slot_xmin;
 
 	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDWXACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
+	/*
 	 * After locks have been released and vacuum_defer_cleanup_age has been
 	 * applied, check whether we need to back up further to make logical
 	 * decoding possible. We need to do so if we're computing the global limit
@@ -3014,6 +3028,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions that are still needed
+ * to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843..0b8a487 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,6 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+FdwXactLock					45
+FdwXactResolverLock			46
+FdwXactResolutionLock			47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 498373f..dc77509 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -422,6 +423,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* Initialize fields for foreign transactions. */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -823,6 +828,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e8d8e6f..10ee130 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3007,6 +3009,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 90ffd89..7dfafe3 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -397,6 +398,25 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 };
 
 /*
+ * Although only "required", "prefer", and "disabled" are documented, we
+ * accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
+/*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
  */
@@ -718,6 +738,12 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
+	/* FDWXACT_RESOLVER */
+	gettext_noop("Foreign Transaction Management / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2351,6 +2377,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FDWXACT_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FDWXACT_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve a foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4361,6 +4433,16 @@ static struct config_enum ConfigureNamesEnum[] =
 	},
 
 	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
+	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
 			NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 0fc23e3..6a3887d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -125,6 +125,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -341,6 +343,20 @@
 
 
 #------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#foreign_twophase_commit = disabled	# disabled, prefer, or required
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
+#------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
 
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 33ac627..328b857 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 88a261d..6132c72 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -210,6 +210,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index ff17804..58e9630 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 159a30b..43144fa 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -711,6 +711,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -918,6 +919,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca..b616cea 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 3c0db2c..5798b4c 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Foreign Transactions", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index b9a531c..8238723 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 6f4013e..0cee715 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -109,6 +109,13 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
+ * XACT_FLAGS_FDWNOPREPARE - set when the transaction wrote data to a
+ * foreign table whose foreign server is not capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 3)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 3f0de66..5c50677 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -228,6 +228,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index ff98d9e..773846d 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cf1f409..e03b5ca 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5178,6 +5178,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '6053', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_foreign_xact', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,timestamptz}',
+  proargmodes => '{o,o,o}',
+  proargnames => '{pid,dbid,last_resolved_time}',
+  prosrc => 'pg_stat_get_foreign_xact' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5891,6 +5898,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '6050', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o,o}',
+  proargnames => '{dbid,xid,serverid,userid,status,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '6051', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '6052', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6009,6 +6034,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '6054',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 8226860..f6592ee 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
@@ -248,7 +260,6 @@ typedef struct FdwRoutine
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
 } FdwRoutine;
 
-
 /* Functions in foreign/foreign.c */
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern Oid	GetForeignServerIdByRelId(Oid relid);
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 4de157c..91c2276 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -69,6 +69,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 											   bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 														 bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d8..d82d8f7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -776,6 +776,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -853,7 +855,9 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDWXACT,
+	WAIT_EVENT_FDWXACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -933,6 +937,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 281e1db..2eab5a9 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -21,6 +21,7 @@
 #include "storage/lock.h"
 #include "storage/pg_sema.h"
 #include "storage/proclist_types.h"
+#include "datatype/timestamp.h"
 
 /*
  * Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
@@ -153,6 +154,16 @@ struct PGPROC
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
 	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
+	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
 	 * their lock.
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index da8b672..04f9c8c 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDWXACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -124,4 +126,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index d68976f..d5fec50 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,9 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
+	FDWXACT_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd..c862e0e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1341,6 +1341,14 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.dbid,
+    f.xid,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(dbid, xid, serverid, userid, status, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
@@ -1841,6 +1849,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_foreign_xact| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_foreign_xact() r(pid, dbid, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
-- 
2.10.5
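
As a reading aid for the fdwapi.h hunk in the patch above: the documentation patch in this thread says an FDW must provide both CommitForeignTransaction and RollbackForeignTransaction to be managed by the global transaction manager, and PrepareForeignTransaction as well to take part in atomic commit. A minimal, self-contained sketch of that rule follows. The Toy* names, the enum and toy_classify() are hypothetical and are not part of the patch; only the three callback names are taken from it.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-in for the patch's FdwXactRslvState (real definition not needed here). */
typedef struct FdwXactRslvState FdwXactRslvState;

/* All three callbacks share this shape in the fdwapi.h hunk. */
typedef void (*ToyXactCallback) (FdwXactRslvState *frstate);

/* Hypothetical cut-down FdwRoutine holding only the transaction callbacks. */
typedef struct ToyFdwRoutine
{
	ToyXactCallback PrepareForeignTransaction;
	ToyXactCallback CommitForeignTransaction;
	ToyXactCallback RollbackForeignTransaction;
} ToyFdwRoutine;

/* Hypothetical classification of what a wrapper can take part in. */
typedef enum ToyFdwXactSupport
{
	TOY_FDW_UNMANAGED,			/* commit or rollback callback missing */
	TOY_FDW_ONE_PHASE,			/* commit + rollback, but no prepare */
	TOY_FDW_TWO_PHASE			/* prepare + commit + rollback */
} ToyFdwXactSupport;

static ToyFdwXactSupport
toy_classify(const ToyFdwRoutine *routine)
{
	/* Both commit and rollback are required before the server manages anything. */
	if (routine->CommitForeignTransaction == NULL ||
		routine->RollbackForeignTransaction == NULL)
		return TOY_FDW_UNMANAGED;

	/* Without a prepare callback the wrapper cannot join a two-phase commit. */
	if (routine->PrepareForeignTransaction == NULL)
		return TOY_FDW_ONE_PHASE;

	return TOY_FDW_TWO_PHASE;
}

/* Dummy callback used only to make a pointer non-NULL. */
static void
toy_noop(FdwXactRslvState *frstate)
{
	(void) frstate;
}

/* Small self-check exercising the three cases. */
static void
toy_selfcheck(void)
{
	ToyFdwRoutine none = {NULL, NULL, NULL};
	ToyFdwRoutine one_phase = {NULL, toy_noop, toy_noop};
	ToyFdwRoutine two_phase = {toy_noop, toy_noop, toy_noop};

	assert(toy_classify(&none) == TOY_FDW_UNMANAGED);
	assert(toy_classify(&one_phase) == TOY_FDW_ONE_PHASE);
	assert(toy_classify(&two_phase) == TOY_FDW_TWO_PHASE);
}
```

With foreign_twophase_commit set to required, a wrapper that would land in the second category is the kind that, per the documentation text, prevents the distributed transaction from committing. How the patch itself performs this check is not shown here.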

#25Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#24)
5 attachment(s)

On Wed, Sep 4, 2019 at 10:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Sep 4, 2019 at 7:36 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Hello Sawada-san,

On 2019-Jul-02, Masahiko Sawada wrote:

On Mon, Jul 1, 2019 at 8:32 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Can we please have a fresh rebase?

Thank you for the notice. Attached rebased patches.

... and again?

Thank you for the notice. I've attached rebased patch set.

I forgot to include some new header files. Attached the updated patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

v25-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From 143280d73da137a99a4b71105f8a2a1221c850df Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 8 Feb 2019 10:44:54 +0900
Subject: [PATCH v25 1/5] Keep track of writing on non-temporary relation

---
 src/backend/executor/nodeModifyTable.c | 12 ++++++++++++
 src/include/access/xact.h              |  6 ++++++
 2 files changed, 18 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 01fe11a..778ff27 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -588,6 +588,10 @@ ExecInsert(ModifyTableState *mtstate,
 							   estate->es_output_cid,
 							   0, NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
@@ -940,6 +944,10 @@ ldelete:;
 	if (tupleDeleted)
 		*tupleDeleted = true;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/*
 	 * If this delete is the result of a partition key update that moved the
 	 * tuple to a new partition, put this row into the transition OLD TABLE,
@@ -1451,6 +1459,10 @@ lreplace:;
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
 	}
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	if (canSetTag)
 		(estate->es_processed)++;
 
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index d714551..6f4013e 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -103,6 +103,12 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
 /*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we have written data to a
+ * non-temporary relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
-- 
2.10.5
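
For readers skimming the 0001 patch above, here is a minimal sketch of the flag bookkeeping it adds. The two flag values are copied from the xact.h hunk; the local MyXactFlags variable, the needs_wal parameter (standing in for RelationNeedsWAL(resultRelationDesc)) and both helper functions are hypothetical stand-ins so that the example compiles outside the backend.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag bits as they appear in the xact.h hunk of the patch. */
#define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)

/* Stand-in for the backend-global MyXactFlags (declared as int in xact.h). */
static int	MyXactFlags = 0;

/*
 * Hypothetical helper mirroring the three executor hunks: after an insert,
 * delete or update, remember that a WAL-logged relation was written.
 * needs_wal stands in for RelationNeedsWAL(resultRelationDesc).
 */
static void
toy_note_relation_write(bool needs_wal)
{
	if (needs_wal)
		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
}

/* Hypothetical helper: has this transaction written such a relation? */
static bool
toy_wrote_non_temp_rel(void)
{
	return (MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0;
}

/* Small self-check: the flag is sticky and independent of other bits. */
static void
toy_flags_selfcheck(void)
{
	MyXactFlags = 0;

	toy_note_relation_write(false);		/* e.g. a write that needs no WAL */
	assert(!toy_wrote_non_temp_rel());

	toy_note_relation_write(true);
	toy_note_relation_write(false);		/* later writes never clear the bit */
	assert(toy_wrote_non_temp_rel());
	assert((MyXactFlags & XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK) == 0);
}
```

The bit is only ever set here, never cleared; resetting the flags at transaction boundaries is outside what this sketch shows.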

v25-0003-Documentation-update.patch (application/octet-stream)
From 3d157d5716121456fc54b6fea8040f6f57d9691b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 2 Jul 2019 09:31:05 +0900
Subject: [PATCH v25 3/5] Documentation update.

---
 doc/src/sgml/catalogs.sgml                | 145 ++++++++++++++++++
 doc/src/sgml/config.sgml                  | 146 +++++++++++++++++-
 doc/src/sgml/distributed-transaction.sgml | 158 ++++++++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 236 ++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  89 +++++++++++
 doc/src/sgml/monitoring.sgml              |  60 ++++++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 841 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 5e71a2e..1dbc45e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8274,6 +8274,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
      </row>
 
      <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
+     <row>
       <entry><link linkend="view-pg-file-settings"><structname>pg_file_settings</structname></link></entry>
       <entry>summary of configuration file contents</entry>
      </row>
@@ -9718,6 +9723,146 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>initial</literal> : Initial status.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>resolved</literal> : This foreign transaction has been resolved.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in doubt and
+       needs to be resolved by calling the
+       <function>pg_resolve_foreign_xact</function> function.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 89284dc..c1ef5d8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4294,7 +4294,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
 
      </variablelist>
     </sect2>
-
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -8611,6 +8610,151 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether transaction commit will wait for all involved foreign
+         transactions to be resolved before the command returns a "success"
+         indication to the client. Valid values are <literal>required</literal>,
+         <literal>prefer</literal> and <literal>disabled</literal>. The default
+         setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used
+         to commit or roll back distributed transactions. When set to
+         <literal>required</literal>, the distributed transaction strictly
+         requires that all written servers can use the two-phase commit protocol.
+         That is, the distributed transaction cannot commit if even one server
+         does not support the transaction management callback routines
+         (described in <xref linkend="fdw-callbacks-transaction-managements"/>).
+         When set to <literal>prefer</literal>, the distributed transaction uses
+         the two-phase commit protocol only on servers where it is available and
+         commits normally on the others. Note that with <literal>disabled</literal>
+         or <literal>prefer</literal> the servers involved in the distributed
+         transaction can become inconsistent with each other if a foreign server
+         crashes while the distributed transaction is being committed.
+        </para>
+
+        <para>
+         Both <varname>max_prepared_foreign_transactions</varname> and
+         <varname>max_foreign_transaction_resolvers</varname> must be set to
+         non-zero values before this parameter can be set to either
+         <literal>required</literal> or <literal>prefer</literal>.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If <literal>N</literal> local transactions each
+         span <literal>K</literal> foreign servers, this value needs to be set to
+         <literal>N * K</literal>, not just <literal>N</literal>.
+         This parameter can only be set at server start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign
+         transaction resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver waits after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning the resolver stays
+         connected to its database until it is stopped manually. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000..350b1af
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers end in either commit or rollback using the
+   transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To commit atomically across all foreign servers,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).
+    The <productname>PostgreSQL</productname> server that received the SQL is called
+    the <firstterm>coordinator node</firstterm> and is responsible for coordinating
+    all the participating transactions. Using the two-phase commit protocol, the commit
+    sequence of a distributed transaction proceeds through the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    In the first step, the <productname>PostgreSQL</productname> distributed
+    transaction manager prepares all transactions on the foreign servers if
+    two-phase commit is required. Two-phase commit is required when the
+    transaction modifies data on two or more servers including the local server
+    itself and <xref linkend="guc-foreign-twophase-commit"/> is
+    <literal>required</literal> or <literal>prefer</literal>. If all preparations
+    on foreign servers succeed, processing goes on to the next step. If any failure
+    happens in this step, <productname>PostgreSQL</productname> switches to rollback
+    and rolls back all transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    In the local commit step, <productname>PostgreSQL</productname> commits the
+    transaction locally. If any failure happens in this step,
+    <productname>PostgreSQL</productname> switches to rollback and rolls back all
+    transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    At the final step, prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolution">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for foreign transaction resolution. They commit or roll back all
+    prepared transactions on foreign servers if the coordinator received agreement
+    messages from all foreign servers during the first step.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolutions
+    on one database of the coordinator side. On failure during resolution, it
+    retries at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped. To drop the database, first call the
+     <function>pg_stop_foreign_xact_resolver</function> function and then drop
+     the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>Manual Resolution of In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back using the two-phase commit protocol. However, distributed transactions
+    become <firstterm>in-doubt</firstterm> in three cases: when a foreign
+    server crashes or the connection to it is lost while preparing the foreign
+    transaction, when the coordinator node crashes while either preparing or
+    resolving the distributed transaction, and when the user cancels the query. You can
+    check in-doubt transactions in the <xref linkend="pg-stat-foreign-xact-view"/>
+    view. These foreign transactions need to be resolved by using the
+    <function>pg_resolve_foreign_xact</function> function.
+    <productname>PostgreSQL</productname> doesn't have facilities to automatically
+    resolve in-doubt transactions. This behavior might change in a future release.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-monitoring">
+   <title>Monitoring</title>
+   <para>
+    The monitoring information about foreign transaction resolvers is visible in
+    <link linkend="pg-stat-foreign-xact-view"><literal>pg_stat_foreign_xact</literal></link>
+    view. This view contains one row for every foreign transaction resolver worker.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be adjusted to
+    accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 27b94fb..1b292e4 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1413,6 +1413,127 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back and
+     prepare the foreign transaction. If an FDW wishes that its foreign
+     transaction be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called in the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either in
+    the pre-commit phase of the local transaction, if the transaction
+    can be committed in one phase, or in the post-commit phase, if
+    two-phase commit is required. If <literal>frstate-&gt;flag</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal> set, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when it is invoked through the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally. Foreign
+    transactions are rolled back when the user requests a rollback or when
+    an error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from recursing
+    infinitely. If <literal>frstate-&gt;flag</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal> set, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when it is invoked through the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction
+    identifier, and store its length in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal
+    less than <symbol>NAMEDATALEN</symbol> bytes long and should not be the same
+    as any other concurrent prepared transaction identifier. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      The functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Consequently,
+      you cannot use functions that access the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1892,4 +2013,119 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage the transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the workings of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name and the OIDs of
+    the server, user and user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback of a Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <literal>CommitForeignTransaction</literal> function
+     in the pre-commit phase; during rollback, it calls the
+     <literal>RollbackForeignTransaction</literal> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback of Distributed Transactions</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit and roll back atomically across all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    Here ft1 and ft2 are foreign tables on different foreign servers, which may use
+    different foreign data wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses a foreign server whose FDW
+     supports the transaction management callback routines, that server is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, the <literal>CommitForeignTransaction</literal> and
+     <literal>RollbackForeignTransaction</literal> functions should commit or
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server was not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve the transaction, the resolver process exits with an error message. The foreign
+     transaction launcher launches the resolver process again at the interval set by
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/>.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365..80a87fa 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index c878a0b..4d09fa0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -22013,6 +22013,95 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction which is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before dropping
+    a database to which a foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c..d587164 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -368,6 +368,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_foreign_xact</structname><indexterm><primary>pg_stat_foreign_xact</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1236,6 +1244,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
+         <entry><literal>LogicalLauncherMain</literal></entry>
+         <entry>Waiting in main loop of logical launcher process.</entry>
+        </row>
+        <row>
          <entry><literal>LogicalApplyMain</literal></entry>
          <entry>Waiting in main loop of logical apply process.</entry>
         </row>
@@ -1459,6 +1479,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
         <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
+        <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
          <entry>Waiting during base backup when throttling activity.</entry>
@@ -2338,6 +2362,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-foreign-xact-view" xreflabel="pg_stat_foreign_xact">
+   <title><structname>pg_stat_foreign_xact</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_foreign_xact</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index e59cba7..dee3f72 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -163,6 +163,7 @@
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 342a0ff..840b0f6 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -84,6 +84,12 @@ Item
 </row>
 
 <row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
+<row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
 </row>
-- 
2.10.5
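
For readers following the documentation patch above, the commit flow it describes (prepare every participant, commit locally only if all prepares succeed, then resolve the prepared foreign transactions) can be sketched in plain C roughly as below. This is only an illustration under stated assumptions: Participant, PartState, SKETCH_NAMEDATALEN and the sketch_* functions are made-up names that exist neither in the patch nor in PostgreSQL, and nothing here talks to a real foreign server. The identifier helper merely mimics the documented default shape fx_<random value>_<server oid>_<user oid>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_NAMEDATALEN 64	/* hypothetical stand-in for NAMEDATALEN */

/* Hypothetical state of one foreign transaction participant. */
typedef enum
{
	PART_ACTIVE,
	PART_PREPARED,
	PART_COMMITTED,
	PART_ABORTED
} PartState;

typedef struct Participant
{
	unsigned int serverid;		/* illustrative foreign server OID */
	unsigned int userid;		/* illustrative user OID */
	bool		fail_prepare;	/* simulate a failure during prepare */
	PartState	state;
} Participant;

/*
 * Build an identifier shaped like the documented default.  Returns false
 * if the result would not be shorter than buflen bytes.
 */
bool
sketch_make_prepare_id(char *buf, size_t buflen, long rnd,
					   unsigned int serverid, unsigned int userid)
{
	int			n = snprintf(buf, buflen, "fx_%ld_%u_%u", rnd, serverid, userid);

	return n > 0 && (size_t) n < buflen;
}

/* Phase 1: prepare each participant, stopping at the first failure. */
bool
sketch_prepare_all(Participant *parts, int nparts)
{
	for (int i = 0; i < nparts; i++)
	{
		if (parts[i].fail_prepare)
			return false;
		parts[i].state = PART_PREPARED;
	}
	return true;
}

/* Phase 2: commit or roll back every participant. */
void
sketch_resolve_all(Participant *parts, int nparts, bool commit)
{
	for (int i = 0; i < nparts; i++)
		parts[i].state = commit ? PART_COMMITTED : PART_ABORTED;
}

/* Returns true if the distributed transaction committed everywhere. */
bool
sketch_atomic_commit(Participant *parts, int nparts)
{
	bool		ok = sketch_prepare_all(parts, nparts);

	/* The local commit would happen here, and only when ok is true. */
	sketch_resolve_all(parts, nparts, ok);
	return ok;
}
```

Calling sketch_atomic_commit() on an array in which any participant has fail_prepare set leaves every participant in PART_ABORTED, which is the all-or-nothing property the documentation calls atomic commit. In the real patch the second phase is performed asynchronously by the resolver process; the sketch collapses that into a direct call.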

v25-0004-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From c9343085d91575cb24bd52cf601ead511f102f05 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:46:01 +0900
Subject: [PATCH v25 4/5] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/Makefile                  |   7 +-
 contrib/postgres_fdw/connection.c              | 604 ++++++++++++++++---------
 contrib/postgres_fdw/expected/postgres_fdw.out | 261 ++++++++++-
 contrib/postgres_fdw/fdwxact.conf              |   3 +
 contrib/postgres_fdw/postgres_fdw.c            |  21 +-
 contrib/postgres_fdw/postgres_fdw.h            |   7 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      | 119 +++++
 doc/src/sgml/postgres-fdw.sgml                 |  46 ++
 8 files changed, 830 insertions(+), 238 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index 85394b4..5198f40 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -10,7 +10,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -23,3 +23,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 57ed5f4..093eda6 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2019, PostgreSQL Global Development Group
  *
@@ -14,6 +14,7 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
@@ -56,6 +57,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;	/* connection used in current xact? */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -69,17 +71,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -91,24 +89,26 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 									 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
-
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
+ * Get the connection cache entry. Unlike GetConnectionState, this function
+ * doesn't establish a new connection if one doesn't exist yet.
  */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -128,7 +128,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -136,12 +135,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -155,6 +148,21 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * This function gets the connection cache entry and establishes connection
+ * to the foreign server if there is no connection and starts a new transaction
+ * if 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -182,6 +190,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping	*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +199,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +210,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,11 +226,39 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
@@ -414,7 +461,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -645,193 +692,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 }
 
 /*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow remote transactions that modified anything,
-					 * since it's not very reasonable to hold them open until
-					 * the prepared transaction is committed.  For the moment,
-					 * throw error unconditionally; later we might allow
-					 * read-only cases.  Note that the error will cause us to
-					 * come right back here with event == XACT_EVENT_ABORT, so
-					 * we'll clean up the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot prepare a transaction that modified remote tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
-/*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
 static void
@@ -847,10 +707,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -861,6 +717,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Quick exit if no connections were touched in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1195,3 +1055,309 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   state->server->servername, state->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 state->server->servername, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on foreign server. If
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can commit the
+ * foreign transaction without preparation, otherwise commit the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In simple commit case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   state->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Roll back a transaction on the foreign server. As in the commit case, if
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function rolls back the
+ * foreign transaction without preparation; otherwise it rolls back the
+ * prepared transaction. This function must tolerate being called recursively,
+ * since an error can happen during abort.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before
+	 * establishing a connection or starting a remote transaction.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction or is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager notes, it's possible that
+		 * the given foreign transaction doesn't exist on the foreign server,
+		 * so we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f0c842a..067c98f 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -179,15 +198,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8781,3 +8802,225 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     0
+(1 row)
+
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000..3fdbf93
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 82d8140..80de315 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include "postgres_fdw.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -503,7 +504,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -557,6 +557,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1446,7 +1451,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2384,7 +2389,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2684,7 +2689,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3512,7 +3517,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4387,7 +4392,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4477,7 +4482,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4705,7 +4710,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 6acb7dc..f9162bb 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "nodes/pathnodes.h"
@@ -127,7 +128,7 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -135,6 +136,9 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *state);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *state);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *state);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
@@ -201,6 +205,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 									bool is_subquery,
 									List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 630b803..572077c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2479,3 +2502,99 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
+
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index e9ce39a..0427922 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -441,6 +441,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted independently.
+    Some of these transactions may fail to commit on their remote server while
+    others commit successfully. This behavior may be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is
+       allowed to use two-phase commit when a transaction commits. This option
+       can only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
@@ -469,6 +506,14 @@
   </para>
 
   <para>
+   <filename>postgres_fdw</filename> uses the two-phase commit protocol when
+   a transaction commits or aborts if atomic commit of the distributed
+   transaction (see <xref linkend="atomic-commit"/>) is required. The remote
+   server should therefore set <xref linkend="guc-max-prepared-transactions"/>
+   to more than one so that it can prepare the remote transaction.
+  </para>
+
+  <para>
    The remote transaction uses <literal>SERIALIZABLE</literal>
    isolation level when the local transaction has <literal>SERIALIZABLE</literal>
    isolation level; otherwise it uses <literal>REPEATABLE READ</literal>
@@ -483,6 +528,7 @@
    COMMITTED</literal> local transaction.  A future
    <productname>PostgreSQL</productname> release might modify these rules.
   </para>
+
  </sect2>
 
  <sect2>
-- 
2.10.5

v25-0005-Add-regression-tests-for-atomic-commit.patch (application/octet-stream)
From e335c2b54eb21376f23929cf6e0b378b9275dd96 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 10 Jun 2018 18:48:08 +0900
Subject: [PATCH v25 5/5] Add regression tests for atomic commit.

---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index e66e695..b17429f 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000..9af9bb8
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately before checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index b4045ab..022ba1b 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2323,9 +2323,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transaction
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2340,7 +2343,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.10.5

v25-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From a7b6737e0d29eae3954caa0ba037244abefb16b0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 2 Jul 2019 09:32:16 +0900
Subject: [PATCH v25 2/5] Support atomic commit among multiple foreign servers.

---
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  130 ++
 src/backend/access/fdwxact/fdwxact.c          | 2827 +++++++++++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  644 ++++++
 src/backend/access/fdwxact/resolver.c         |  344 +++
 src/backend/access/rmgrdesc/Makefile          |    8 +-
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/Makefile           |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/copy.c                   |    7 +
 src/backend/commands/foreigncmds.c            |   30 +
 src/backend/executor/execPartition.c          |    9 +
 src/backend/executor/nodeForeignscan.c        |   25 +
 src/backend/executor/nodeModifyTable.c        |   18 +
 src/backend/foreign/foreign.c                 |   57 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   20 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   82 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  165 ++
 src/include/access/fdwxact_launcher.h         |   29 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/resolver_internal.h        |   66 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   29 +
 src/include/foreign/fdwapi.h                  |   13 +-
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    9 +-
 src/include/storage/proc.h                    |   11 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    3 +
 src/test/regress/expected/rules.out           |   13 +
 55 files changed, 4935 insertions(+), 26 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100755 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a..49480dd 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000..0207a66
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000..a6a46ad
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,130 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back a transaction on
+either all of the foreign servers or none of them. This ensures that the data
+is always left in a consistent state across the federated database.
+
+
+Commit Sequence of Global Transactions
+--------------------------------
+
+We employ the two-phase commit protocol to commit atomically among all foreign
+servers. The commit sequence of a distributed transaction consists of the
+following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+in the list FdwXactAtomicCommitParticipants, which is maintained by
+PostgreSQL's global transaction manager (GTM), as distributed transaction
+participants. The registered foreign transactions are tracked until the end of
+the transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We write a WAL record indicating that the foreign server is involved with
+the current transaction before issuing PREPARE on all foreign transactions.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have prepared a transaction on the foreign
+server, and will try to resolve it when connectivity is restored or after
+crash recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers, including the local node. Otherwise, we can commit them at this
+step by calling the CommitForeignTransaction() API, with nothing more to do.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If any of them fails, we switch to
+rollback; at that point some participants might be prepared whereas
+others are not. The former need to be resolved manually using
+pg_resolve_foreign_xact(), and the latter end their transactions
+in one phase by calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction,
+but this resolution step (commit or rollback) is done by the foreign transaction
+resolver process. The backend process inserts itself into the wait queue and
+then wakes up the resolver process (or requests the launch of a new one if
+necessary). The resolver process takes the waiter from the queue and fetches the
+distributed transaction information that the backend is waiting for. Once all
+foreign transactions are committed or rolled back, the resolver wakes the waiter.
+
+
+API Contract With Transaction Management Callback Functions
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+transaction management callback functions according to that status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for issuing PREPARE, COMMIT or
+ROLLBACK on the transaction on the foreign server, respectively.
+FdwXactRslvState->flags could contain FDWXACT_FLAG_ONEPHASE, meaning the FDW
+can commit or roll back the foreign transaction in one phase. On failure while
+processing a foreign transaction, the FDW needs to raise an error. However, the
+FDW must accept an ERRCODE_UNDEFINED_OBJECT error while committing or rolling
+back a foreign transaction, because there is a race condition in which the
+coordinator could crash between completing the resolution and writing the WAL
+record that removes the FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_INITIAL
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and it changes to
+FDWXACT_STATUS_PREPARING, FDWXACT_STATUS_COMMITTING and FDWXACT_STATUS_ABORTING
+before the foreign transaction is prepared, committed and aborted by the FDW
+callback functions, respectively (*1). The status then changes to
+FDWXACT_STATUS_RESOLVED once the foreign transaction is resolved, and then
+the corresponding FdwXact entry is removed with WAL logging. If processing of
+a foreign transaction (i.e. preparing, committing or aborting) fails, the
+status changes back to the previous status. Therefore the
+FDWXACT_STATUS_xxxING statuses appear only while the foreign transaction is
+being processed by an FDW callback function.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. Their initial
+status is FDWXACT_STATUS_PREPARED (*2). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                      INVALID                       |
+ +----------------------------------------------------+
+    |                      |                       |
+    |                      v                       |
+    |           +---------------------+            |
+    |           |       INITIAL       |            |
+    |           +---------------------+            |
+   (*2)                    |                      (*2)
+    |                      v                       |
+    |           +---------------------+            |
+    |           |    PREPARING(*1)    |            |
+    |           +---------------------+            |
+    |                      |                       |
+    v                      v                       v
+ +----------------------------------------------------+
+ |                      PREPARED                      |
+ +----------------------------------------------------+
+           |                               |
+           v                               v
+ +--------------------+          +--------------------+
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |
+ +--------------------+          +--------------------+
+           |                               |
+           v                               v
+ +----------------------------------------------------+
+ |                      RESOLVED                      |
+ +----------------------------------------------------+
+
+(*1) Statuses that appear only while being processed by the FDW
+(*2) Paths for recovered FdwXact entries
\ No newline at end of file
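
The status graph in the README above can be read as a small transition table. The following is only an illustrative sketch, not code from the patch: the `Sketch*` names and both functions are hypothetical, and the fall-back edges (an xxxING status returning to its previous status on failure) are taken from the README text rather than from the graph itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the README's status names; not the patch's enum. */
typedef enum
{
	SKETCH_INVALID,
	SKETCH_INITIAL,
	SKETCH_PREPARING,
	SKETCH_PREPARED,
	SKETCH_COMMITTING,
	SKETCH_ABORTING,
	SKETCH_RESOLVED
} SketchStatus;

/*
 * Return true if the README describes a move from 'from' to 'to'.
 * 'recovered' selects the (*2) paths used for entries rebuilt in recovery.
 */
bool
sketch_transition_allowed(SketchStatus from, SketchStatus to, bool recovered)
{
	switch (from)
	{
		case SKETCH_INVALID:
			/* (*2): recovered entries are regarded as PREPARED directly */
			return recovered ? (to == SKETCH_PREPARED)
							 : (to == SKETCH_INITIAL);
		case SKETCH_INITIAL:
			return to == SKETCH_PREPARING;
		case SKETCH_PREPARING:
			/* success moves forward; failure falls back to INITIAL */
			return to == SKETCH_PREPARED || to == SKETCH_INITIAL;
		case SKETCH_PREPARED:
			return to == SKETCH_COMMITTING || to == SKETCH_ABORTING;
		case SKETCH_COMMITTING:
		case SKETCH_ABORTING:
			/* success resolves the entry; failure falls back to PREPARED */
			return to == SKETCH_RESOLVED || to == SKETCH_PREPARED;
		case SKETCH_RESOLVED:
			/* terminal: the entry is removed after this status */
			return false;
	}
	return false;
}

/* Walk the normal commit path from the graph and check every edge. */
void
sketch_self_check(void)
{
	SketchStatus path[] = {SKETCH_INVALID, SKETCH_INITIAL, SKETCH_PREPARING,
						   SKETCH_PREPARED, SKETCH_COMMITTING, SKETCH_RESOLVED};

	for (int i = 0; i + 1 < (int) (sizeof(path) / sizeof(path[0])); i++)
		assert(sketch_transition_allowed(path[i], path[i + 1], false));
}
```

Calling `sketch_self_check()` walks the commit path INVALID, INITIAL, PREPARING, PREPARED, COMMITTING, RESOLVED; a skipped step such as INITIAL to PREPARED is rejected, which matches the README's point that the xxxING statuses mark the window in which an FDW callback owns the entry.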
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100755
index 0000000..e3754e0
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2827 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * During executor node initialization, a foreign server can be registered
+ * by calling either RegisterFdwXactByRelId() or RegisterFdwXactByServerId()
+ * so that it participates in a group for global commit. Foreign servers are
+ * registered if the FDW has both the CommitForeignTransaction API and the
+ * RollbackForeignTransaction API. Registered participant servers are
+ * identified by the OIDs of the foreign server and user.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every foreign server. After committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so
+ * that we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request. We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed. Before waiting for foreign transactions to be resolved, the
+ * backend enqueues itself with the timestamp at which it expects to be
+ * processed. Similarly, if resolution fails, it enqueues itself again with a
+ * new timestamp (its timestamp + foreign_xact_resolution_interval).
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in the in-doubt state (aka in-doubt
+ * transactions). Foreign transactions in the in-doubt state are not resolved
+ * automatically, so they must be processed manually using the
+ * pg_resolve_fdwxact() function.
+ *
+ * The two-phase commit protocol is required if the transaction modified two
+ * or more servers, including the local one. Otherwise, all foreign
+ * transactions are committed or rolled back during pre-commit.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed by the FDW, the corresponding
+ * FdwXact entry is updated. In order to protect the entry from concurrent
+ * removal we need to hold a lock on the entry or a lock on the entire global
+ * array. However, we don't want to hold the lock while the FDW is processing
+ * the foreign transaction, which may take an unpredictable time. To avoid
+ * this, the in-memory data of a foreign transaction follows a locking model
+ * based on four linked concepts:
+ *
+ * * A foreign transaction's status variable is switched using the LWLock
+ *   FdwXactLock, which needs to be held in exclusive mode when updating the
+ *   status, while readers need to hold it in shared mode when looking at the
+ *   status.
+ * * A process that is going to update an FdwXact entry cannot process a
+ *   foreign transaction that is being resolved.
+ * * So setting the status to FDWXACT_STATUS_PREPARING,
+ *   FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING, which are the
+ *   in-progress states of a foreign transaction, means owning the FdwXact
+ *   entry, which protects it from update/removal by concurrent writers.
+ * * Individual fields are protected by a mutex, where only the backend owning
+ *   the foreign transaction is authorized to update the fields of its own
+ *   entry.
+ *
+ * Therefore, before doing PREPARE, COMMIT PREPARED or ROLLBACK PREPARED, a
+ * process that is going to call transaction callback functions needs to change
+ * the status to the corresponding status above while holding FdwXactLock in
+ * exclusive mode, and call the callback function after releasing the lock.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxacts is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk. FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon. We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts. If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Atomic commit is enabled by configuration */
+#define IsForeignTwophaseCommitEnabled() \
+	(max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	(IsForeignTwophaseCommitEnabled() && \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED))
+
+/* Check whether the FdwXactParticipant is capable of two-phase commit */
+#define IsSeverCapableOfTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Check whether the FdwXact is being resolved */
+#define FdwXactIsBeingResolved(fx) \
+	(((((FdwXact)(fx))->status) == FDWXACT_STATUS_PREPARING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_COMMITTING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_ABORTING))
+
+/*
+ * Structure to bundle the foreign transaction participant. This struct
+ * is created at the beginning of execution for each foreign server and
+ * is used until the end of the transaction, where we cannot look at syscaches.
+ * Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry
+	 * is not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char			*fdwxact_id;
+
+	/* true if modified the data on the server */
+	bool			modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function	prepare_foreign_xact_fn;
+	CommitForeignTransaction_function	commit_foreign_xact_fn;
+	RollbackForeignTransaction_function	rollback_foreign_xact_fn;
+	GetPrepareId_function				get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit. This list
+ * has only foreign servers that provide transaction management callbacks,
+ * that is CommitForeignTransaction and RollbackForeignTransaction.
+ */
+static List *FdwXactParticipants = NIL;
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * oid, xid, foreign server oid and user oid, 8 hex digits each, joined by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure
+ * uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* GUC parameters */
+int	max_prepared_foreign_xacts = 0;
+int	max_foreign_xact_resolvers = 0;
+int	foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Track whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static void FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+												 bool for_commit);
+static void FdwXactResolveForeignTransaction(FdwXact fdwxact,
+											 FdwXactRslvState *state,
+											 FdwXactStatus fallback_status);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid,	void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static void FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock);
+static bool is_foreign_twophase_commit_required(void);
+static void register_fdwxact(Oid serverid, Oid userid, bool modified);
+static List *get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						  bool including_indoubts, bool include_in_progress,
+						  bool need_lock);
+static FdwXact get_all_fdwxacts(int *num_p);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXact get_fdwxact_to_resolve(Oid dbid, TransactionId xid);
+static FdwXactRslvState *create_fdwxact_state(void);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Remember accessed foreign transaction. Both RegisterFdwXactByRelId and
+ * RegisterFdwXactByServerId are called by executor during initialization.
+ */
+void
+RegisterFdwXactByRelId(Oid relid, bool modified)
+{
+	Relation		rel;
+	Oid				serverid;
+	Oid				userid;
+
+	rel = relation_open(relid, NoLock);
+	serverid = GetForeignServerIdByRelId(relid);
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	relation_close(rel, NoLock);
+
+	register_fdwxact(serverid, userid, modified);
+}
+
+void
+RegisterFdwXactByServerId(Oid serverid, bool modified)
+{
+	register_fdwxact(serverid, GetUserId(), modified);
+}
+
+/*
+ * Register the foreign transaction identified by the given arguments as
+ * a participant of the transaction.
+ *
+ * A foreign transaction is identified by the given server id and user id.
+ * Registered foreign transactions are managed by the global transaction
+ * manager until the end of the transaction.
+ */
+static void
+register_fdwxact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant	*fdw_part;
+	ForeignServer 		*foreign_server;
+	ForeignDataWrapper	*fdw;
+	UserMapping			*user_mapping;
+	MemoryContext		old_ctx;
+	FdwRoutine			*routine;
+	ListCell	   		*lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	/*
+	 * The participant's information is also needed at the end of the
+	 * transaction, where the system caches are not available. Save it in
+	 * TopTransactionContext so that it lives until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Don't register foreign server if it doesn't provide both commit and
+	 * rollback transaction management callbacks.
+	 */
+	if (!routine->CommitForeignTransaction ||
+		!routine->RollbackForeignTransaction)
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		pfree(routine);
+
+		/* Restore the caller's memory context before returning */
+		MemoryContextSwitchTo(old_ctx);
+		return;
+	}
+
+	/*
+	 * Remember that we touched a foreign server that is not capable of
+	 * two-phase commit.
+	 */
+	if (!routine->PrepareForeignTransaction)
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+
+	foreign_server = GetForeignServer(serverid);
+	fdw = GetForeignDataWrapper(foreign_server->fdwid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact = NULL;
+	fdw_part->modified = modified;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Restore the previous memory context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&(fdwxacts[cnt].mutex));
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Prepare all foreign transactions if foreign two-phase commit is required.
+ * If it is required, the behavior depends on the value of
+ * foreign_twophase_commit: with 'required' we strictly require the FDWs of
+ * all foreign servers to support the two-phase commit protocol and ask them
+ * to prepare foreign transactions; with 'prefer' we ask only the foreign
+ * servers that are capable of two-phase commit to prepare foreign
+ * transactions and ask the other servers to commit; and with 'disabled' we
+ * ask all foreign servers to commit their foreign transactions in one phase.
+ * If we fail to commit any of them, we change to aborting.
+ *
+ * Note that non-modified foreign servers can always be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	bool		need_twophase_commit;
+	ListCell	*lc = NULL;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * In 'required' mode, all modified servers must be capable of the
+	 * two-phase commit protocol.
+	 */
+	if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on a foreign server that does not support atomic commit")));
+
+	/*
+	 * Check if we need to use foreign twophase commit. It's always false
+	 * if foreign twophase commit is disabled.
+	 */
+	need_twophase_commit = is_foreign_twophase_commit_required();
+
+	/*
+	 * First, consider committing foreign transactions in one phase.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		bool	commit = false;
+
+		/* Can commit in one phase if two-phase commit is not required */
+		if (!need_twophase_commit)
+			commit = true;
+
+		/* A non-modified foreign transaction can always be committed in one phase */
+		if (!fdw_part->modified)
+			commit = true;
+
+		/*
+		 * In 'prefer' case, non-twophase-commit capable server can be
+		 * committed in one-phase.
+		 */
+		if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER &&
+			!IsSeverCapableOfTwophaseCommit(fdw_part))
+			commit = true;
+
+		if (commit)
+		{
+			/* Commit the foreign transaction in one-phase */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, true);
+
+			/* Delete it from the participant list */
+			FdwXactParticipants = foreach_delete_current(FdwXactParticipants,
+														 lc);
+			continue;
+		}
+	}
+
+	/* All done if we committed all foreign transactions */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Second, if only one transaction remains in the participant list and
+	 * we did not modify any local data, we can commit it without
+	 * preparation.
+	 */
+	if (list_length(FdwXactParticipants) == 1 &&
+		(MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) == 0)
+	{
+		/* Commit the foreign transaction in one-phase */
+		FdwXactOnePhaseEndForeignTransaction(linitial(FdwXactParticipants),
+											 true);
+
+		/* All foreign transactions have been committed; reset the list */
+		list_free(FdwXactParticipants);
+		FdwXactParticipants = NIL;
+		return;
+	}
+
+	/*
+	 * Finally, prepare foreign transactions. Note that we keep
+	 * FdwXactParticipants until the end of transaction.
+	 */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * an FdwXact entry we call the GetPrepareId callback to get a transaction
+ * identifier from the FDW.
+ *
+ * We can still change to rollback here. If any error occurs, we roll back
+ * the non-prepared foreign transactions and leave the others to the resolver.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	ListCell		*lcell;
+	TransactionId	xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	xid = GetTopTransactionId();
+
+	/* Loop over the foreign connections */
+	foreach(lcell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+		FdwXactRslvState 	*state;
+		FdwXact		fdwxact;
+
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry and then mark it with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to disk and to WAL (thereby relaying it to standbys).
+		 * Thus, if we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have prepared a
+		 * transaction on the foreign server and will try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed between these two
+		 * steps, we would forget that we prepared the transaction on the
+		 * foreign server and would not be able to resolve it after the crash.
+		 * Hence, persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		state = create_fdwxact_state();
+		state->server = fdw_part->server;
+		state->usermapping = fdw_part->usermapping;
+		state->fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+
+		/* Update the status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		Assert(fdwxact->status == FDWXACT_STATUS_INITIAL);
+		fdwxact->status = FDWXACT_STATUS_PREPARING;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the moment this
+		 * backend receives an acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 *
+		 * During abort processing, we might try to resolve a never-prepared
+		 * transaction and get an error. This is fine as long as the FDW
+		 * provides us with unique prepared transaction identifiers.
+		 */
+		PG_TRY();
+		{
+			fdw_part->prepare_foreign_xact_fn(state);
+		}
+		PG_CATCH();
+		{
+			/* failed, back to the initial state */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fdwxact->status = FDWXACT_STATUS_INITIAL;
+			LWLockRelease(FdwXactLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		/* succeeded, update status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * One-phase commit or rollback the given foreign transaction participant.
+ */
+static void
+FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+									 bool for_commit)
+{
+	FdwXactRslvState *state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state = create_fdwxact_state();
+	state->server = fdw_part->server;
+	state->usermapping = fdw_part->usermapping;
+	state->flags = FDWXACT_FLAG_ONEPHASE;
+
+	/*
+	 * Commit or roll back the foreign transaction in one phase. Since we
+	 * didn't insert an FdwXact entry for this transaction, we don't need to
+	 * handle failures here. On failure we change to rollback.
+	 */
+	if (for_commit)
+		fdw_part->commit_foreign_xact_fn(state);
+	else
+		fdw_part->rollback_foreign_xact_fn(state);
+}
+
+/*
+ * This function is used to create a new foreign transaction entry before an
+ * FDW prepares and commits or rolls back. The function adds the entry to WAL,
+ * and it is persisted to disk under the pg_fdwxact directory at checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact				fdwxact;
+	FdwXactOnDiskData	*fdwxact_file_data;
+	MemoryContext		old_context;
+	int					data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							fdw_part->usermapping->userid,
+							fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->status = FDWXACT_STATUS_INITIAL;
+	fdwxact->held_by = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* The WAL record is flushed; a checkpoint can now sync this entry */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				Oid umid, char *fdwxact_id)
+{
+	int i;
+	FdwXact fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
+								   xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions (currently %d).",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->dbid = dbid;
+	fdwxact->local_xid = xid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	fdwxact->indoubt = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (FdwXactIsBeingResolved(fdwxact))
+		elog(ERROR, "cannot remove fdwxact entry that is being resolved");
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset the entry's state */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->indoubt = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the commit critical section: a
+		 * checkpoint starting after this will certainly see the removal of
+		 * this FdwXact entry.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true and set ForeignTwophaseCommitIsRequired to true if the current
+ * transaction modified data on two or more servers, counting both the
+ * servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+is_foreign_twophase_commit_required(void)
+{
+	ListCell*	lc;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->modified)
+			nserverswritten++;
+	}
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	/*
+	 * Atomic commit is required if we modified data on two or more
+	 * participants.
+	 */
+	if (nserverswritten <= 1)
+		return false;
+
+	ForeignTwophaseCommitIsRequired = true;
+	return true;
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int	i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * Mark my foreign transaction participants as in-doubt and clear
+ * the FdwXactParticipants list.
+ *
+ * If we leave any foreign transaction unresolved, update the oldest xmin of
+ * unresolved transactions so that the local transaction id of an in-doubt
+ * transaction is not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell *cell;
+	int		n_lefts = 0;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if no FdwXact entry has been registered yet */
+		if (!fdw_part->fdwxact)
+			continue;
+
+		/*
+		 * There is a race condition: if the resolver process removes an
+		 * FdwXact entry listed in FdwXactParticipants, another backend could
+		 * reuse that entry before we forget it. So we need to check that the
+		 * entry is still associated with this backend's transaction.
+		 */
+		SpinLockAcquire(&fdwxact->mutex);
+		if (fdwxact->held_by == MyBackendId)
+		{
+			fdwxact->held_by = InvalidBackendId;
+			fdwxact->indoubt = true;
+			n_lefts++;
+		}
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction of
+	 * the unresolved distributed transactions and hand the entries over to
+	 * the foreign transaction resolver.
+	 */
+	if (n_lefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions in in-doubt status", n_lefts);
+		FdwXactComputeRequiredXmin();
+	}
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * Wait for the foreign transaction to be resolved.
+ *
+ * Initially a backend starts in state FDWXACT_NOT_WAITING and then changes
+ * that state to FDWXACT_WAITING before adding itself to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * The backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction, it moves the
+ * backend to the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char		*new_status = NULL;
+	const char	*old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsForeignTwophaseCommitRequested())
+		return;
+
+	/*
+	 * Also, exit if the transaction itself has no foreign transaction
+	 * participants.
+	 */
+	if (FdwXactParticipants == NIL && wait_xid == MyPgXact->xid)
+		return;
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if none is running yet, or wake it up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int len;
+
+		old_status = get_ps_display(&len);
+		/* 24 bytes of text plus up to 10 digits for the xid, plus NUL */
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0';	/* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDWXACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The latter
+		 * would lead the client to believe that the distributed transaction
+		 * aborted, which is not true: it's already committed locally. The
+		 * former is no good either: the client has requested committing a
+		 * distributed transaction, and is entitled to assume that an
+		 * acknowledged commit is also committed on all foreign servers, which
+		 * might not be true. So in this case we issue a WARNING (which some
+		 * clients may be able to interpret) and shut off further output. We
+		 * do NOT reset ProcDiePending, so that the process will die after the
+		 * commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign transaction resolver can pick them up and try to
+		 * resolve them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Return true if there is at least one backend of the given database in the
+ * wait queue. The caller must hold FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * AtEOXact_FdwXacts
+ */
+void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		foreach (lcell, FdwXactParticipants)
+		{
+			FdwXactParticipant	*fdw_part = lfirst(lcell);
+
+			/*
+			 * If the foreign transaction has an FdwXact entry, we might have
+			 * prepared it. Skip an already-prepared foreign transaction
+			 * because it has closed its transaction. However, we cannot tell
+			 * whether a foreign transaction with status ==
+			 * FDWXACT_STATUS_PREPARING has actually been prepared, so we call
+			 * the rollback API to close its transaction for safety. Any
+			 * prepared foreign transaction left behind will be resolved by
+			 * the foreign transaction resolver.
+			 */
+			if (fdw_part->fdwxact)
+			{
+				bool is_prepared;
+
+				LWLockAcquire(FdwXactLock, LW_SHARED);
+				is_prepared = fdw_part->fdwxact &&
+					fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED;
+				LWLockRelease(FdwXactLock);
+
+				if (is_prepared)
+					continue;
+			}
+
+			/* One-phase rollback foreign transaction */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, false);
+		}
+	}
+
+	/*
+	 * In the commit case, we have already prepared the foreign transactions
+	 * during the pre-commit phase, and these prepared transactions will be
+	 * resolved by the resolver process.
+	 */
+
+	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Prepare foreign transactions.
+ *
+ * Note that the transaction may abort after we have prepared some of the
+ * participants. In that case we switch to rollback and roll back all foreign
+ * transactions.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'")));
+
+	/*
+	 * We cannot prepare if any foreign server of participants isn't capable
+	 * of two-phase commit.
+	 */
+	if (is_foreign_twophase_commit_required() &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in the transaction cannot prepare it")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Return one backend that connects to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p)
+{
+	PGPROC *proc;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+			break;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+	{
+		*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+		*waitXid_p = proc->fdwXactWaitXid;
+	}
+	else
+	{
+		*nextResolutionTs_p = -1;
+		*waitXid_p = InvalidTransactionId;
+	}
+
+	LWLockRelease(FdwXactResolutionLock);
+
+	return proc;
+}
+
+/*
+ * Get one FdwXact entry to resolve. This function is intended to be used when
+ * a resolver process gets FdwXact entries to resolve, so the search excludes
+ * both in-doubt transactions and in-progress transactions.
+ */
+static FdwXact
+get_fdwxact_to_resolve(Oid dbid, TransactionId xid)
+{
+	List *fdwxacts = NIL;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Exclude both in-doubt transactions and in-progress transactions */
+	fdwxacts = get_fdwxacts(dbid, xid, InvalidOid, InvalidOid,
+							false, false, false);
+
+	return fdwxacts == NIL ? NULL : (FdwXact) linitial(fdwxacts);
+}
+
+/*
+ * Resolve one distributed transaction on the given database. The target
+ * distributed transaction is fetched from the waiting queue and its transaction
+ * participants are fetched from the global array.
+ *
+ * Release the waiter after we have resolved all of the foreign transaction
+ * participants. On failure, we re-enqueue the waiting backend after advancing
+ * its next resolution time.
+ */
+void
+FdwXactResolveTransactionAndReleaseWaiter(Oid dbid, TransactionId xid,
+										  PGPROC *waiter)
+{
+	FdwXact	fdwxact;
+
+	Assert(TransactionIdIsValid(xid));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	while ((fdwxact = get_fdwxact_to_resolve(MyDatabaseId, xid)) != NULL)
+	{
+		FdwXactRslvState *state;
+		ForeignServer *server;
+		UserMapping	*usermapping;
+
+		CHECK_FOR_INTERRUPTS();
+
+		server = GetForeignServer(fdwxact->serverid);
+		usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+
+		state = create_fdwxact_state();
+		SpinLockAcquire(&fdwxact->mutex);
+		state->server = server;
+		state->usermapping = usermapping;
+		state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+		SpinLockRelease(&fdwxact->mutex);
+
+		FdwXactDetermineTransactionFate(fdwxact, false);
+
+		/* Do not hold the lock during foreign transaction resolution */
+		LWLockRelease(FdwXactLock);
+
+		PG_TRY();
+		{
+			/*
+			 * Resolve the foreign transaction. When committing or aborting
+			 * prepared foreign transactions the previous status is always
+			 * FDWXACT_STATUS_PREPARED.
+			 */
+			FdwXactResolveForeignTransaction(fdwxact, state,
+											 FDWXACT_STATUS_PREPARED);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. Re-insert the waiter into the retry queue
+			 * with a later resolution time if the waiter is still waiting.
+			 */
+			LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+			if (waiter->fdwXactState == FDWXACT_WAITING)
+			{
+				SHMQueueDelete(&(waiter->fdwXactLinks));
+				pg_write_barrier();
+				waiter->fdwXactNextResolutionTs =
+					TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+												foreign_xact_resolution_retry_interval);
+				FdwXactQueueInsert(waiter);
+			}
+			LWLockRelease(FdwXactResolutionLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		elog(DEBUG2, "resolved one foreign transaction xid %u, serverid %u, userid %u",
+			 fdwxact->local_xid, fdwxact->serverid, fdwxact->userid);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+
+	/*
+	 * Remove waiter from shmem queue, if not detached yet. The waiter
+	 * could already be detached if the user cancelled the wait before
+	 * resolution.
+	 */
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId	wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/* Wake up the waiter only when we have set state and removed from queue */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had already been detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Determine whether the given foreign transaction should be committed or
+ * rolled back according to the result of the local transaction. This function
+ * changes fdwxact->status, so the caller must either hold FdwXactLock in
+ * exclusive mode or pass need_lock as true.
+ */
+static void
+FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock)
+{
+	bool			is_commit = false;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/*
+	 * The transaction being resolved must either have been cancelled and
+	 * marked as in-doubt, or have been prepared.
+	 */
+	Assert(fdwxact->indoubt ||
+		   fdwxact->status == FDWXACT_STATUS_PREPARED);
+
+	/*
+	 * If the local transaction is already committed, commit prepared
+	 * foreign transaction.
+	 */
+	if (TransactionIdDidCommit(fdwxact->local_xid))
+	{
+		fdwxact->status = FDWXACT_STATUS_COMMITTING;
+		is_commit = true;
+	}
+
+	/*
+	 * If the local transaction is already aborted, abort prepared
+	 * foreign transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+	{
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is not in progress but the foreign
+	 * transaction is not prepared on the foreign server. This
+	 * can happen when the transaction failed after registering this
+	 * entry but before actually preparing on the foreign server.
+	 * So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+	{
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+		is_commit = false;
+	}
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted. This should not happen except for one
+	 * case where the local transaction is prepared and this foreign transaction
+	 * is being resolved manually using pg_resolve_foreign_xact(). Raise an
+	 * error anyway since we cannot determine the fate of this foreign
+	 * transaction according to the local transaction whose fate is also not
+	 * determined.
+	 */
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve the foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid),
+				 errhint("The local transaction with xid %u might be prepared",
+						 fdwxact->local_xid)));
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * callback function. The 'state' is passed to the callback function. The fate of
+ * foreign transaction must be determined. If foreign transaction is resolved
+ * successfully, remove the FdwXact entry from the shared memory and also
+ * remove the corresponding on-disk file. If failed, the status of FdwXact
+ * entry changes to 'fallback_status' before erroring out.
+ */
+static void
+FdwXactResolveForeignTransaction(FdwXact fdwxact, FdwXactRslvState *state,
+								 FdwXactStatus fallback_status)
+{
+	ForeignServer		*server;
+	ForeignDataWrapper	*fdw;
+	FdwRoutine			*fdw_routine;
+	bool				is_commit;
+
+	Assert(state != NULL);
+	Assert(state->server && state->usermapping && state->fdwxact_id);
+	Assert(fdwxact != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+		elog(ERROR, "cannot resolve foreign transaction whose fate is not determined");
+
+	is_commit = fdwxact->status == FDWXACT_STATUS_COMMITTING;
+	LWLockRelease(FdwXactLock);
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+
+	PG_TRY();
+	{
+		if (is_commit)
+			fdw_routine->CommitForeignTransaction(state);
+		else
+			fdw_routine->RollbackForeignTransaction(state);
+	}
+	PG_CATCH();
+	{
+		/* Back to the fallback status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = fallback_status;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	/* Resolution was a success, remove the entry */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	elog(DEBUG1, "successfully %s the foreign transaction with xid %u db %u server %u user %u",
+		 is_commit ? "committed" : "rolled back",
+		 fdwxact->local_xid, fdwxact->dbid, fdwxact->serverid,
+		 fdwxact->userid);
+
+	fdwxact->status = FDWXACT_STATUS_RESOLVED;
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdwxact(fdwxact);
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Return palloc'd and initialized FdwXactRslvState.
+ */
+static FdwXactRslvState *
+create_fdwxact_state(void)
+{
+	FdwXactRslvState *state;
+
+	state = palloc(sizeof(FdwXactRslvState));
+	state->server = NULL;
+	state->usermapping = NULL;
+	state->fdwxact_id = NULL;
+	state->flags = 0;
+
+	return state;
+}
+
+/*
+ * Return the FdwXact entry that matches the given arguments, or NULL if
+ * there is none. All arguments must be valid values so that the search
+ * matches exactly one (or no) entry. Note that this function is intended to
+ * be used for modifying the returned FdwXact entry, so the caller must hold
+ * FdwXactLock in exclusive mode, and the search doesn't include in-progress
+ * FdwXact entries.
+ */
+static FdwXact
+get_one_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	*fdwxact_list;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	/* Include in-doubt transactions but don't include in-progress ones */
+	fdwxact_list = get_fdwxacts(dbid, xid, serverid, userid,
+								true, false, false);
+
+	/* Must be at most one entry since we search by the unique key */
+	Assert(list_length(fdwxact_list) <= 1);
+
+	/* Could not find entry */
+	if (fdwxact_list == NIL)
+		return NULL;
+
+	return (FdwXact) linitial(fdwxact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+fdwxact_exists(Oid dbid, Oid serverid, Oid userid)
+{
+	List	*fdwxact_list;
+
+	/* Find entries from all FdwXact entries */
+	fdwxact_list = get_fdwxacts(dbid, InvalidTransactionId, serverid,
+								userid, true, true, true);
+
+	return fdwxact_list != NIL;
+}
+
+/*
+ * Return an array of all foreign prepared transactions for the user-level
+ * function pg_foreign_xacts, and store the number of entries in *num_p.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if it doesn't
+ * want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdwxacts(int *num_p)
+{
+	List		*all_fdwxacts;
+	ListCell	*lc;
+	FdwXact		fdwxacts;
+	int			num_fdwxacts = 0;
+
+	Assert(num_p != NULL);
+
+	/* Get all entries */
+	all_fdwxacts = get_fdwxacts(InvalidOid, InvalidTransactionId,
+								InvalidOid, InvalidOid, true,
+								true, true);
+
+	if (all_fdwxacts == NIL)
+	{
+		*num_p = 0;
+		return NULL;
+	}
+
+	fdwxacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdwxacts));
+	*num_p = list_length(all_fdwxacts);
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdwxacts)
+	{
+		FdwXact fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdwxacts + num_fdwxacts, fx,
+			   sizeof(FdwXactData));
+		num_fdwxacts++;
+	}
+
+	list_free(all_fdwxacts);
+
+	return fdwxacts;
+}
+
+/*
+ * Return a list of FdwXact entries matching the given arguments, or NIL if
+ * none match. Only arguments holding valid values for their respective
+ * datatypes are used as search conditions. 'include_indoubt' and
+ * 'include_in_progress' control whether the result includes in-doubt
+ * transactions and in-progress transactions, respectively.
+ */
+static List *
+get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			 bool include_indoubt, bool include_in_progress, bool need_lock)
+{
+	int i;
+	List	*fdwxact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact	fdwxact = FdwXactCtl->fdwxacts[i];
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* include in-doubt transaction? */
+		if (!include_indoubt && fdwxact->indoubt)
+			continue;
+
+		/* include in-progress transaction? */
+		if (!include_in_progress && FdwXactIsBeingResolved(fdwxact))
+			continue;
+
+		/* Append it if matched */
+		fdwxact_list = lappend(fdwxact_list, fdwxact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdwxact_list;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record
+		 * in FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete the FdwXact entry and its file, if they exist */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides getPrepareId callback we return the identifier
+ * returned from it. Otherwise we generate a unique identifier in the
+ * form of "fx_<random number>_<xid>_<serverid>_<userid>" whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transactionid unique, we should ideally use something
+ * like UUID, which gives unique ids with high probability, but that may be
+ * expensive here and UUID extension which provides the function to generate
+ * UUID is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	*id;
+	int		id_len = 0;
+
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		/*
+		 * FDW doesn't provide the callback function, generate a unique
+		 * identifier.
+		 */
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len > FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an inserted LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;						/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+			  (errmsg_plural("%u foreign transaction state file was written "
+							 "for long-running prepared transactions",
+							 "%u foreign transaction state files were written "
+							 "for long-running prepared transactions",
+							 serialized_fdwxacts,
+							 serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, &read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+		   errdetail("Failed while allocating an XLog reading processor.")));
+
+	record = XLogReadRecord(xlogreader, lsn, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read foreign transaction state from xlog at %X/%X",
+			   (uint32) (lsn >> 32),
+			   (uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not recreate foreign transaction state file \"%s\": %m",
+			   path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId	origNextXid =
+		XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	char	*buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid magic number and CRC), return the palloc'd
+ * contents of the file, issuing an error when finding corrupted data.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			   errmsg("could not open FDW transaction state file \"%s\": %m",
+					  path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the expected data */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextFullXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted FdwXact files, fail immediately.  Keeping broken entries
+ * around and letting replay continue would harm the system, and a new
+ * backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextFullXid);
+	TransactionId result = origNextXid;
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+		char *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char		*buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The entry
+	 * is created here; its status and flags are filled in below, once
+	 * insert_fdwxact() has returned it. The resolver will determine the
+	 * final fate of the foreign transaction later, based on the status
+	 * of the local transaction that prepared it.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							  fdwxact_data->serverid, fdwxact_data->userid,
+							  fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED and as in-doubt, since we do not know
+	 * the xact status right now. Resolver will set it later based on
+	 * the status of local transaction that prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;	/* added in redo */
+	fdwxact->indoubt = true;
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXact file if the foreign transaction was saved by an earlier checkpoint.
+ * We might not find the FdwXact entry if crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact	fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdwxact(dbid, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+		return;
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+		char	*buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+
+		/*
+		 * If the foreign transaction is part of a prepared local
+		 * transaction, it is not in doubt. A future COMMIT/ROLLBACK
+		 * PREPARED will determine the fate of this foreign transaction.
+		 */
+		if (TwoPhaseExists(fdwxact->local_xid))
+		{
+			ereport(DEBUG2,
+					(errmsg("clearing in-doubt flag of foreign transaction %u, server %u, user %u because the corresponding local prepared transaction was found",
+							fdwxact->local_xid, fdwxact->serverid,
+							fdwxact->userid)));
+			fdwxact->indoubt = false;
+		}
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(int *newval, void **extra, GucSource source)
+{
+	ForeignTwophaseCommitLevel newForeignTwophaseCommitLevel = *newval;
+
+	/* Parameter check */
+	if (newForeignTwophaseCommitLevel > FOREIGN_TWOPHASE_COMMIT_DISABLED &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_transactions\" or \"max_foreign_transaction_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}	WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	7
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdwxacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdwxacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(PG_PREPARED_FDWXACTS_COLS);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "indoubt",
+						   BOOLOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdwxacts = get_all_fdwxacts(&num_fdwxacts);
+		status->num_xacts = num_fdwxacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdwxact = &status->fdwxacts[status->cur_xact++];
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdwxact->dbid);
+		values[1] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[2] = ObjectIdGetDatum(fdwxact->serverid);
+		values[3] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (fdwxact->status)
+		{
+			case FDWXACT_STATUS_INITIAL:
+				xact_status = "initial";
+				break;
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			case FDWXACT_STATUS_RESOLVED:
+				xact_status = "resolved";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		values[5] = BoolGetDatum(fdwxact->indoubt);
+		values[6] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+	FdwXact			fdwxact;
+	FdwXactRslvState	*state;
+	FdwXactStatus		prev_status;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	server = GetForeignServer(serverid);
+	usermapping = GetUserMapping(userid, serverid);
+	state = create_fdwxact_state();
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+	{
+		LWLockRelease(FdwXactLock);
+		PG_RETURN_BOOL(false);
+	}
+
+	state->server = server;
+	state->usermapping = usermapping;
+	state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+
+	SpinLockAcquire(&fdwxact->mutex);
+	prev_status = fdwxact->status;
+	SpinLockRelease(&fdwxact->mutex);
+
+	FdwXactDetermineTransactionFate(fdwxact, false);
+
+	LWLockRelease(FdwXactLock);
+
+	FdwXactResolveForeignTransaction(fdwxact, state, prev_status);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it. This gives a way to forget about such a prepared transaction
+ * when, for example, the foreign server on which it was prepared is no longer
+ * available, or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	/* Remove the entry if it exists. */
+	if (fdwxact != NULL)
+		remove_fdwxact(fdwxact);
+
+	/* Release the lock on both paths, including the not-found case. */
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(fdwxact != NULL);
+}
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000..45fb530
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,644 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to retry resolution.
+ */
+void
+FdwXactLauncherRequestToLaunchForRetry(void)
+{
+	if (FdwXactRslvCtl->launcher_pid > 0)
+		SetLatch(FdwXactRslvCtl->launcher_latch);
+}
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid > 0)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int	slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			resolver->last_resolved_time = 0;
+			resolver->latch = NULL;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz	last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid <= 0);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz	now;
+		long	wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int		rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit relaunch attempts to once per foreign_xact_resolution_retry_interval,
+		 * but always launch immediately when a backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested
+			 * but not running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started. This usually means that the resolver crashed, so
+			 * we retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool	found = false;
+	int		i;
+
+	/*
+	 * Looking for a resolver process that is running and working on the
+	 * same database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is still
+		 * starting up and has not attached to its slot yet. In that case we
+		 * do nothing, since it will soon find the FdwXact entry we inserted.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int unused_slot = -1;
+	int i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find an unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (unused_slot < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, clean up the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to wait
+	 * until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on databases that have
+ * at least one FdwXact entry but no resolver running on them.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	*resolver_dbs;	/* DBs resolver's running on */
+	HTAB	*fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL	ctl;
+	HASH_SEQ_STATUS status;
+	Oid		*entry;
+	bool	launched = false;
+	int		i;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect OIDs of databases that have at least one non-in-doubt FdwXact entry */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->indoubt)
+			continue;
+
+		hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no FdwXact entry, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+		return false;
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolvers are running and launch new one on them */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	return launched;
+}
+
+/*
+ * FdwXactLauncherRegister
+ *		Register a background worker running the foreign transaction
+ *      launcher.
+ */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int i;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+						WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Returns activity of all foreign transaction resolvers.
+ */
+Datum
+pg_stat_get_foreign_xact(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver	*resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t	pid;
+		Oid		dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000..9298877
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,344 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. The foreign
+ * transaction launcher starts one resolver process for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database for which backend processes are waiting.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on
+ * timeout, but only if no foreign transactions on the database are still
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int foreign_xact_resolution_retry_interval;
+int foreign_xact_resolver_timeout = 60 * 1000;
+bool foreign_xact_resolve_indoubt_xacts;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and clean up the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	MyFdwXactResolver->last_resolved_time = 0;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sanish value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		PGPROC			*waiter = NULL;
+		TransactionId	waitXid = InvalidTransactionId;
+		TimestampTz		resolutionTs = -1;
+		int			rc;
+		TimestampTz	now;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiters until either the queue becomes empty or we reach a
+		 * waiter whose resolution time is in the future.
+		 */
+		while ((waiter = FdwXactGetWaiter(&resolutionTs, &waitXid)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+			Assert(TransactionIdIsValid(waitXid));
+
+			if (resolutionTs > now)
+				break;
+
+			elog(DEBUG2, "resolver got one waiter with xid %u", waitXid);
+
+			/* Resolve the waiting distributed transaction */
+			StartTransactionCommand();
+			FdwXactResolveTransactionAndReleaseWaiter(MyDatabaseId, waitXid,
+													  waiter);
+			CommitTransactionCommand();
+
+			/* Update my stats */
+			SpinLockAcquire(&(MyFdwXactResolver->mutex));
+			MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+			SpinLockRelease(&(MyFdwXactResolver->mutex));
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Shut down the resolver if it has resolved nothing for
+ * foreign_xact_resolver_timeout and no backend is waiting on this database.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz last_resolved_time;
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	last_resolved_time = MyFdwXactResolver->last_resolved_time;
+	timeout = TimestampTzPlusMilliseconds(last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until we have detached from the
+		 * slot. This prevents a race in which a waiter is enqueued after we
+		 * checked FdwXactWaiterExists but before we detach.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but does not exit because the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long	sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long	sec_to_timeout;
+		int		microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long	sec_to_timeout;
+		int		microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index 5514db1..742e825 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -8,9 +8,9 @@ subdir = src/backend/access/rmgrdesc
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o genericdesc.o \
-	   gindesc.o gistdesc.o hashdesc.o heapdesc.o logicalmsgdesc.o \
-	   mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o seqdesc.o \
-	   smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
+OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o fdwxactdesc.o \
+	genericdesc.o gindesc.o gistdesc.o hashdesc.o heapdesc.o \
+	logicalmsgdesc.o mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o \
+	seqdesc.o smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000..fe0cef9
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 33060f3..1d4e1c8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 16fbe47..f15c83a 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -12,9 +12,9 @@ subdir = src/backend/access/transam
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = clog.o commit_ts.o generic_xlog.o multixact.o parallel.o rmgr.o slru.o \
-	subtrans.o timeline.o transam.o twophase.o twophase_rmgr.o varsup.o \
-	xact.o xlog.o xlogarchive.o xlogfuncs.o \
+OBJS = clog.o commit_ts.o generic_xlog.o multixact.o \
+	parallel.o rmgr.o slru.o subtrans.o timeline.o transam.o twophase.o \
+	twophase_rmgr.o varsup.o xact.o xlog.o xlogarchive.o xlogfuncs.o \
 	xloginsert.o xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 9368b56..8b360b1 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -9,6 +9,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
 #include "access/generic_xlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 477709b..82a0cb3 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -852,6 +853,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 }
 
 /*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
+/*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
  *
@@ -2318,6 +2348,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2377,6 +2413,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index f594d33..0d5f9d2 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1216,6 +1217,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1224,6 +1226,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1262,12 +1265,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1425,6 +1429,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for the prepared foreign transactions to be resolved, if required.
+	 * We only want to wait if we prepared foreign transactions in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2084,6 +2096,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2243,6 +2258,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2330,6 +2346,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2524,6 +2542,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2729,6 +2748,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e651a84..3716fbf 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
 #include "access/subtrans.h"
@@ -5249,6 +5250,7 @@ BootStrapXLOG(void)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6182,6 +6184,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6704,14 +6709,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -6903,7 +6909,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7399,6 +7408,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7717,6 +7727,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -8992,6 +9005,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9425,8 +9439,9 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9444,6 +9460,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9460,6 +9477,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9665,6 +9683,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -9864,6 +9883,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e..342dd6a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -332,6 +332,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
@@ -815,6 +818,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_foreign_xact AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_foreign_xact() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3aeef30..43bb9ae 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2860,9 +2860,16 @@ CopyFrom(CopyState cstate)
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc),
+							   true);
+
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
 
+	}
+
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index f96c278..621d70d 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1102,6 +1104,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdwxact_exists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1421,6 +1435,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
 	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdwxact_exists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
+	/*
 	 * Do the deletion
 	 */
 	object.classId = UserMappingRelationId;
@@ -1573,6 +1596,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
 				 errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA",
 						fdw->fdwname)));
 
+	/*
+	 * Remember that the transaction accesses a foreign server. Normally we
+	 * don't modify data on foreign servers during ImportForeignSchema, so
+	 * register the server as not modified.
+	 */
+	RegisterFdwXactByServerId(server->serverid, false);
+
 	/* Call FDW to get a list of commands */
 	cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d23f292..7cdf8e1 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,7 +13,9 @@
  */
 #include "postgres.h"
 
+
 #include "access/table.h"
+#include "access/fdwxact.h"
 #include "access/tableam.h"
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
@@ -944,7 +946,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		Relation		child = partRelInfo->ri_RelationDesc;
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(child), true);
+
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1da..eb7450c 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,10 +226,33 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		RangeTblEntry	*rte;
+
+		rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex,
+							estate);
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(rte->relid, true);
+
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
+	{
+		RangeTblEntry	*rte;
+		int rtindex = (scanrelid > 0) ?
+			scanrelid :
+			bms_next_member(node->fs_relids, -1);
+
+		rte = exec_rt_fetch(rtindex, estate);
+
+		/* Remember that the transaction accesses a foreign server */
+		RegisterFdwXactByRelId(rte->relid, false);
+
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
+	}
+
 	return scanstate;
 }
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 778ff27..ed4af1a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -38,6 +38,7 @@
 #include "postgres.h"
 
 #include "access/heapam.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -47,6 +48,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "rewrite/rewriteHandler.h"
@@ -550,6 +552,10 @@ ExecInsert(ModifyTableState *mtstate,
 										   NULL,
 										   specToken);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
 												   &specConflict,
@@ -778,6 +784,10 @@ ldelete:;
 									&tmfd,
 									changingPart);
 
+		/* Make a note that we've written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case TM_SelfModified:
@@ -1325,6 +1335,10 @@ lreplace:;
 									true /* wait for commit */ ,
 									&tmfd, &lockmode, &update_indexes);
 
+		/* Make a note that we've written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case TM_SelfModified:
@@ -2386,6 +2400,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+			/* Remember the transaction modifies data on a foreign server */
+			RegisterFdwXactByRelId(relid, true);
 
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 															 resultRelInfo,
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index c917ec4..2780ed5 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,20 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction &&
+		 !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction &&
+		 routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		(!routine->CommitForeignTransaction ||
+		 !routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index b66b517..517169b 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -15,6 +15,8 @@
 #include <unistd.h>
 
 #include "libpq/pqsignal.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -129,6 +131,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d362e7f..51c3789 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3652,6 +3652,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3855,6 +3861,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDWXACT:
+			event_name = "FdwXact";
+			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -4070,6 +4081,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 62dc93d..08eb99f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -900,6 +902,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -975,12 +981,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5315d93..c932167 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -154,6 +154,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index d7d7335..1491bc6 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(int port)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(int port)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 8abcfdf..b1561b2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -91,6 +91,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -246,6 +248,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1310,6 +1313,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1375,6 +1379,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1425,6 +1430,15 @@ GetOldestXmin(Relation rel, int flags)
 		result = replication_slot_xmin;
 
 	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDWXACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
+	/*
 	 * After locks have been released and vacuum_defer_cleanup_age has been
 	 * applied, check whether we need to back up further to make logical
 	 * decoding possible. We need to do so if we're computing the global limit
@@ -3014,6 +3028,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits to future computations of the xmin horizon to prevent
+ * vacuum from truncating clog for transactions that are still needed to
+ * resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843..0b8a487 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,6 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+FdwXactLock					45
+FdwXactResolverLock			46
+FdwXactResolutionLock			47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 498373f..dc77509 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -422,6 +423,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* initialize fields for fdw xact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -823,6 +828,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e8d8e6f..10ee130 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3007,6 +3009,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 90ffd89..7dfafe3 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -397,6 +398,25 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 };
 
 /*
+ * Although only "required", "prefer", and "disabled" are documented,
+ * we accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
+/*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
  */
@@ -718,6 +738,12 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
+	/* FDWXACT_RESOLVER */
+	gettext_noop("Foreign Transaction Management / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2351,6 +2377,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FDWXACT_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FDWXACT_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve a foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4361,6 +4433,16 @@ static struct config_enum ConfigureNamesEnum[] =
 	},
 
 	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
+	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
 			NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 0fc23e3..6a3887d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -125,6 +125,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -341,6 +343,20 @@
 
 
 #------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#foreign_twophase_commit = disabled	# required, prefer, or disabled
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
+#------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
 
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 33ac627..328b857 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 88a261d..6132c72 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -210,6 +210,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index ff17804..58e9630 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 159a30b..43144fa 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -711,6 +711,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -918,6 +919,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca..b616cea 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000..147d41c
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,165 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* fdwXactState */
+#define	FDWXACT_NOT_WAITING		0
+#define	FDWXACT_WAITING			1
+#define	FDWXACT_WAIT_COMPLETE	2
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											   without preparation */
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_PREFER,		/* use twophase commit where available */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										   twophase commit */
+} ForeignTwophaseCommitLevel;
+
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID,
+	FDWXACT_STATUS_INITIAL,
+	FDWXACT_STATUS_PREPARING,		/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,		/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,		/* foreign prepared transaction is to
+									 * be committed */
+	FDWXACT_STATUS_ABORTING,		/* foreign prepared transaction is to be
+									 * aborted */
+	FDWXACT_STATUS_RESOLVED
+} FdwXactStatus;
+
+typedef struct FdwXactData *FdwXact;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData
+{
+	FdwXact			fdwxact_free_next;	/* Next free FdwXact entry */
+
+	Oid				dbid;			/* database oid where to find foreign server
+									 * and user mapping */
+	TransactionId	local_xid;		/* XID of local transaction */
+	Oid				serverid;		/* foreign server where transaction takes
+									 * place */
+	Oid				userid;			/* user who initiated the foreign
+									 * transaction */
+	Oid				umid;
+	bool			indoubt;		/* Is an in-doubt transaction? */
+	slock_t			mutex;			/* Protect the above fields */
+
+	/* The status of the foreign transaction, protected by FdwXactLock */
+	FdwXactStatus 	status;
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;		/* XLOG offset where insertion of this entry starts */
+	XLogRecPtr	insert_end_lsn;		/* XLOG offset where insertion of this entry ends */
+
+	bool		valid;			/* has the entry been completed and written to file? */
+	BackendId	held_by;		/* backend that is holding this entry */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN];		/* prepared transaction identifier */
+} FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing FdwXact entry needs to hold FdwXactLock in exclusive mode,
+ * and iterating fdwXacts needs that in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];		/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	/* Foreign transaction information */
+	char	*fdwxact_id;
+
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	int		flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+/* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern bool fdwxact_exists(Oid dboid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwTwoPhaseNeeded(void);
+extern void PreCommit_FdwXacts(void);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon);
+extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void FdwXactResolveTransactionAndReleaseWaiter(Oid dbid, TransactionId xid,
+													  PGPROC *waiter);
+extern bool FdwXactResolveInDoubtTransactions(Oid dbid);
+extern PGPROC *FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p);
+extern void FdwXactCleanupAtProcExit(void);
+extern void RegisterFdwXactByRelId(Oid relid, bool modified);
+extern void RegisterFdwXactByServerId(Oid serverid, bool modified);
+extern void FdwXactMarkForeignServerAccessed(Oid relid, bool modified);
+extern bool check_foreign_twophase_commit(int *newval, void **extra,
+										  GucSource source);
+extern bool FdwXactWaiterExists(Oid dbid);
+
+#endif   /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000..dd0f5d1
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,29 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLauncherRequestToLaunchForRetry(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif	/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000..2607654
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int foreign_xact_resolver_timeout;
+
+#endif		/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000..39ca66b
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used in WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif	/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000..55fc970
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,66 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLaunchLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t	pid;	/* this resolver's PID, or 0 if not active */
+	Oid		dbid;	/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool	in_use;
+
+	/* Stats */
+	TimestampTz	last_resolved_time;
+
+	/* Protect shared variables shown above */
+	slock_t	mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	*latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch		*launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif	/* RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 3c0db2c..5798b4c 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Foreign Transactions", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index b9a531c..8238723 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 6f4013e..0cee715 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -109,6 +109,13 @@ extern int	MyXactFlags;
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
 /*
+ * XACT_FLAGS_FDWNOPREPARE - set when we wrote data
+ * on a foreign table whose server isn't capable of
+ * two-phase commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 3)
+
+/*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
 typedef enum
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 3f0de66..5c50677 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -228,6 +228,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index ff98d9e..773846d 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cf1f409..e03b5ca 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5178,6 +5178,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '6053', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_foreign_xact', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,timestamptz}',
+  proargmodes => '{o,o,o}',
+  proargnames => '{pid,dbid,last_resolved_time}',
+  prosrc => 'pg_stat_get_foreign_xact' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5891,6 +5898,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '6050', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o,o}',
+  proargnames => '{dbid,xid,serverid,userid,status,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '6051', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '6052', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6009,6 +6034,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '6054',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 8226860..f6592ee 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
@@ -248,7 +260,6 @@ typedef struct FdwRoutine
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
 } FdwRoutine;
 
-
 /* Functions in foreign/foreign.c */
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern Oid	GetForeignServerIdByRelId(Oid relid);
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 4de157c..91c2276 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -69,6 +69,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 											   bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 														 bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d8..d82d8f7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -776,6 +776,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -853,7 +855,9 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDWXACT,
+	WAIT_EVENT_FDWXACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -933,6 +937,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 281e1db..2eab5a9 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -21,6 +21,7 @@
 #include "storage/lock.h"
 #include "storage/pg_sema.h"
 #include "storage/proclist_types.h"
+#include "datatype/timestamp.h"
 
 /*
  * Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
@@ -153,6 +154,16 @@ struct PGPROC
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
 	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
+	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
 	 * their lock.
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index da8b672..04f9c8c 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDWXACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -124,4 +126,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index d68976f..d5fec50 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,9 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
+	FDWXACT_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd..c862e0e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1341,6 +1341,14 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.dbid,
+    f.xid,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(dbid, xid, serverid, userid, status, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
@@ -1841,6 +1849,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_foreign_xact| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_foreign_xact() r(pid, dbid, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
-- 
2.10.5

#26Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#25)

On Wed, Sep 04, 2019 at 12:44:20PM +0900, Masahiko Sawada wrote:

I forgot to include some new header files. Attached the updated patches.

No reviews since and the patch does not apply anymore. I am moving it
to next CF, waiting on author.
--
Michael

#27Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#26)
5 attachment(s)
Transactions involving multiple postgres foreign servers, take 2

Hello.

This is the rebased (and slightly fixed) version of the patch. It
applies on the master HEAD and passes all provided tests.

I took over this work from Sawada-san. I'll begin with reviewing the
current patch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

v26-0001-Keep-track-of-writing-on-non-temporary-relation.patch (text/x-patch; charset=us-ascii)
From 733f1e413ef2b2fe1d3ecba41eb4cd8e355ab826 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Dec 2019 16:59:47 +0900
Subject: [PATCH v26 1/5] Keep track of writing on non-temporary relation

Original Author: Masahiko Sawada <sawada.mshk@gmail.com>
---
 src/backend/executor/nodeModifyTable.c | 12 ++++++++++++
 src/include/access/xact.h              |  6 ++++++
 2 files changed, 18 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index e3eb9d7b90..cd91f9c8a8 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -587,6 +587,10 @@ ExecInsert(ModifyTableState *mtstate,
 							   estate->es_output_cid,
 							   0, NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
@@ -938,6 +942,10 @@ ldelete:;
 	if (tupleDeleted)
 		*tupleDeleted = true;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/*
 	 * If this delete is the result of a partition key update that moved the
 	 * tuple to a new partition, put this row into the transition OLD TABLE,
@@ -1447,6 +1455,10 @@ lreplace:;
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
 	}
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	if (canSetTag)
 		(estate->es_processed)++;
 
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 9d2899dea1..cb5c4935d2 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -102,6 +102,12 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
-- 
2.23.0
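
The 0001 patch above records writes by OR-ing a bit into MyXactFlags. As a
minimal standalone sketch of that idiom (not code from the patch): the two
flag values are copied from xact.h as shown in the diff, while the local
MyXactFlags stand-in and the helpers note_write() and wrote_nontemp_rel()
are hypothetical names introduced only for illustration.

```c
#include <stdbool.h>

/* Flag values as they appear in src/include/access/xact.h in the diff. */
#define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)

/* Stand-in for the backend-global MyXactFlags (assumption for this sketch). */
static int	MyXactFlags = 0;

/*
 * Hypothetical helper mirroring the "if (RelationNeedsWAL(rel))" checks that
 * the patch adds to ExecInsert, ExecDelete and ExecUpdate.
 */
static void
note_write(bool relation_needs_wal)
{
	if (relation_needs_wal)
		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
}

/* Hypothetical reader: did this transaction write a WAL-logged relation? */
static bool
wrote_nontemp_rel(void)
{
	return (MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0;
}
```

Setting the bit is idempotent, so repeating it for every tuple is harmless,
and a write to a relation that does not need WAL leaves the flag untouched.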

v26-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (text/x-patch; charset=us-ascii)
From d21c72a7db85c2211504f60fca8d39c0bd0ee5a6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Dec 2019 17:00:50 +0900
Subject: [PATCH v26 2/5] Support atomic commit among multiple foreign servers.

Original Author: Masahiko Sawada <sawada.mshk@gmail.com>
---
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  130 +
 src/backend/access/fdwxact/fdwxact.c          | 2816 +++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  644 ++++
 src/backend/access/fdwxact/resolver.c         |  344 ++
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   27 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/copy.c                   |    6 +
 src/backend/commands/foreigncmds.c            |   30 +
 src/backend/executor/execPartition.c          |    8 +
 src/backend/executor/nodeForeignscan.c        |   24 +
 src/backend/executor/nodeModifyTable.c        |   18 +
 src/backend/foreign/foreign.c                 |   57 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   20 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   82 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/fdwxactdesc.c              |    1 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  165 +
 src/include/access/fdwxact_launcher.h         |   29 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/resolver_internal.h        |   66 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   29 +
 src/include/foreign/fdwapi.h                  |   12 +
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    9 +-
 src/include/storage/proc.h                    |   11 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    3 +
 src/test/regress/expected/rules.out           |   13 +
 55 files changed, 4917 insertions(+), 18 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 120000 src/bin/pg_waldump/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..49480dd039 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..0207a66fb4
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000000..46ccb7eeae
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,130 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back a transaction on
+either all of the foreign servers or none of them. This ensures that the data
+is always left in a consistent state across the federated database.
+
+
+Commit Sequence of Global Transactions
+--------------------------------
+
+We employ the two-phase commit protocol to commit atomically on all foreign
+servers. The commit sequence of a distributed transaction consists of the
+following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+in the list FdwXactAtomicCommitParticipants, which is maintained by
+PostgreSQL's global transaction manager (GTM), as distributed transaction
+participants. The registered foreign transactions are tracked until the end
+of the transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We record WAL indicating that the foreign server is involved with the
+current transaction before doing PREPARE on all foreign transactions.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers, including the local node. Otherwise, we can commit them at this
+step by calling the CommitForeignTransaction() API, and nothing more is needed.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If any of them fails we switch to
+rollback, so at this point some participants might be prepared whereas
+others are not. The former need to be resolved manually using
+pg_resolve_foreign_xact(), and the latter are ended in one phase by
+calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend committing the transaction, but this
+resolution step (commit or rollback) is done by the foreign transaction
+resolver process. The backend inserts itself into the wait queue and then
+wakes up the resolver process (or requests the launch of a new one if needed).
+The resolver process enqueues the waiter and fetches the distributed
+transaction information that the backend is waiting for. Once all foreign
+transactions are committed or rolled back, the resolver wakes up the waiter.
+
+
+API Contract With Transaction Management Callback Functions
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+transaction management callback functions according to that status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for PREPARE, COMMIT and ROLLBACK
+of the transaction on the foreign server, respectively.
+FdwXactRslvState->flags could contain FDWXACT_FLAG_ONEPHASE, meaning the FDW can
+commit or roll back the foreign transaction in one phase. On failure while
+processing a foreign transaction, the FDW needs to raise an error. However, it
+must accept an ERRCODE_UNDEFINED_OBJECT error while committing or rolling back a
+foreign transaction, because there is a race condition: the coordinator
+could crash between completing the resolution and writing the WAL record
+that removes the FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_INITIAL
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and it changes to
+FDWXACT_STATUS_PREPARING, FDWXACT_STATUS_COMMITTING and FDWXACT_STATUS_ABORTING
+before the foreign transaction is prepared, committed and aborted by the FDW
+callback functions, respectively (*1). The status then changes to
+FDWXACT_STATUS_RESOLVED once the foreign transaction is resolved, and then
+the corresponding FdwXact entry is removed with WAL logging. If processing
+of the foreign transaction (i.e. preparing, committing or aborting) fails, the
+status changes back to the previous status. Therefore the
+FDWXACT_STATUS_xxxING statuses appear only while the foreign transaction is
+being processed by an FDW callback function.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. The initial
+status is FDWXACT_STATUS_PREPARED (*2). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                      INVALID                       |
+ +----------------------------------------------------+
+    |                      |                       |
+    |                      v                       |
+    |           +---------------------+            |
+    |           |       INITIAL       |            |
+    |           +---------------------+            |
+   (*2)                    |                      (*2)
+    |                      v                       |
+    |           +---------------------+            |
+    |           |    PREPARING(*1)    |            |
+    |           +---------------------+            |
+    |                      |                       |
+    v                      v                       v
+ +----------------------------------------------------+
+ |                      PREPARED                      |
+ +----------------------------------------------------+
+           |                               |
+           v                               v
+ +--------------------+          +--------------------+
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |
+ +--------------------+          +--------------------+
+           |                               |
+           v                               v
+ +----------------------------------------------------+
+ |                      RESOLVED                      |
+ +----------------------------------------------------+
+
+(*1) Statuses that appear only while being processed by the FDW
+(*2) Paths for recovered FdwXact entries
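
The status diagram in the README above can be read as a small state machine.
The following is a minimal standalone sketch of it, not code from the patch:
the FDWXACT_STATUS_* names follow the README, but the enum ordering and the
function fdwxact_transition_ok() are assumptions made for illustration. The
"back to the previous status" edges encode the README's statement about
failures during FDW processing.

```c
#include <stdbool.h>

/* Status names follow the README; the ordering here is illustrative only. */
typedef enum FdwXactStatus
{
	FDWXACT_STATUS_INVALID,
	FDWXACT_STATUS_INITIAL,
	FDWXACT_STATUS_PREPARING,
	FDWXACT_STATUS_PREPARED,
	FDWXACT_STATUS_COMMITTING,
	FDWXACT_STATUS_ABORTING,
	FDWXACT_STATUS_RESOLVED
} FdwXactStatus;

/*
 * Hypothetical checker: is "from -> to" an edge of the README diagram, or a
 * fallback to the previous status after a failed FDW callback?
 */
static bool
fdwxact_transition_ok(FdwXactStatus from, FdwXactStatus to)
{
	switch (from)
	{
		case FDWXACT_STATUS_INVALID:
			/* normal start, or (*2) recovered entries regarded as PREPARED */
			return to == FDWXACT_STATUS_INITIAL ||
				to == FDWXACT_STATUS_PREPARED;
		case FDWXACT_STATUS_INITIAL:
			return to == FDWXACT_STATUS_PREPARING;
		case FDWXACT_STATUS_PREPARING:
			/* success, or failure falling back to INITIAL */
			return to == FDWXACT_STATUS_PREPARED ||
				to == FDWXACT_STATUS_INITIAL;
		case FDWXACT_STATUS_PREPARED:
			return to == FDWXACT_STATUS_COMMITTING ||
				to == FDWXACT_STATUS_ABORTING;
		case FDWXACT_STATUS_COMMITTING:
		case FDWXACT_STATUS_ABORTING:
			/* success, or failure falling back to PREPARED */
			return to == FDWXACT_STATUS_RESOLVED ||
				to == FDWXACT_STATUS_PREPARED;
		case FDWXACT_STATUS_RESOLVED:
			/* terminal: the entry is removed after this */
			return false;
	}
	return false;
}
```

A check of this kind could sit behind an assertion wherever the status is
updated; whether the patch does anything similar is not shown in this excerpt.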
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..058a416f81
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2816 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * During executor node initialization, the foreign server can be registered
+ * by calling either RegisterFdwXactByRelId() or RegisterFdwXactByServerId()
+ * to add it to the group for global commit. Foreign servers are
+ * registered if the FDW has both the CommitForeignTransaction API and the
+ * RollbackForeignTransaction API. Registered participant servers are
+ * identified by the OIDs of the foreign server and user.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every foreign server. And after committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so
+ * that we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request. We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed. Before waiting for its foreign transactions to be resolved, a
+ * backend enqueues itself with the timestamp at which it expects to be
+ * processed. Similarly, if resolution fails, it enqueues itself again with a
+ * new timestamp (its timestamp + foreign_xact_resolution_interval).
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in the in-doubt state (aka in-doubt
+ * transactions). Foreign transactions in the in-doubt state are not resolved
+ * automatically, so they must be processed manually using the
+ * pg_resolve_fdwxact() function.
+ *
+ * The two-phase commit protocol is required if the transaction modified two
+ * or more servers, including the local server itself. Otherwise, all foreign
+ * transactions are committed or rolled back during pre-commit.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed by the FDW, the corresponding
+ * FdwXact entry is updated. To protect the entry from concurrent removal we
+ * need to hold a lock on the entry or a lock on the entire global array.
+ * However, we don't want to hold the lock while the FDW is processing the
+ * foreign transaction, which may take an unpredictable amount of time. To
+ * avoid this, the in-memory data of a foreign transaction follows a locking
+ * model based on four linked concepts:
+ *
+ * * A foreign transaction's status variable is switched using the LWLock
+ *   FdwXactLock, which needs to be held in exclusive mode when updating the
+ *   status, while readers need to hold it in shared mode when looking at the
+ *   status.
+ * * A process that is going to update an FdwXact entry cannot process a
+ *   foreign transaction that is being resolved.
+ * * So setting the status to FDWXACT_STATUS_PREPARING,
+ *   FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING, which puts the
+ *   foreign transaction in an in-progress state, means owning the FdwXact
+ *   entry, which protects it from update/removal by concurrent writers.
+ * * Individual fields are protected by a per-entry mutex; only the backend
+ *   that owns the foreign transaction is authorized to update that entry's
+ *   fields.
+ *
+ * Therefore, before doing PREPARE, COMMIT PREPARED or ROLLBACK PREPARED a
+ * process who is going to call transaction callback functions needs to change
+ * the status to the corresponding status above while holding FdwXactLock in
+ * exclusive mode, and call callback function after releasing the lock.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk. FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon. We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts. If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Atomic commit is enabled by configuration */
+#define IsForeignTwophaseCommitEnabled() \
+	(max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	(IsForeignTwophaseCommitEnabled() && \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED))
+
+/* Check whether the FdwXactParticipant is capable of two-phase commit */
+#define IsSeverCapableOfTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Check whether the FdwXact is being resolved */
+#define FdwXactIsBeingResolved(fx) \
+	(((((FdwXact)(fx))->status) == FDWXACT_STATUS_PREPARING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_COMMITTING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_ABORTING))
+
+/*
+ * Structure to bundle the foreign transaction participant. This struct is
+ * created at the beginning of execution for each foreign server and is used
+ * until the end of the transaction, where we cannot look at syscaches.
+ * Therefore, it is allocated in TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry
+	 * is not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char			*fdwxact_id;
+
+	/* true if modified the data on the server */
+	bool			modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function	prepare_foreign_xact_fn;
+	CommitForeignTransaction_function	commit_foreign_xact_fn;
+	RollbackForeignTransaction_function	rollback_foreign_xact_fn;
+	GetPrepareId_function				get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit. This list
+ * contains only foreign servers that provide the transaction management
+ * callbacks, that is, CommitForeignTransaction and RollbackForeignTransaction.
+ */
+static List *FdwXactParticipants = NIL;
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * OID, xid, foreign server OID and user OID, each as 8 hex digits, separated
+ * by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* GUC parameters */
+int	max_prepared_foreign_xacts = 0;
+int	max_foreign_xact_resolvers = 0;
+int foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Track whether the process-exit callback has been registered */
+static bool fdwXactExitRegistered = false;
+
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static void FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+												 bool for_commit);
+static void FdwXactResolveForeignTransaction(FdwXact fdwxact,
+											 FdwXactRslvState *state,
+											 FdwXactStatus fallback_status);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid,	void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static void FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock);
+static bool is_foreign_twophase_commit_required(void);
+static void register_fdwxact(Oid serverid, Oid userid, bool modified);
+static List *get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						  bool including_indoubts, bool include_in_progress,
+						  bool need_lock);
+static FdwXact get_all_fdwxacts(int *num_p);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXact get_fdwxact_to_resolve(Oid dbid, TransactionId xid);
+static FdwXactRslvState *create_fdwxact_state(void);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Remember the accessed foreign server. Both RegisterFdwXactByRelId and
+ * RegisterFdwXactByServerId are called by the executor during initialization.
+ */
+void
+RegisterFdwXactByRelId(Oid relid, bool modified)
+{
+	Relation		rel;
+	Oid				serverid;
+	Oid				userid;
+
+	rel = relation_open(relid, NoLock);
+	serverid = GetForeignServerIdByRelId(relid);
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	relation_close(rel, NoLock);
+
+	register_fdwxact(serverid, userid, modified);
+}
+
+void
+RegisterFdwXactByServerId(Oid serverid, bool modified)
+{
+	register_fdwxact(serverid, GetUserId(), modified);
+}
+
+/*
+ * Register the foreign transaction identified by the given arguments as
+ * a participant of the transaction.
+ *
+ * The foreign transaction is identified by the given server id and user id.
+ * Registered foreign transactions are managed by the global transaction
+ * manager until the end of the transaction.
+ */
+static void
+register_fdwxact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant	*fdw_part;
+	ForeignServer 		*foreign_server;
+	UserMapping			*user_mapping;
+	MemoryContext		old_ctx;
+	FdwRoutine			*routine;
+	ListCell	   		*lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	/*
+	 * Participant's information is also needed at the end of a transaction,
+	 * where system caches are not available. Save it in TopTransactionContext
+	 * so that it can live until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Don't register foreign server if it doesn't provide both commit and
+	 * rollback transaction management callbacks.
+	 */
+	if (!routine->CommitForeignTransaction ||
+		!routine->RollbackForeignTransaction)
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		pfree(routine);
+		MemoryContextSwitchTo(old_ctx);
+		return;
+	}
+
+	/*
+	 * Remember that we touched a foreign server that is not capable of
+	 * two-phase commit.
+	 */
+	if (!routine->PrepareForeignTransaction)
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact = NULL;
+	fdw_part->modified = modified;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Switch back to the previous memory context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&(fdwxacts[cnt].mutex));
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Prepare all foreign transactions if foreign twophase commit is required.
+ * If foreign twophase commit is required, the behavior depends on the value
+ * of foreign_twophase_commit: with 'required' we strictly require the FDWs of
+ * all foreign servers to support the two-phase commit protocol and ask them
+ * to prepare foreign transactions; with 'prefer' we ask only the foreign
+ * servers that are capable of two-phase commit to prepare foreign transactions
+ * and ask the other servers to commit; and with 'disabled' we ask all foreign
+ * servers to commit foreign transactions in one phase. If we fail to commit
+ * any of them, we change to aborting.
+ *
+ * Note that non-modified foreign servers can always be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	bool		need_twophase_commit;
+	ListCell	*lc = NULL;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * When foreign_twophase_commit is 'required', all modified servers have
+	 * to be capable of the two-phase commit protocol.
+	 */
+	if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));
+
+	/*
+	 * Check if we need to use foreign twophase commit. It's always false
+	 * if foreign twophase commit is disabled.
+	 */
+	need_twophase_commit = is_foreign_twophase_commit_required();
+
+	/*
+	 * First, consider committing foreign transactions in one phase.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		bool	commit = false;
+
+		/* Can commit in one-phase if two-phase commit is not required */
+		if (!need_twophase_commit)
+			commit = true;
+
+		/* Non-modified foreign transaction always can be committed in one-phase */
+		if (!fdw_part->modified)
+			commit = true;
+
+		/*
+		 * In 'prefer' case, non-twophase-commit capable server can be
+		 * committed in one-phase.
+		 */
+		if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER &&
+			!IsSeverCapableOfTwophaseCommit(fdw_part))
+			commit = true;
+
+		if (commit)
+		{
+			/* Commit the foreign transaction in one-phase */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, true);
+
+			/* Delete it from the participant list */
+			FdwXactParticipants = foreach_delete_current(FdwXactParticipants,
+														 lc);
+			continue;
+		}
+	}
+
+	/* All done if we committed all foreign transactions */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Second, if only one transaction remains in the participant list
+	 * and we didn't modify local data, we can commit it without
+	 * preparation.
+	 */
+	if (list_length(FdwXactParticipants) == 1 &&
+		(MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) == 0)
+	{
+		/* Commit the foreign transaction in one phase */
+		FdwXactOnePhaseEndForeignTransaction(linitial(FdwXactParticipants),
+											 true);
+		/* All foreign transactions are now committed; reset the list */
+		list_free(FdwXactParticipants);
+		FdwXactParticipants = NIL;
+		return;
+	}
+
+	/*
+	 * Finally, prepare foreign transactions. Note that we keep
+	 * FdwXactParticipants until the end of transaction.
+	 */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * an FdwXact entry we call the GetPrepareId callback to get a transaction
+ * identifier from the FDW.
+ *
+ * We can still change to rollback here. If any error occurs, we roll back
+ * non-prepared foreign transactions and leave the others to the resolver.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	ListCell		*lcell;
+	TransactionId	xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	xid = GetTopTransactionId();
+
+	/* Loop over the foreign connections */
+	foreach(lcell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+		FdwXactRslvState 	*state;
+		FdwXact		fdwxact;
+
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the FDWXACT_STATUS_PREPARING
+		 * status. Registration persists this information to the disk and logs
+		 * (that way relaying it to the standby). Thus in case we lose connectivity
+		 * to the foreign server or crash ourselves, we will remember that we
+		 * might have prepared transaction on the foreign server and try to
+		 * resolve it when connectivity is restored or after crash recovery.
+		 *
+		 * If we prepare the transaction on the foreign server before persisting
+		 * the information to the disk and crash in-between these two steps,
+		 * we will forget that we prepared the transaction on the foreign server
+		 * and will not be able to resolve it after the crash. Hence persist
+		 * first then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		state = create_fdwxact_state();
+		state->server = fdw_part->server;
+		state->usermapping = fdw_part->usermapping;
+		state->fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+
+		/* Update the status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		Assert(fdwxact->status == FDWXACT_STATUS_INITIAL);
+		fdwxact->status = FDWXACT_STATUS_PREPARING;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the point where this
+		 * backend receives an acknowledgment from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 *
+		 * During abort processing, we might try to resolve a never-prepared
+		 * transaction, and get an error. This is fine as long as the FDW
+		 * provides us unique prepared transaction identifiers.
+		 */
+		PG_TRY();
+		{
+			fdw_part->prepare_foreign_xact_fn(state);
+		}
+		PG_CATCH();
+		{
+			/* failed, back to the initial state */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fdwxact->status = FDWXACT_STATUS_INITIAL;
+			LWLockRelease(FdwXactLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		/* succeeded, update status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * One-phase commit or rollback the given foreign transaction participant.
+ */
+static void
+FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+									 bool for_commit)
+{
+	FdwXactRslvState *state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state = create_fdwxact_state();
+	state->server = fdw_part->server;
+	state->usermapping = fdw_part->usermapping;
+	state->flags = FDWXACT_FLAG_ONEPHASE;
+
+	/*
+	 * Commit or roll back the foreign transaction in one phase. Since we
+	 * didn't insert an FdwXact entry for this transaction we don't need to
+	 * care about failures. On failure we change to rollback.
+	 */
+	if (for_commit)
+		fdw_part->commit_foreign_xact_fn(state);
+	else
+		fdw_part->rollback_foreign_xact_fn(state);
+}
+
+/*
+ * This function is used to create a new foreign transaction entry before an
+ * FDW prepares and commits/rolls back. The function adds the entry to WAL and
+ * it will be persisted to disk under the pg_fdwxact directory at checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact				fdwxact;
+	FdwXactOnDiskData	*fdwxact_file_data;
+	MemoryContext		old_context;
+	int					data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							fdw_part->usermapping->userid,
+							fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->status = FDWXACT_STATUS_INITIAL;
+	fdwxact->held_by = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				Oid umid, char *fdwxact_id)
+{
+	int i;
+	FdwXact fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
+								   xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->dbid = dbid;
+	fdwxact->local_xid = xid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	fdwxact->indoubt = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (FdwXactIsBeingResolved(fdwxact))
+		elog(ERROR, "cannot remove fdwxact entry that is being resolved");
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset the entry's state */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->indoubt = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the commit critical section: a
+		 * checkpoint starting after this will certainly see the gxact as a
+		 * candidate for fsyncing.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true and set ForeignTwophaseCommitIsRequired to true if the current
+ * transaction modified data on two or more servers, counting the servers in
+ * FdwXactParticipants and the local server itself.
+ */
+static bool
+is_foreign_twophase_commit_required(void)
+{
+	ListCell*	lc;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->modified)
+			nserverswritten++;
+	}
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	/*
+	 * Atomic commit is required if we modified data on two or more
+	 * participants.
+	 */
+	if (nserverswritten <= 1)
+		return false;
+
+	ForeignTwophaseCommitIsRequired = true;
+	return true;
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int	i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * Mark my foreign transaction participants as in-doubt and clear
+ * the FdwXactParticipants list.
+ *
+ * If any foreign transaction is left unresolved, update the oldest xmin of
+ * unresolved transactions so that the local transaction id of an in-doubt
+ * transaction is not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell *cell;
+	int		n_lefts = 0;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if no FdwXact entry has been registered yet */
+		if (!fdw_part->fdwxact)
+			continue;
+
+		/*
+		 * There is a race condition: the resolver process could remove an
+		 * FdwXact entry in FdwXactParticipants, and another backend could
+		 * reuse that entry, before we forget it. So we need to check
+		 * whether each entry is still associated with this backend.
+		 */
+		SpinLockAcquire(&fdwxact->mutex);
+		if (fdwxact->held_by == MyBackendId)
+		{
+			fdwxact->held_by = InvalidBackendId;
+			fdwxact->indoubt = true;
+			n_lefts++;
+		}
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction id
+	 * of unresolved distributed transactions and hand the entries over to the
+	 * foreign transaction resolver.
+	 */
+	if (n_lefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions in in-doubt status", n_lefts);
+		FdwXactComputeRequiredXmin();
+	}
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * Wait for the foreign transaction to be resolved.
+ *
+ * A backend starts in state FDWXACT_NOT_WAITING and changes its state to
+ * FDWXACT_WAITING before adding itself to the wait queue. In
+ * FdwXactResolveTransactionAndReleaseWaiter, a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * The backend then resets its state to FDWXACT_NOT_WAITING. If a resolver
+ * fails to resolve the waiting transaction, it re-enqueues the backend with
+ * a later resolution time so that the resolution is retried.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char		*new_status = NULL;
+	const char	*old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsForeignTwophaseCommitRequested())
+		return;
+
+	/*
+	 * Also, exit if the transaction itself has no foreign transaction
+	 * participants.
+	 */
+	if (FdwXactParticipants == NIL && wait_xid == MyPgXact->xid)
+		return;
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if none is running yet, or wake one up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int len;
+
+		old_status = get_ps_display(&len);
+		/* room for the 24-byte suffix text plus up to 10 xid digits */
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0';	/* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDWXACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The latter
+		 * would lead the client to believe that the distributed transaction
+		 * aborted, which is not true: it's already committed locally. The
+		 * former is no good either: the client has requested committing a
+		 * distributed transaction, and is entitled to assume that an
+		 * acknowledged commit is also committed on all foreign servers, which
+		 * might not be true. So in this case we issue a WARNING (which some
+		 * clients may be able to interpret) and shut off further output. We do
+		 * NOT reset ProcDiePending, so that the process will die after the
+		 * commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions may be left orphaned,
+		 * but the foreign transaction resolver can pick them up and try to
+		 * resolve them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Return true if there is at least one backend in the wait queue that is
+ * connected to the given database. The caller must hold
+ * FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
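+ * AtEOXact_FdwXacts
+ *
+ * At abort, roll back the foreign transactions that have not been prepared
+ * yet. In both the commit and abort cases, forget all participants.
+ */
+void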
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		foreach (lcell, FdwXactParticipants)
+		{
+			FdwXactParticipant	*fdw_part = lfirst(lcell);
+
+			/*
+			 * If the foreign transaction has an FdwXact entry, we might
+			 * have prepared it. Skip a foreign transaction that is already
+			 * prepared, because it has closed its transaction. However, we
+			 * cannot tell whether a foreign transaction with status ==
+			 * FDWXACT_STATUS_PREPARING has actually been prepared, so we
+			 * call the rollback API to close its transaction for safety.
+			 * Any prepared foreign transaction that remains will be
+			 * resolved by the foreign transaction resolver.
+			 */
+			if (fdw_part->fdwxact)
+			{
+				bool is_prepared;
+
+				LWLockAcquire(FdwXactLock, LW_SHARED);
+				is_prepared = fdw_part->fdwxact &&
+					fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED;
+				LWLockRelease(FdwXactLock);
+
+				if (is_prepared)
+					continue;
+			}
+
+			/* One-phase rollback foreign transaction */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, false);
+		}
+	}
+
+	/*
+	 * In commit cases, we have already prepared foreign transactions during
+	 * pre-commit phase. And these prepared transactions will be resolved by
+	 * the resolver process.
+	 */
+
+	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Prepare foreign transactions.
+ *
+ * Note that the transaction may abort after we have prepared some of the
+ * participants. In that case we switch to rollback and roll back all foreign
+ * transactions.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	/*
+	 * We cannot prepare if any foreign server of participants isn't capable
+	 * of two-phase commit.
+	 */
+	if (is_foreign_twophase_commit_required() &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in transaction can not prepare the transaction")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Return one backend that connects to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p)
+{
+	PGPROC *proc;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+			break;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+	{
+		*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+		*waitXid_p = proc->fdwXactWaitXid;
+	}
+	else
+	{
+		*nextResolutionTs_p = -1;
+		*waitXid_p = InvalidTransactionId;
+	}
+
+	LWLockRelease(FdwXactResolutionLock);
+
+	return proc;
+}
+
+/*
+ * Get one FdwXact entry to resolve. This function is intended to be used by
+ * a resolver process to fetch the FdwXact entries to resolve, so the search
+ * excludes in-doubt transactions and in-progress transactions.
+ */
+static FdwXact
+get_fdwxact_to_resolve(Oid dbid, TransactionId xid)
+{
+	List *fdwxacts = NIL;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Exclude both in-doubt transactions and in-progress transactions */
+	fdwxacts = get_fdwxacts(dbid, xid, InvalidOid, InvalidOid,
+							false, false, false);
+
+	return fdwxacts == NIL ? NULL : (FdwXact) linitial(fdwxacts);
+}
+
+/*
+ * Resolve one distributed transaction on the given database. The target
+ * distributed transaction is fetched from the waiting queue and its transaction
+ * participants are fetched from the global array.
+ *
+ * Release the waiter after all of the foreign transaction participants have
+ * been resolved. On failure, re-enqueue the waiting backend after advancing
+ * its next resolution time, and re-throw the error.
+ */
+void
+FdwXactResolveTransactionAndReleaseWaiter(Oid dbid, TransactionId xid,
+										  PGPROC *waiter)
+{
+	FdwXact	fdwxact;
+
+	Assert(TransactionIdIsValid(xid));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	while ((fdwxact = get_fdwxact_to_resolve(MyDatabaseId, xid)) != NULL)
+	{
+		FdwXactRslvState *state;
+		ForeignServer *server;
+		UserMapping	*usermapping;
+
+		CHECK_FOR_INTERRUPTS();
+
+		server = GetForeignServer(fdwxact->serverid);
+		usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+
+		state = create_fdwxact_state();
+		SpinLockAcquire(&fdwxact->mutex);
+		state->server = server;
+		state->usermapping = usermapping;
+		state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+		SpinLockRelease(&fdwxact->mutex);
+
+		FdwXactDetermineTransactionFate(fdwxact, false);
+
+		/* Do not hold during foreign transaction resolution */
+		LWLockRelease(FdwXactLock);
+
+		PG_TRY();
+		{
+			/*
+			 * Resolve the foreign transaction. When committing or aborting
+			 * prepared foreign transactions the previous status is always
+			 * FDWXACT_STATUS_PREPARED.
+			 */
+			FdwXactResolveForeignTransaction(fdwxact, state,
+											 FDWXACT_STATUS_PREPARED);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. Re-insert the waiter to the tail of retry
+			 * queue if the waiter is still waiting.
+			 */
+			LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+			if (waiter->fdwXactState == FDWXACT_WAITING)
+			{
+				SHMQueueDelete(&(waiter->fdwXactLinks));
+				pg_write_barrier();
+				waiter->fdwXactNextResolutionTs =
+					TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+												foreign_xact_resolution_retry_interval);
+				FdwXactQueueInsert(waiter);
+			}
+			LWLockRelease(FdwXactResolutionLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		elog(DEBUG2, "resolved one foreign transaction xid %u, serverid %u, userid %u",
+			 fdwxact->local_xid, fdwxact->serverid, fdwxact->userid);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+
+	/*
+	 * Remove waiter from shmem queue, if not detached yet. The waiter
+	 * could already be detached if the user cancelled the wait before
+	 * resolution.
+	 */
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId	wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/* Wake up the waiter only when we have set state and removed from queue */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had been already detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Determine whether the given foreign transaction should be committed or
+ * rolled back according to the result of the local transaction. This function
+ * changes fdwxact->status, so the caller must either hold FdwXactLock in
+ * exclusive mode or pass need_lock as true.
+ */
+static void
+FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock)
+{
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/*
+	 * The transaction being resolved must either have been cancelled and
+	 * marked as in-doubt, or have been prepared.
+	 */
+	Assert(fdwxact->indoubt ||
+		   fdwxact->status == FDWXACT_STATUS_PREPARED);
+
+	/*
+	 * If the local transaction is already committed, commit prepared
+	 * foreign transaction.
+	 */
+	if (TransactionIdDidCommit(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared
+	 * foreign transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+
+
+	/*
+	 * The local transaction is not in progress, but the foreign
+	 * transaction is not prepared on the foreign server. This
+	 * can happen when the transaction failed after registering this
+	 * entry but before actually preparing on the foreign server.
+	 * So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted. This should not happen except for one
+	 * case, where the local transaction is prepared and this foreign
+	 * transaction is being resolved manually using
+	 * pg_resolve_foreign_xact(). Raise an error anyway, since we cannot
+	 * determine the fate of this foreign transaction from a local
+	 * transaction whose fate is also not determined.
+	 */
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve the foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid),
+				 errhint("The local transaction with xid %u might be prepared",
+						 fdwxact->local_xid)));
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * callback function. The 'state' is passed to the callback function. The fate of
+ * foreign transaction must be determined. If foreign transaction is resolved
+ * successfully, remove the FdwXact entry from the shared memory and also
+ * remove the corresponding on-disk file. If failed, the status of FdwXact
+ * entry changes to 'fallback_status' before erroring out.
+ */
+static void
+FdwXactResolveForeignTransaction(FdwXact fdwxact, FdwXactRslvState *state,
+								 FdwXactStatus fallback_status)
+{
+	ForeignServer		*server;
+	ForeignDataWrapper	*fdw;
+	FdwRoutine			*fdw_routine;
+	bool				is_commit;
+
+	Assert(state != NULL);
+	Assert(state->server && state->usermapping && state->fdwxact_id);
+	Assert(fdwxact != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+		elog(ERROR, "cannot resolve foreign transaction whose fate is not determined");
+
+	is_commit = fdwxact->status == FDWXACT_STATUS_COMMITTING;
+	LWLockRelease(FdwXactLock);
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+
+	PG_TRY();
+	{
+		if (is_commit)
+			fdw_routine->CommitForeignTransaction(state);
+		else
+			fdw_routine->RollbackForeignTransaction(state);
+	}
+	PG_CATCH();
+	{
+		/* Back to the fallback status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = fallback_status;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	/* Resolution was a success, remove the entry */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	elog(DEBUG1, "successfully %s the foreign transaction with xid %u db %u server %u user %u",
+		 is_commit ? "committed" : "rolled back",
+		 fdwxact->local_xid, fdwxact->dbid, fdwxact->serverid,
+		 fdwxact->userid);
+
+	fdwxact->status = FDWXACT_STATUS_RESOLVED;
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdwxact(fdwxact);
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Return palloc'd and initialized FdwXactRslvState.
+ */
+static FdwXactRslvState *
+create_fdwxact_state(void)
+{
+	FdwXactRslvState *state;
+
+	state = palloc(sizeof(FdwXactRslvState));
+	state->server = NULL;
+	state->usermapping = NULL;
+	state->fdwxact_id = NULL;
+	state->flags = 0;
+
+	return state;
+}
+
+/*
+ * Return the FdwXact entry that matches the given arguments, or NULL if
+ * there is none. All arguments must be valid values so that the search
+ * matches exactly one entry or none. Note that this function is intended to
+ * be used for modifying the returned FdwXact entry, so the caller must hold
+ * FdwXactLock in exclusive mode. In-progress FdwXact entries are not
+ * included.
+ */
+static FdwXact
+get_one_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	*fdwxact_list;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	/* Include in-doubt transactions but don't include in-progress ones */
+	fdwxact_list = get_fdwxacts(dbid, xid, serverid, userid,
+								true, false, false);
+
+	/* Must be at most one entry since we search by the unique key */
+	Assert(list_length(fdwxact_list) <= 1);
+
+	/* Could not find entry */
+	if (fdwxact_list == NIL)
+		return NULL;
+
+	return (FdwXact) linitial(fdwxact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+fdwxact_exists(Oid dbid, Oid serverid, Oid userid)
+{
+	List	*fdwxact_list;
+
+	/* Find entries from all FdwXact entries */
+	fdwxact_list = get_fdwxacts(dbid, InvalidTransactionId, serverid,
+								userid, true, true, true);
+
+	return fdwxact_list != NIL;
+}
+
+/*
+ * Return an array of all foreign prepared transactions for the user-level
+ * function pg_foreign_xacts, and store the number of entries in *num_p.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if it doesn't
+ * want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdwxacts(int *num_p)
+{
+	List		*all_fdwxacts;
+	ListCell	*lc;
+	FdwXact		fdwxacts;
+	int			num_fdwxacts = 0;
+
+	Assert(num_p != NULL);
+
+	/* Get all entries */
+	all_fdwxacts = get_fdwxacts(InvalidOid, InvalidTransactionId,
+								InvalidOid, InvalidOid, true,
+								true, true);
+
+	if (all_fdwxacts == NIL)
+	{
+		*num_p = 0;
+		return NULL;
+	}
+
+	fdwxacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdwxacts));
+	*num_p = list_length(all_fdwxacts);
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdwxacts)
+	{
+		FdwXact fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdwxacts + num_fdwxacts, fx,
+			   sizeof(FdwXactData));
+		num_fdwxacts++;
+	}
+
+	list_free(all_fdwxacts);
+
+	return fdwxacts;
+}
+
+/*
+ * Return a list of FdwXact entries matching the given arguments, or NIL if
+ * there is none. Only arguments holding a valid value for their respective
+ * datatype are used as search conditions. 'include_indoubt' and
+ * 'include_in_progress' control whether the result includes in-doubt
+ * transactions and in-progress transactions respectively. If 'need_lock' is
+ * true, FdwXactLock is acquired in shared mode during the search.
+ */
+static List*
+get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			 bool include_indoubt, bool include_in_progress, bool need_lock)
+{
+	int i;
+	List	*fdwxact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact	fdwxact = FdwXactCtl->fdwxacts[i];
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* include in-doubt transaction? */
+		if (!include_indoubt && fdwxact->indoubt)
+			continue;
+
+		/* include in-progress transaction? */
+		if (!include_in_progress && FdwXactIsBeingResolved(fdwxact))
+			continue;
+
+		/* Append it if matched */
+		fdwxact_list = lappend(fdwxact_list, fdwxact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdwxact_list;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record
+		 * in FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides the getPrepareId callback, we return the
+ * identifier returned from it. Otherwise we generate a unique identifier in
+ * the form of "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like a UUID, which gives unique ids with high probability, but that may be
+ * expensive here, and the UUID extension that provides the function to
+ * generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	*id;
+	int		id_len = 0;
+
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		/*
+		 * The FDW doesn't provide the callback function, so generate a
+		 * unique identifier.
+		 */
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len > FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an inserted LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;						/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+			  (errmsg_plural("%u foreign transaction state file was written "
+							 "for long-running prepared transactions",
+							 "%u foreign transaction state files were written "
+							 "for long-running prepared transactions",
+							 serialized_fdwxacts,
+							 serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									&read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+		   errdetail("Failed while allocating an XLog reading processor.")));
+
+	record = XLogReadRecord(xlogreader, lsn, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read foreign transaction state from xlog at %X/%X",
+			   (uint32) (lsn >> 32),
+			   (uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not recreate foreign transaction state file \"%s\": %m",
+			   path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId	origNextXid =
+		XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	char	*buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a sane size, a valid CRC and the expected identifiers),
+ * return the palloc'd contents of the file; raise an error on corrupted data.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			   errmsg("could not open FDW transaction state file \"%s\": %m",
+					  path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the identifiers we were asked for */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid  != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextFullXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextFullXid);
+	TransactionId result = origNextXid;
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+		char *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char		*buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. Its status
+	 * is filled in below: we cannot know the fate of the transaction
+	 * during redo, so the resolver decides it later based on the
+	 * status of the local transaction that prepared this foreign
+	 * transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							  fdwxact_data->serverid, fdwxact_data->userid,
+							  fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED and as in-doubt, since we do not know
+	 * the xact status right now. Resolver will set it later based on
+	 * the status of local transaction that prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;	/* added in redo */
+	fdwxact->indoubt = true;
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove
+ * FdwXact file if a foreign transaction was saved via an earlier checkpoint.
+ * The FdwXact entry may not be found when crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact	fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdwxact(dbid, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+		return;
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+		char	*buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+
+		/*
+		 * If the foreign transaction is part of a prepared local
+		 * transaction, it is not in doubt. A future COMMIT/ROLLBACK
+		 * PREPARED will determine the fate of this foreign transaction.
+		 */
+		if (TwoPhaseExists(fdwxact->local_xid))
+		{
+			ereport(DEBUG2,
+					(errmsg("clear in-doubt flag from foreign transaction %u, server %u, user %u as found the corresponding local prepared transaction",
+							fdwxact->local_xid, fdwxact->serverid,
+							fdwxact->userid)));
+			fdwxact->indoubt = false;
+		}
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(int *newval, void **extra, GucSource source)
+{
+	ForeignTwophaseCommitLevel newForeignTwophaseCommitLevel = *newval;
+
+	/* Parameter check */
+	if (newForeignTwophaseCommitLevel > FOREIGN_TWOPHASE_COMMIT_DISABLED &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_transactions\" or \"max_foreign_transaction_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}	WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	7
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdwxacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdwxacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(PG_PREPARED_FDWXACTS_COLS);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "indoubt",
+						   BOOLOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdwxacts = get_all_fdwxacts(&num_fdwxacts);
+		status->num_xacts = num_fdwxacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdwxact = &status->fdwxacts[status->cur_xact++];
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdwxact->dbid);
+		values[1] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[2] = ObjectIdGetDatum(fdwxact->serverid);
+		values[3] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (fdwxact->status)
+		{
+			case FDWXACT_STATUS_INITIAL:
+				xact_status = "initial";
+				break;
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			case FDWXACT_STATUS_RESOLVED:
+				xact_status = "resolved";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		values[5] = BoolGetDatum(fdwxact->indoubt);
+		values[6] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+	FdwXact			fdwxact;
+	FdwXactRslvState	*state;
+	FdwXactStatus		prev_status;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	server = GetForeignServer(serverid);
+	usermapping = GetUserMapping(userid, serverid);
+	state = create_fdwxact_state();
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+	{
+		LWLockRelease(FdwXactLock);
+		PG_RETURN_BOOL(false);
+	}
+
+	state->server = server;
+	state->usermapping = usermapping;
+	state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+
+	SpinLockAcquire(&fdwxact->mutex);
+	prev_status = fdwxact->status;
+	SpinLockRelease(&fdwxact->mutex);
+
+	FdwXactDetermineTransactionFate(fdwxact, false);
+
+	LWLockRelease(FdwXactLock);
+
+	FdwXactResolveForeignTransaction(fdwxact, state, prev_status);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. The function gives a way to forget about such a prepared
+ * transaction when, for example, the foreign server where it was prepared is
+ * no longer available or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
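+	/* Release the lock on every path, including when no entry matches */
+	if (fdwxact != NULL)
+		remove_fdwxact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	/* True only if a matching entry was found and removed */
+	PG_RETURN_BOOL(fdwxact != NULL);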
+}
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..45fb530916
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,644 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to retry resolution.
+ */
+void
+FdwXactLauncherRequestToLaunchForRetry(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		SetLatch(FdwXactRslvCtl->launcher_latch);
+}
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int	slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			resolver->last_resolved_time = 0;
+			resolver->latch = NULL;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz	last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == 0);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz	now;
+		long	wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int		rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit the start retry to once per foreign_xact_resolution_retry_interval,
+		 * but always start when a backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested
+			 * but not running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * started. This usually means the resolver crashed, so we
+			 * should retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool	found = false;
+	int		i;
+
+	/*
+	 * Looking for a resolver process that is running and working on the
+	 * same database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. Since the resolver will soon find
+		 * the FdwXact entry we inserted, we don't need to do anything then.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int unused_slot = -1;
+	int i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (unused_slot < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, clean up the worker slot */
+		LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+		resolver->in_use = false;
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to wait
+	 * until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on databases that have
+ * at least one FdwXact entry but no resolver running on them.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	*resolver_dbs;	/* DBs resolvers are running on */
+	HTAB	*fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL	ctl;
+	HASH_SEQ_STATUS status;
+	Oid		*entry;
+	bool	launched = false;
+	int		i;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one non-in-doubt FdwXact entry */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->indoubt)
+			continue;
+
+		hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no FdwXact entry, so no need to launch a new resolver */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+		return false;
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolvers are running and launch new one on them */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	return launched;
+}
+
+/*
+ * FdwXactLauncherRegister
+ *		Register a background worker running the foreign transaction
+ *      launcher.
+ */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int i;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+						WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Returns activity of all foreign transaction resolvers.
+ */
+Datum
+pg_stat_get_foreign_xact(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver	*resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t	pid;
+		Oid		dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
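For reference, the slot search in fdwxact_launch_resolver() reduces to the pattern below. This is only a standalone sketch, not code from the patch: ResolverSlot and find_unused_slot() are hypothetical names, and the real code performs the search while holding FdwXactResolverLock. It shows why the search index should start from a sentinel such as -1, so that "no free slot" can be told apart from slot 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for FdwXactResolver; only the field the search needs. */
typedef struct ResolverSlot
{
	bool		in_use;
} ResolverSlot;

/*
 * Return the index of the first free slot, or -1 if every slot is taken.
 * Starting from -1 lets the caller distinguish "nothing found" from slot 0.
 */
int
find_unused_slot(const ResolverSlot *slots, int nslots)
{
	int			unused_slot = -1;
	int			i;

	assert(slots != NULL && nslots >= 0);

	for (i = 0; i < nslots; i++)
	{
		if (!slots[i].in_use)
		{
			unused_slot = i;
			break;
		}
	}

	return unused_slot;
}
```

A caller would report the "out of slots" error when the result is negative and claim the slot otherwise.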
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..9298877f10
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,344 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. The foreign
+ * transaction launcher starts one resolver process for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database for which backend processes are waiting.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if no foreign transactions on the database are waiting to be
+ * resolved.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int foreign_xact_resolution_retry_interval;
+int foreign_xact_resolver_timeout = 60 * 1000;
+bool foreign_xact_resolve_indoubt_xacts;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	MyFdwXactResolver->last_resolved_time = 0;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sanish value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		PGPROC			*waiter = NULL;
+		TransactionId	waitXid = InvalidTransactionId;
+		TimestampTz		resolutionTs = -1;
+		int			rc;
+		TimestampTz	now;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiters until either the queue becomes empty or we reach a
+		 * waiter whose resolution time is still in the future.
+		 */
+		while ((waiter = FdwXactGetWaiter(&resolutionTs, &waitXid)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+			Assert(TransactionIdIsValid(waitXid));
+
+			if (resolutionTs > now)
+				break;
+
+			elog(DEBUG2, "resolver got one waiter with xid %u", waitXid);
+
+			/* Resolve the waiting distributed transaction */
+			StartTransactionCommand();
+			FdwXactResolveTransactionAndReleaseWaiter(MyDatabaseId, waitXid,
+													  waiter);
+			CommitTransactionCommand();
+
+			/* Update my stats */
+			SpinLockAcquire(&(MyFdwXactResolver->mutex));
+			MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+			SpinLockRelease(&(MyFdwXactResolver->mutex));
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Shut down the resolver if it has not resolved any foreign transaction for
+ * foreign_xact_resolver_timeout and no backend is waiting for resolution.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz last_resolved_time;
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	last_resolved_time = MyFdwXactResolver->last_resolved_time;
+	timeout = TimestampTzPlusMilliseconds(last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until we have detached from
+		 * the slot. This prevents a race in which a waiter enqueues
+		 * itself after we checked FdwXactWaiterExists.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but will not exit because the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long to sleep before the next cycle: until the timeout or the
+ * next resolution time (nextResolutionTs), whichever comes first.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long	sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long	sec_to_timeout;
+		int		microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long	sec_to_timeout;
+		int		microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
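The sleep-time rule in FXRslvComputeSleepTime() can be checked in isolation with a sketch like the one below. It is not part of the patch and makes some assumptions: timestamps are plain microsecond counts rather than TimestampTz, and ms_until() and compute_sleep_ms() are hypothetical names. ms_until() clamps to zero when the target has already passed, as PostgreSQL's TimestampDifference() does.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_NAPTIME_MS 180000L	/* 3 minutes, as in resolver.c */

/* Milliseconds from now_us to target_us; zero if the target has passed. */
long
ms_until(int64_t now_us, int64_t target_us)
{
	if (target_us <= now_us)
		return 0;
	return (long) ((target_us - now_us) / 1000);
}

/*
 * Sleep for the smallest of: the default nap time, the time left until the
 * idle timeout (if enabled), and the time left until the next scheduled
 * resolution (if any, signalled by a positive timestamp).
 */
long
compute_sleep_ms(int64_t now_us, int64_t last_resolved_us,
				 int timeout_ms, int64_t next_resolution_us)
{
	long		sleeptime = DEFAULT_NAPTIME_MS;
	long		remaining;

	assert(timeout_ms >= 0);

	if (timeout_ms > 0)
	{
		remaining = ms_until(now_us,
							 last_resolved_us + (int64_t) timeout_ms * 1000);
		if (remaining < sleeptime)
			sleeptime = remaining;
	}

	if (next_resolution_us > 0)
	{
		remaining = ms_until(now_us, next_resolution_us);
		if (remaining < sleeptime)
			sleeptime = remaining;
	}

	return sleeptime;
}
```

With the timeout disabled and no pending waiter, the resolver naps for the full default interval; a waiter whose resolution time has already passed yields a sleep time of zero, so the loop runs again at once.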
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..fe0cef9472
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
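Both fdwxact_desc() and fdwxact_identify() mask the record's info byte with ~XLR_INFO_MASK before comparing it with the record opcodes. The standalone sketch below illustrates that masking; it is not code from the patch. XLR_INFO_MASK is 0x0F in PostgreSQL's xlogrecord.h, but the two opcode values and the name fdwxact_identify_sketch() are assumptions, since the header that defines the opcodes is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XLR_INFO_MASK		0x0F	/* low nibble is reserved for xlog itself */
#define XLOG_FDWXACT_INSERT	0x00	/* assumed value, for illustration only */
#define XLOG_FDWXACT_REMOVE	0x10	/* assumed value, for illustration only */

/* Map an info byte to a record name, ignoring the reserved low nibble. */
const char *
fdwxact_identify_sketch(uint8_t info)
{
	/* Resource-manager opcodes must live entirely in the high nibble. */
	assert((XLOG_FDWXACT_INSERT & XLR_INFO_MASK) == 0);
	assert((XLOG_FDWXACT_REMOVE & XLR_INFO_MASK) == 0);

	switch (info & ~XLR_INFO_MASK)
	{
		case XLOG_FDWXACT_INSERT:
			return "NEW FOREIGN TRANSACTION";
		case XLOG_FDWXACT_REMOVE:
			return "REMOVE FOREIGN TRANSACTION";
	}

	/* Unknown opcode */
	return NULL;
}
```

Because the low four bits are stripped first, flag bits set by the WAL machinery do not affect which record type is reported.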
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 33060f3042..1d4e1c82e1 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..200cf9d067 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 529976885f..2c9af36bbb 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -850,6 +851,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
@@ -2262,6 +2292,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2321,6 +2357,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 5353b6ab0b..5b67056c65 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1218,6 +1219,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1226,6 +1228,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1264,12 +1267,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1427,6 +1431,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transaction to be resolved, if required.
+	 * We only want to wait if we prepared foreign transaction in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2086,6 +2098,10 @@ CommitTransaction(void)
 			break;
 	}
 
+
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2246,6 +2262,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2333,6 +2350,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2527,6 +2546,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2732,6 +2752,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6bc1a6b46d..428a974c51 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -5246,6 +5247,7 @@ BootStrapXLOG(void)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6189,6 +6191,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_wal_senders",
 									 max_wal_senders,
 									 ControlFile->max_wal_senders);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
@@ -6729,14 +6734,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -6928,7 +6934,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7424,6 +7433,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7754,6 +7764,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9029,6 +9042,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9462,8 +9476,10 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9481,6 +9497,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9497,6 +9514,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9702,6 +9720,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -9901,6 +9920,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f7800f01a6..b4c1cce1f0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -332,6 +332,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
@@ -818,6 +821,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_foreign_xact AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_foreign_xact() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 42a147b67d..e3caef7ef9 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2857,8 +2857,14 @@ CopyFrom(CopyState cstate)
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc),
+							   true);
+
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index 766c9f95c8..43bbe8356d 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1101,6 +1103,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction on this foreign server,
+	 * dropping the server might leave a dangling prepared transaction.
+	 */
+	if (fdwxact_exists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1419,6 +1433,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * If there is a foreign prepared transaction for this user mapping,
+	 * dropping the mapping might leave a dangling prepared transaction.
+	 */
+	if (fdwxact_exists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
@@ -1572,6 +1595,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
 				 errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA",
 						fdw->fdwname)));
 
+	/*
+	 * Remember that the transaction accesses a foreign server. Normally
+	 * ImportForeignSchema does not modify data on foreign servers, so register
+	 * the server as not modified.
+	 */
+	RegisterFdwXactByServerId(server->serverid, false);
+
 	/* Call FDW to get a list of commands */
 	cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d23f292cb0..690717c34e 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/table.h"
 #include "access/tableam.h"
 #include "catalog/partition.h"
@@ -944,7 +945,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		Relation		child = partRelInfo->ri_RelationDesc;
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(child), true);
+
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1dac5c..3ac56d1678 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,9 +226,31 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		RangeTblEntry	*rte;
+
+		rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex,
+							estate);
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(rte->relid, true);
+
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
+	{
+		RangeTblEntry	*rte;
+		int rtindex = (scanrelid > 0) ?
+			scanrelid :
+			bms_next_member(node->fs_relids, -1);
+
+		rte = exec_rt_fetch(rtindex, estate);
+
+		/* Remember that the transaction accesses a foreign server */
+		RegisterFdwXactByRelId(rte->relid, false);
+
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index cd91f9c8a8..c1ab3d829a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -37,6 +37,7 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
@@ -47,6 +48,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "rewrite/rewriteHandler.h"
@@ -549,6 +551,10 @@ ExecInsert(ModifyTableState *mtstate,
 										   NULL,
 										   specToken);
 
+			/* Note that we have written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
 												   &specConflict,
@@ -777,6 +783,10 @@ ldelete:;
 									&tmfd,
 									changingPart);
 
+		/* Note that we have written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case TM_SelfModified:
@@ -1323,6 +1333,10 @@ lreplace:;
 									true /* wait for commit */ ,
 									&tmfd, &lockmode, &update_indexes);
 
+		/* Note that we have written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case TM_SelfModified:
@@ -2382,6 +2396,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+			/* Remember that the transaction modifies data on a foreign server */
+			RegisterFdwXactByRelId(relid, true);
 
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 															 resultRelInfo,
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index c917ec40ff..0b17505aac 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,20 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction &&
+		 !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction &&
+		 routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		(!routine->CommitForeignTransaction ||
+		 !routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 5f8a007e73..0a8890a984 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -14,6 +14,8 @@
 
 #include <unistd.h>
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -129,6 +131,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index fabcf31de8..0d3932c2cf 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3650,6 +3650,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3853,6 +3859,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDWXACT:
+			event_name = "FdwXact";
+			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -4068,6 +4080,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9ff2832c00..f92be8387d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -909,6 +911,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -984,12 +990,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and the foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules have had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index bc532d027b..6269f384af 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -151,6 +151,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 4829953ee6..6bde7a735a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 13bcbe77de..020eb76b6a 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -93,6 +93,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -248,6 +250,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1312,6 +1315,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1377,6 +1381,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1426,6 +1431,15 @@ GetOldestXmin(Relation rel, int flags)
 		NormalTransactionIdPrecedes(replication_slot_xmin, result))
 		result = replication_slot_xmin;
 
+	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDWXACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
 	/*
 	 * After locks have been released and vacuum_defer_cleanup_age has been
 	 * applied, check whether we need to back up further to make logical
@@ -3128,6 +3142,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions that are still needed
+ * to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843229..adb276370c 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,6 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+FdwXactLock							45
+FdwXactResolverLock					46
+FdwXactResolutionLock				47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index fff0628e58..af5e418a03 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -421,6 +422,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* Initialize fields for fdw xact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -822,6 +827,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3b85e48333..a0f8498862 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3029,6 +3031,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ba74bf9f7d..d38c33b64c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -399,6 +400,25 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required", "prefer", and "disabled" are documented,
+ * we accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
@@ -725,6 +745,12 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
+	/* FDWXACT_RESOLVER */
+	gettext_noop("Foreign Transaction Management / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2370,6 +2396,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FDWXACT_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of foreign transaction resolver processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FDWXACT_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4413,6 +4485,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+		 gettext_noop("Sets whether foreign two-phase commit is used for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9541879c1f..22e014aecd 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -125,6 +125,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -341,6 +343,20 @@
 #max_sync_workers_per_subscription = 2	# taken from max_logical_replication_workers
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#foreign_twophase_commit = disabled	# disabled, prefer, or required
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index f08a49c9dd..dd8878025b 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 1f6d8939be..49dc5a519f 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -210,6 +210,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 19e21ab491..9ae3bfe4dd 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -301,6 +301,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 2e286f6339..c5ee22132e 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 120000
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..147d41c708
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,165 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* fdwXactState */
+#define	FDWXACT_NOT_WAITING		0
+#define	FDWXACT_WAITING			1
+#define	FDWXACT_WAIT_COMPLETE	2
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											   without preparation */
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_PREFER,		/* use twophase commit where available */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										   twophase commit */
+} ForeignTwophaseCommitLevel;
+
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID,
+	FDWXACT_STATUS_INITIAL,
+	FDWXACT_STATUS_PREPARING,		/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,		/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,		/* foreign prepared transaction is to
+									 * be committed */
+	FDWXACT_STATUS_ABORTING,		/* foreign prepared transaction is to be
+									 * aborted */
+	FDWXACT_STATUS_RESOLVED
+} FdwXactStatus;
+
+typedef struct FdwXactData *FdwXact;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData
+{
+	FdwXact			fdwxact_free_next;	/* Next free FdwXact entry */
+
+	Oid				dbid;			/* database oid where to find foreign server
+									 * and user mapping */
+	TransactionId	local_xid;		/* XID of local transaction */
+	Oid				serverid;		/* foreign server where transaction takes
+									 * place */
+	Oid				userid;			/* user who initiated the foreign
+									 * transaction */
+	Oid				umid;			/* user mapping oid */
+	bool			indoubt;		/* Is an in-doubt transaction? */
+	slock_t			mutex;			/* Protect the above fields */
+
+	/* The status of the foreign transaction, protected by FdwXactLock */
+	FdwXactStatus 	status;
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;		/* XLOG offset where this entry's record starts */
+	XLogRecPtr	insert_end_lsn;		/* XLOG offset where this entry's record ends */
+
+	bool		valid;			/* has the entry been completed and written to file? */
+	BackendId	held_by;		/* backend that is holding this entry */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN];		/* prepared transaction identifier */
+} FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing an FdwXact entry requires FdwXactLock in exclusive mode;
+ * iterating over fdwxacts requires it in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];		/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	/* Foreign transaction information */
+	char	*fdwxact_id;
+
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	int		flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+/* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern bool fdwxact_exists(Oid dboid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwTwoPhaseNeeded(void);
+extern void PreCommit_FdwXacts(void);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon);
+extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void FdwXactResolveTransactionAndReleaseWaiter(Oid dbid, TransactionId xid,
+													  PGPROC *waiter);
+extern bool FdwXactResolveInDoubtTransactions(Oid dbid);
+extern PGPROC *FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p);
+extern void FdwXactCleanupAtProcExit(void);
+extern void RegisterFdwXactByRelId(Oid relid, bool modified);
+extern void RegisterFdwXactByServerId(Oid serverid, bool modified);
+extern void FdwXactMarkForeignServerAccessed(Oid relid, bool modified);
+extern bool check_foreign_twophase_commit(int *newval, void **extra,
+										  GucSource source);
+extern bool FdwXactWaiterExists(Oid dbid);
+
+#endif   /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..dd0f5d16ff
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,29 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLauncherRequestToLaunchForRetry(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif	/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..2607654024
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int foreign_xact_resolver_timeout;
+
+#endif		/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..39ca66beef
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif	/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..55fc970b69
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,66 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLaunchLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t	pid;	/* this resolver's PID, or 0 if not active */
+	Oid		dbid;	/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool	in_use;
+
+	/* Stats */
+	TimestampTz	last_resolved_time;
+
+	/* Protect shared variables shown above */
+	slock_t	mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	*latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch		*launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif	/* RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 3c0db2ccf5..5798b4cd99 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Foreign Transactions", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 02b5315c43..e8c094d708 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index cb5c4935d2..a75e6998f0 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -108,6 +108,13 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
+/*
+ * XACT_FLAGS_FDWNOPREPARE - set when the transaction wrote data to a
+ * foreign table whose foreign server is not capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 3)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index e295dc65fb..d1ce20242f 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -232,6 +232,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index cf7d4485e9..f2174a0208 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ac8f64b219..1072c38aa6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5184,6 +5184,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '9705', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_foreign_xact', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,timestamptz}',
+  proargmodes => '{o,o,o}',
+  proargnames => '{pid,dbid,last_resolved_time}',
+  prosrc => 'pg_stat_get_foreign_xact' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5897,6 +5904,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o,o}',
+  proargnames => '{dbid,xid,serverid,userid,status,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6015,6 +6040,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 822686033e..c7b33d72ec 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 4de157c19c..91c2276915 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -69,6 +69,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 											   bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 														 bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d823d..d82d8f7abc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -776,6 +776,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -853,7 +855,9 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDWXACT,
+	WAIT_EVENT_FDWXACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -933,6 +937,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 281e1db725..c802201193 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/xlogdefs.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
@@ -152,6 +153,16 @@ struct PGPROC
 	int			syncRepState;	/* wait state for sync rep */
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
+	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
 	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 8f67b860e7..deb293c1a9 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDWXACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -125,4 +127,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index d68976fafa..d5fec50969 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,9 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
+	FDWXACT_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c9cc569404..ed229d5a67 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1341,6 +1341,14 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.dbid,
+    f.xid,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(dbid, xid, serverid, userid, status, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
@@ -1841,6 +1849,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_foreign_xact| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_foreign_xact() r(pid, dbid, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
-- 
2.23.0
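
For readers following the fdwxact.h header in the patch above, here is a minimal sketch of how the three foreign_twophase_commit levels could map to a per-server commit action, going only by the enum comments and the documentation patch that follows. It is not code from the patch: CommitAction, choose_commit_action() and plan_foreign_commit() are hypothetical names, the enum is redeclared so the sketch stands alone, and the sketch assumes every listed server was written to (it ignores cases where two-phase commit is not needed at all).

```c
#include <stdbool.h>
#include <stddef.h>

/* Same values as the enum in fdwxact.h, redeclared so this compiles alone. */
typedef enum
{
    FOREIGN_TWOPHASE_COMMIT_DISABLED,   /* never use two-phase commit */
    FOREIGN_TWOPHASE_COMMIT_PREFER,     /* use it where the server supports it */
    FOREIGN_TWOPHASE_COMMIT_REQUIRED    /* every written server must support it */
} ForeignTwophaseCommitLevel;

/* Hypothetical: what to do with one foreign server at commit time. */
typedef enum
{
    ACTION_COMMIT_ONEPHASE,         /* plain COMMIT, no PREPARE */
    ACTION_PREPARE_THEN_COMMIT,     /* PREPARE, then COMMIT PREPARED */
    ACTION_ERROR                    /* the distributed transaction cannot commit */
} CommitAction;

/* Hypothetical helper: pick the action for a single written server. */
CommitAction
choose_commit_action(ForeignTwophaseCommitLevel level, bool can_prepare)
{
    switch (level)
    {
        case FOREIGN_TWOPHASE_COMMIT_DISABLED:
            return ACTION_COMMIT_ONEPHASE;
        case FOREIGN_TWOPHASE_COMMIT_PREFER:
            return can_prepare ? ACTION_PREPARE_THEN_COMMIT
                               : ACTION_COMMIT_ONEPHASE;
        case FOREIGN_TWOPHASE_COMMIT_REQUIRED:
            return can_prepare ? ACTION_PREPARE_THEN_COMMIT
                               : ACTION_ERROR;
    }
    return ACTION_ERROR;        /* unknown level: refuse to commit */
}

/*
 * Hypothetical helper: fill actions[] for all written servers.  Returns
 * false if any server makes the commit impossible under the given level.
 */
bool
plan_foreign_commit(ForeignTwophaseCommitLevel level,
                    const bool *can_prepare, size_t nservers,
                    CommitAction *actions)
{
    bool        ok = true;

    for (size_t i = 0; i < nservers; i++)
    {
        actions[i] = choose_commit_action(level, can_prepare[i]);
        if (actions[i] == ACTION_ERROR)
            ok = false;
    }
    return ok;
}
```

With one 2PC-capable server and one that is not, "prefer" would prepare the first and commit the second in one phase, while "required" would refuse to commit. That mixed case is the one where "prefer" can still leave the servers inconsistent after a crash.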

v26-0003-Documentation-update.patch (text/x-patch; charset=us-ascii)
From 3363abd531595233fb59e0ab6078a011ab8060e9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Dec 2019 17:01:08 +0900
Subject: [PATCH v26 3/5] Documentation update.

Original Author: Masahiko Sawada <sawada.mshk@gmail.com>
---
 doc/src/sgml/catalogs.sgml                | 145 +++++++++++++
 doc/src/sgml/config.sgml                  | 146 ++++++++++++-
 doc/src/sgml/distributed-transaction.sgml | 158 +++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 236 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  89 ++++++++
 doc/src/sgml/monitoring.sgml              |  60 ++++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 841 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 55694c4368..1b720da03d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8267,6 +8267,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>open cursors</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-file-settings"><structname>pg_file_settings</structname></link></entry>
       <entry>summary of configuration file contents</entry>
@@ -9712,6 +9717,146 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric identifier of the local transaction with which this foreign
+       transaction is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>initial</literal> : Initial status.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>resolved</literal> : This foreign transaction has been resolved.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in in-doubt status and
+       needs to be resolved by calling the <function>pg_resolve_foreign_xact</function>
+       function.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 53ac14490a..69778750f3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4378,7 +4378,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
 
      </variablelist>
     </sect2>
-
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -8818,6 +8817,151 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether transaction commit will wait for all involved foreign
+         transactions to be resolved before the command returns a "success"
+         indication to the client. Valid values are <literal>required</literal>,
+         <literal>prefer</literal> and <literal>disabled</literal>. The default
+         setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used
+         to commit or roll back distributed transactions. When set to
+         <literal>required</literal>, the distributed transaction strictly
+         requires that all written servers can use the two-phase commit protocol.
+         That is, the distributed transaction cannot commit if even one server
+         does not support the transaction management callback routines
+         (described in <xref linkend="fdw-callbacks-transaction-managements"/>).
+         When set to <literal>prefer</literal>, the distributed transaction uses
+         the two-phase commit protocol only on servers where it is available and
+         commits normally on the others. Note that with <literal>disabled</literal>
+         or <literal>prefer</literal>, the servers involved in the distributed
+         transaction can be left inconsistent with each other if a foreign
+         server crashes while the distributed transaction is being committed.
+        </para>
+
+        <para>
+         Both <varname>max_prepared_foreign_transactions</varname> and
+         <varname>max_foreign_transaction_resolvers</varname> must be set to
+         non-zero values before this parameter can be set to either
+         <literal>required</literal> or <literal>prefer</literal>.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If <literal>N</literal> local transactions each
+         span <literal>K</literal> foreign servers, this value needs to be set
+         to <literal>N * K</literal>, not just <literal>N</literal>.
+         This parameter can only be set at server start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long a foreign transaction resolver waits after a failed
+         resolution attempt before retrying. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout, meaning a resolver stays connected to
+         its database until it is stopped manually. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..350b1afe68
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   <productname>PostgreSQL</productname>'s atomic commit uses the transaction
+   callback routines to ensure that all changes on foreign servers end in
+   either commit or rollback
+   (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers automatically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).
+    The <productname>PostgreSQL</productname> server that receives the SQL is called
+    the <firstterm>coordinator node</firstterm>; it is responsible for coordinating
+    all the participating transactions. With the two-phase commit protocol, the commit
+    sequence of a distributed transaction proceeds in the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    In the first step, the <productname>PostgreSQL</productname> distributed
+    transaction manager prepares all transactions on the foreign servers if
+    two-phase commit is required. Two-phase commit is required when the
+    transaction modifies data on two or more servers, including the local server
+    itself, and <xref linkend="guc-foreign-twophase-commit"/> is
+    <literal>required</literal> or <literal>prefer</literal>. If all preparations
+    on the foreign servers succeed, the commit proceeds to the next step. If any
+    failure happens in this step, <productname>PostgreSQL</productname> switches to
+    rollback and rolls back all transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    In the local commit step, <productname>PostgreSQL</productname> commits the
+    transaction locally. If any failure happens in this step,
+    <productname>PostgreSQL</productname> switches to rollback and rolls back all
+    transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    At the final step, prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolution">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for foreign transaction resolution. They commit or roll back all
+    prepared transactions on foreign servers if the coordinator received agreement
+    messages from all foreign servers during the first step.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on one database on the coordinator side. If resolution fails, the resolver
+    retries at intervals of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped. To drop the database, first call the
+     <function>pg_stop_fdwxact_resolver</function> function to stop the
+     resolver.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>Manual Resolution of In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back using the two-phase commit protocol. However, distributed transactions
+    become <firstterm>in-doubt</firstterm> in three cases: when a foreign
+    server crashes or the connection to it is lost while preparing the foreign
+    transaction, when the coordinator node crashes while either preparing or
+    resolving the distributed transaction, and when the user cancels the query. You can
+    check for in-doubt transactions in the <xref linkend="pg-stat-foreign-xact-view"/>
+    view. These foreign transactions need to be resolved by using the
+    <function>pg_resolve_foreign_xact</function> function.
+    <productname>PostgreSQL</productname> doesn't have facilities to automatically
+    resolve in-doubt transactions. This behavior might change in a future release.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-monitoring">
+   <title>Monitoring</title>
+   <para>
+    Monitoring information about foreign transaction resolvers is visible in the
+    <link linkend="pg-stat-foreign-xact-view"><literal>pg_stat_foreign_xact</literal></link>
+    view. This view contains one row for every foreign transaction resolver worker.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be increased to
+    accommodate the foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6587678af2..dd0358ef22 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1415,6 +1415,127 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare foreign transactions. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well, and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase, or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flag</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when it is invoked through the <function>pg_resolve_foreign_xact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from recursing
+    infinitely. If <literal>frstate-&gt;flag</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when it is invoked through the <function>pg_resolve_foreign_xact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    with its length stored in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal
+    less than <symbol>NAMEDATALEN</symbol> bytes long and should not be the same
+    as any other concurrent prepared transaction ID. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      The functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function>, and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Therefore,
+      you cannot use functions that access the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1894,4 +2015,119 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage the transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of the
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name,
+    server OID, user, and user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit And Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and roll back the foreign transaction. The core
+     transaction manager calls <literal>CommitForeignTransaction</literal>
+     in the pre-commit phase of a transaction commit, and calls
+     <literal>RollbackForeignTransaction</literal> in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit And Rollback Distributed Transaction</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit and roll back atomically across all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    Here, ft1 and ft2 are foreign tables on different foreign servers, which may use different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, both the <literal>CommitForeignTransaction</literal> function and the
+     <literal>RollbackForeignTransaction</literal> function should commit and
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve, the resolver process exits with an error message. The foreign
+     transaction launcher then launches the resolver process again at the
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..80a87fa5d1 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 57a1539506..b9a918b9ee 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -22355,6 +22355,95 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for a foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction which is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolution.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before
+    dropping the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a3c5f86b7e..65938e81ca 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -368,6 +368,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_foreign_xact</structname><indexterm><primary>pg_stat_foreign_xact</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1236,6 +1244,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>CheckpointerMain</literal></entry>
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
+        <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
+         <entry><literal>LogicalLauncherMain</literal></entry>
+         <entry>Waiting in main loop of logical launcher process.</entry>
+        </row>
         <row>
          <entry><literal>LogicalApplyMain</literal></entry>
          <entry>Waiting in main loop of logical apply process.</entry>
@@ -1459,6 +1479,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>SyncRep</literal></entry>
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
+        <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
         <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
@@ -2359,6 +2383,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-foreign-xact-view" xreflabel="pg_stat_foreign_xact">
+   <title><structname>pg_stat_foreign_xact</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_foreign_xact</structname> view will contain one
+   row per foreign transaction resolver process, showing state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index e59cba7997..dee3f72f7e 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -163,6 +163,7 @@
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 1c19e863d2..3f4c806ed1 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.23.0

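Note on sizing (illustrative only, not part of the patch): the configuration
documentation in the patch above says that N local transactions each spanning
K foreign servers need N * K prepared-transaction slots, and that
max_worker_processes must be at least max_foreign_transaction_resolvers + 1.
The sketch below just restates those two rules as arithmetic. The function
names and the "other_workers" parameter are hypothetical and exist only for
this example.

```c
/*
 * Hypothetical helpers (not part of the patch) that restate the sizing
 * rules from the documentation as plain arithmetic.
 */

/*
 * Minimum max_prepared_foreign_transactions for n_local_xacts concurrent
 * local transactions that each touch k_servers_per_xact foreign servers:
 * one slot per (local transaction, foreign server) pair.
 */
int
sketch_min_prepared_foreign_xacts(int n_local_xacts, int k_servers_per_xact)
{
    return n_local_xacts * k_servers_per_xact;
}

/*
 * Minimum max_worker_processes: the documented
 * max_foreign_transaction_resolvers + 1, plus whatever other features
 * (parallel query, extensions) draw from the same pool. other_workers is
 * an assumption of this sketch, not a real setting.
 */
int
sketch_min_worker_processes(int max_resolvers, int other_workers)
{
    return max_resolvers + 1 + other_workers;
}
```

For example, 10 concurrent local transactions that each write to 3 foreign
servers would need at least 30 slots, not 10.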
v26-0004-postgres_fdw-supports-atomic-commit-APIs.patch (text/x-patch; charset=us-ascii)
From 84f81fdcb2bd823e34edba79c81c29871d7906fb Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Dec 2019 17:01:15 +0900
Subject: [PATCH v26 4/5] postgres_fdw supports atomic commit APIs.

Original Author: Masahiko Sawada <sawada.mshk@gmail.com>
---
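Note (illustrative only, not part of the patch): the GetPrepareId
documentation in the 0003 patch says that, when an FDW does not provide the
callback, the identifier has the form fx_<random value>_<server oid>_<user oid>
and must be shorter than NAMEDATALEN bytes. A minimal sketch of building such
an identifier follows. The function name is hypothetical, the value 64 is
PostgreSQL's default NAMEDATALEN and is an assumption here, and this is not
the code the patch uses.

```c
#include <stdio.h>

/* PostgreSQL's default NAMEDATALEN; assumed for this sketch. */
#define SKETCH_NAMEDATALEN 64

/*
 * Hypothetical helper: format "fx_<random>_<server oid>_<user oid>" into buf.
 * Returns the identifier length, or -1 if it would not fit in buf or would
 * not be shorter than SKETCH_NAMEDATALEN bytes.
 */
int
sketch_prepare_id(char *buf, size_t buflen, long random_value,
                  unsigned int serverid, unsigned int userid)
{
    int n = snprintf(buf, buflen, "fx_%ld_%u_%u",
                     random_value, serverid, userid);

    if (n < 0 || (size_t) n >= buflen || n >= SKETCH_NAMEDATALEN)
        return -1;
    return n;
}
```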
 contrib/postgres_fdw/Makefile                 |   7 +-
 contrib/postgres_fdw/connection.c             | 604 +++++++++++-------
 .../postgres_fdw/expected/postgres_fdw.out    | 265 +++++++-
 contrib/postgres_fdw/fdwxact.conf             |   3 +
 contrib/postgres_fdw/postgres_fdw.c           |  21 +-
 contrib/postgres_fdw/postgres_fdw.h           |   7 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 120 +++-
 doc/src/sgml/postgres-fdw.sgml                |  45 ++
 8 files changed, 822 insertions(+), 250 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..91fa6e39fc 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -29,3 +29,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 27b86a03f8..0b07e6c5cc 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2019, PostgreSQL Global Development Group
  *
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
@@ -54,6 +55,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -67,17 +69,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -89,24 +87,26 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 									 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
-
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
+ * Get connection cache entry. Unlike GetConnectionState, this function
+ * doesn't establish a new connection if one doesn't exist yet.
  */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -126,7 +126,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -134,12 +133,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -153,6 +146,21 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * This function gets the connection cache entry and establishes connection
+ * to the foreign server if there is no connection and starts a new transaction
+ * if 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -180,6 +188,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping	*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -188,6 +197,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -198,6 +208,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -205,11 +224,39 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
@@ -412,7 +459,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -639,193 +686,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -842,10 +702,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -856,6 +712,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Skip entries that got no connection in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1190,3 +1050,309 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   state->server->servername, state->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 state->server->servername, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on foreign server. If
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can commit the
+ * foreign transaction without preparation, otherwise commit the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In simple commit case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   state->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Rollback a transaction on foreign server. As with commit case, if state->flags
+ * contains FDWXACT_FLAG_ONEPHASE this function can rollback the foreign
+ * transaction without preparation, otherwise rollback the prepared transaction.
+ * This function must tolerate being called recursively as an error can happen
+ * during aborting.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before
+	 * establishing a connection or starting a remote transaction.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the remote transaction has not started or the connection is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
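
As an aside on pgfdw_end_prepared_xact above: the check that tolerates a
missing prepared transaction can be tried in isolation. The sketch below is
not part of the patch. The SKETCH_ macros are modeled on PostgreSQL's PGSIXBIT
and MAKE_SQLSTATE macros from utils/elog.h, sketch_must_report() is a
hypothetical helper, and it assumes any non-NULL SQLSTATE string has exactly
five characters, as the string returned for PG_DIAG_SQLSTATE does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Modeled on PostgreSQL's PGSIXBIT / MAKE_SQLSTATE (utils/elog.h). */
#define SKETCH_SIXBIT(ch) (((ch) - '0') & 0x3F)
#define SKETCH_MAKE_SQLSTATE(c1, c2, c3, c4, c5) \
    (SKETCH_SIXBIT(c1) + (SKETCH_SIXBIT(c2) << 6) + \
     (SKETCH_SIXBIT(c3) << 12) + (SKETCH_SIXBIT(c4) << 18) + \
     (SKETCH_SIXBIT(c5) << 24))

/* SQLSTATE 42704 (undefined_object) and 08006 (connection_failure). */
#define SKETCH_UNDEFINED_OBJECT   SKETCH_MAKE_SQLSTATE('4', '2', '7', '0', '4')
#define SKETCH_CONNECTION_FAILURE SKETCH_MAKE_SQLSTATE('0', '8', '0', '0', '6')

/*
 * Decide whether a failed COMMIT PREPARED / ROLLBACK PREPARED must be
 * reported as an error.  diag_sqlstate is the five-character SQLSTATE
 * string from the failed result, or NULL if none was available.
 */
static bool
sketch_must_report(const char *diag_sqlstate)
{
    int sqlstate;

    if (diag_sqlstate != NULL)
        sqlstate = SKETCH_MAKE_SQLSTATE(diag_sqlstate[0], diag_sqlstate[1],
                                        diag_sqlstate[2], diag_sqlstate[3],
                                        diag_sqlstate[4]);
    else
        sqlstate = SKETCH_CONNECTION_FAILURE;   /* no SQLSTATE available */

    /* A missing prepared transaction is accepted; everything else is not. */
    return sqlstate != SKETCH_UNDEFINED_OBJECT;
}
```

Treating undefined_object as success makes the resolution step idempotent: if
an earlier attempt already committed or rolled back the prepared transaction,
a retry does not fail.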
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 48282ab151..0ee91a49ac 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -179,15 +198,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8781,16 +8802,226 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
+
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
 BEGIN;
-SELECT count(*) FROM ft1;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
  count 
 -------
-   822
+     0
 (1 row)
 
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
-ROLLBACK;
-WARNING:  there is no transaction in progress
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000000..3fdbf93cdb
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index bdc21b36d1..9c63f0aa3b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include <limits.h>
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -504,7 +505,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -558,6 +558,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1434,7 +1439,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2372,7 +2377,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2746,7 +2751,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3566,7 +3571,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4441,7 +4446,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4527,7 +4532,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4755,7 +4760,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index ea052872c3..d7ba45c8d2 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -129,7 +130,7 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -137,6 +138,9 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *state);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *state);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *state);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
@@ -203,6 +207,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 									bool is_subquery,
 									List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 1c5c37b783..572077c57c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2480,9 +2503,98 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
 BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
+INSERT INTO ft7_2pc VALUES(1);
 ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 1d4bafd9f0..362f7be9e3 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -441,6 +441,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted independently.
+    Some transactions may fail to commit on their remote server while others
+    commit successfully. This behavior may be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is allowed
+       to use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
@@ -468,6 +505,14 @@
    managed by creating corresponding remote savepoints.
   </para>
 
+  <para>
+   <filename>postgres_fdw</filename> uses the two-phase commit protocol when
+   a transaction commits or aborts if atomic commit of distributed
+   transactions (see <xref linkend="atomic-commit"/>) is required. The remote
+   server should therefore set <xref linkend="guc-max-prepared-transactions"/> to more
+   than one so that it can prepare the remote transaction.
+  </para>
+
   <para>
    The remote transaction uses <literal>SERIALIZABLE</literal>
    isolation level when the local transaction has <literal>SERIALIZABLE</literal>
-- 
2.23.0

v26-0005-Add-regression-tests-for-atomic-commit.patch (text/x-patch; charset=us-ascii)
From 639d9156323594430ec4b2217a95bfcf08195e9d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Dec 2019 17:01:26 +0900
Subject: [PATCH v26 5/5] Add regression tests for atomic commit.

Original Author: Masahiko Sawada <sawada.mshk@gmail.com>
---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index e66e69521f..b17429f501 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit transaction involving foreign servers and shutdown the master node
+# immediately before checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 297b8fbd6f..82a1e7d541 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2336,9 +2336,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transaction
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2353,7 +2356,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.23.0

#28Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Kyotaro Horiguchi (#27)
5 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 6 Dec 2019 at 17:33, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

Hello.

This is the rebased (and slightly fixed) version of the patch. It
applies on the master HEAD and passes all provided tests.

I took over this work from Sawada-san. I'll begin with reviewing the
current patch.

The previous patch set no longer applies cleanly to the current
HEAD. I've updated and slightly modified the code.

This patch set has been marked as Waiting on Author for a long time,
but the correct status now is Needs Review. The patch was actually
updated to incorporate all review comments, but it was not actively
rebased.

The mail [1] I posted before would be helpful for understanding the current
patch design, and there is a README in the patch and a wiki page [2].

I've marked this as Needs Review.
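
For readers who have not followed the whole thread, the commit flow that
the atomic-commit design is built around can be sketched roughly as below.
This is only an illustrative simulation of generic two-phase commit, not
code from the patch set: Participant, ParticipantState,
prepare_participant and two_phase_commit are hypothetical names, and the
can_prepare flag merely stands in for whether a remote PREPARE TRANSACTION
would succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical participant state; not a structure from the patch set. */
typedef enum
{
	PS_ACTIVE,
	PS_PREPARED,
	PS_COMMITTED,
	PS_ABORTED
} ParticipantState;

typedef struct
{
	const char *name;			/* e.g. a foreign server name */
	bool		can_prepare;	/* simulates whether PREPARE succeeds */
	ParticipantState state;
} Participant;

/* Phase 1 for one participant; returns false if it cannot prepare. */
static bool
prepare_participant(Participant *p)
{
	if (!p->can_prepare)
		return false;
	p->state = PS_PREPARED;
	return true;
}

/*
 * Run two-phase commit over n participants.
 * Returns true if all committed, false if everything was rolled back.
 */
bool
two_phase_commit(Participant *parts, size_t n)
{
	size_t		i;
	size_t		j;

	assert(parts != NULL || n == 0);

	/* Phase 1: prepare everyone, stopping at the first failure. */
	for (i = 0; i < n; i++)
	{
		if (!prepare_participant(&parts[i]))
		{
			/* Roll back every participant, prepared or not. */
			for (j = 0; j < n; j++)
				parts[j].state = PS_ABORTED;
			return false;
		}
	}

	/* Phase 2: all participants are prepared, so commit them all. */
	for (i = 0; i < n; i++)
		parts[i].state = PS_COMMITTED;
	return true;
}
```

Calling two_phase_commit() on an array where every entry has can_prepare
set leaves all entries in PS_COMMITTED; if any entry cannot prepare, all
entries end up in PS_ABORTED. That all-or-nothing outcome is what the
attached regression tests check for. The real patch additionally has to
persist state and resolve in-doubt transactions after a crash, which this
sketch leaves out.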

Regards,

[1]: /messages/by-id/CAD21AoDn98axH1bEoMnte+S7WWR=nsmOpjz1WGH-NvJi4aLu3Q@mail.gmail.com
[2]: https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v27-0005-Add-regression-tests-for-atomic-commit.patch (application/octet-stream)
From 42eea5f76a41e582e66ac532645fd95faa244bb2 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Dec 2019 17:01:26 +0900
Subject: [PATCH v27 5/5] Add regression tests for atomic commit.

Original Author: Masahiko Sawada <sawada.mshk@gmail.com>
---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit transaction involving foreign servers and shutdown the master node
+# immediately before checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 92bd28dc5a..de8a292bba 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2336,9 +2336,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transaction
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2353,7 +2356,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.23.0

v27-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From df8e4b75e049d274188ad412dfd419d23c0d881a Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Dec 2019 16:59:47 +0900
Subject: [PATCH v27 1/5] Keep track of writing on non-temporary relation

Original Author: Masahiko Sawada <sawada.mshk@gmail.com>
---
 src/backend/executor/nodeModifyTable.c | 12 ++++++++++++
 src/include/access/xact.h              |  6 ++++++
 2 files changed, 18 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 59d1a31c97..c0a15c3412 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -587,6 +587,10 @@ ExecInsert(ModifyTableState *mtstate,
 							   estate->es_output_cid,
 							   0, NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
@@ -938,6 +942,10 @@ ldelete:;
 	if (tupleDeleted)
 		*tupleDeleted = true;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/*
 	 * If this delete is the result of a partition key update that moved the
 	 * tuple to a new partition, put this row into the transition OLD TABLE,
@@ -1447,6 +1455,10 @@ lreplace:;
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
 	}
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	if (canSetTag)
 		(estate->es_processed)++;
 
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7ee04babc2..a04fc70326 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -102,6 +102,12 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data to a non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
-- 
2.23.0
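
For readers following along, the flag added by patch 0001 is an ordinary bit in MyXactFlags. The following is only a minimal standalone sketch of that pattern, not code from the patch: the two macro values mirror the xact.h hunk above, while note_write(), wrote_non_temp_rel() and reset_xact_flags() are hypothetical helpers invented for illustration (the patch sets the bit inline in nodeModifyTable.c, guarded by RelationNeedsWAL()). How later patches in the series consume the flag is not shown in this part of the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the definitions in access/xact.h; values mirror the patch. */
#define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)

/* Stand-in for the backend-local flag word declared in xact.h. */
static int	MyXactFlags = 0;

/* Hypothetical helper: clear all flags, as happens at transaction start. */
static void
reset_xact_flags(void)
{
	MyXactFlags = 0;
}

/*
 * Hypothetical helper: record a write.  The argument plays the role of
 * RelationNeedsWAL(resultRelationDesc) in the executor hunks.
 */
static void
note_write(bool relation_needs_wal)
{
	/* The new bit must not collide with the existing flag. */
	assert((XACT_FLAGS_WROTENONTEMPREL &
			XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK) == 0);

	if (relation_needs_wal)
		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
}

/* Hypothetical helper: did this transaction write a WAL-logged relation? */
static bool
wrote_non_temp_rel(void)
{
	return (MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0;
}
```

Because the bit is only ever OR-ed in, a single write to a WAL-logged relation is remembered for the rest of the transaction, and writes to temporary relations leave it untouched.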

v27-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From 1c592918acd2da8dab0f5ffba167a70fcc39be5d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Dec 2019 17:00:50 +0900
Subject: [PATCH v27 2/5] Support atomic commit among multiple foreign servers.

Original Author: Masahiko Sawada <sawada.mshk@gmail.com>
---
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  130 +
 src/backend/access/fdwxact/fdwxact.c          | 2815 +++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  644 ++++
 src/backend/access/fdwxact/resolver.c         |  343 ++
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/copy.c                   |    6 +
 src/backend/commands/foreigncmds.c            |   30 +
 src/backend/executor/execPartition.c          |    8 +
 src/backend/executor/nodeForeignscan.c        |   24 +
 src/backend/executor/nodeModifyTable.c        |   18 +
 src/backend/foreign/foreign.c                 |   57 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   20 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   82 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/fdwxactdesc.c              |    1 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  164 +
 src/include/access/fdwxact_launcher.h         |   29 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/resolver_internal.h        |   66 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   29 +
 src/include/foreign/fdwapi.h                  |   12 +
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    9 +-
 src/include/storage/proc.h                    |   11 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    3 +
 src/test/regress/expected/rules.out           |   13 +
 55 files changed, 4913 insertions(+), 18 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 120000 src/bin/pg_waldump/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..49480dd039 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..0207a66fb4
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000000..46ccb7eeae
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,130 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back a transaction on
+either all of the foreign servers or none of them. This ensures that the data
+is always left in a consistent state in terms of the federated database.
+
+
+Commit Sequence of Global Transactions
+--------------------------------------
+
+We employ the two-phase commit protocol to commit atomically across all
+foreign servers. The sequence of a distributed transaction commit consists
+of the following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+in the list FdwXactParticipants, which is maintained by PostgreSQL's global
+transaction manager (GTM), as distributed transaction participants. The
+registered foreign transactions are tracked until the end of the
+transaction.
+
+2. Pre-Commit Phase (1st phase of two-phase commit)
+We write a WAL record indicating that the foreign server is involved in the
+current transaction before doing PREPARE on each foreign transaction.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers including the local node. Otherwise, we can commit them at this
+step by calling the CommitForeignTransaction() API; nothing further is needed.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If any of them fails we switch to
+rollback; at that point some participants might be prepared whereas
+others are not. The former foreign transactions need to be resolved
+manually using pg_resolve_foreign_xact(), and the latter are ended
+in one phase by calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction, but
+the resolution step (commit or rollback) is done by the foreign transaction
+resolver process. The backend process inserts itself into the wait queue and
+then wakes up the resolver process (or requests a new one if necessary).
+The resolver process picks up the waiter and fetches the distributed transaction
+information that the backend is waiting for. Once all foreign transactions are
+committed or rolled back, the resolver process wakes up the waiter.
+
+
+API Contract With Transaction Management Callback Functions
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+transaction management callback functions according to that status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for doing PREPARE, COMMIT and
+ROLLBACK of the transaction on the foreign server, respectively.
+FdwXactRslvState->flags could contain FDWXACT_FLAG_ONEPHASE, meaning the FDW can
+commit or roll back the foreign transaction in one phase. On failure while
+processing a foreign transaction, the FDW needs to raise an error. However, it
+must tolerate an ERRCODE_UNDEFINED_OBJECT error while committing or rolling back
+a foreign transaction, because there is a race condition: the coordinator
+could crash between completing the resolution and writing the WAL record that
+removes the FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_INITIAL
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and it changes to
+FDWXACT_STATUS_PREPARING, FDWXACT_STATUS_COMMITTING and FDWXACT_STATUS_ABORTING
+before the foreign transaction is prepared, committed and aborted by the FDW
+callback functions, respectively (*1). The status then changes to
+FDWXACT_STATUS_RESOLVED once the foreign transaction is resolved, and then
+the corresponding FdwXact entry is removed with WAL logging. On failure while
+processing a foreign transaction (i.e. preparing, committing or aborting), the
+status changes back to the previous status. Therefore the
+FDWXACT_STATUS_xxxING statuses appear only while the foreign transaction is
+being processed by an FDW callback function.
+
+FdwXact entries restored during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. Their initial
+status is FDWXACT_STATUS_PREPARED (*2). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                      INVALID                       |
+ +----------------------------------------------------+
+    |                      |                       |
+    |                      v                       |
+    |           +---------------------+            |
+    |           |       INITIAL       |            |
+    |           +---------------------+            |
+   (*2)                    |                      (*2)
+    |                      v                       |
+    |           +---------------------+            |
+    |           |    PREPARING(*1)    |            |
+    |           +---------------------+            |
+    |                      |                       |
+    v                      v                       v
+ +----------------------------------------------------+
+ |                      PREPARED                      |
+ +----------------------------------------------------+
+           |                               |
+           v                               v
+ +--------------------+          +--------------------+
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |
+ +--------------------+          +--------------------+
+           |                               |
+           v                               v
+ +----------------------------------------------------+
+ |                      RESOLVED                      |
+ +----------------------------------------------------+
+
+(*1) Statuses that appear only while being processed by the FDW
+(*2) Paths for recovered FdwXact entries
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..66ba4fe953
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2815 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * During executor node initialization, the executor can register a foreign
+ * server by calling either RegisterFdwXactByRelId() or
+ * RegisterFdwXactByServerId() to make it participate in the group for global
+ * commit. A foreign server is registered only if its FDW has both the
+ * CommitForeignTransaction and RollbackForeignTransaction APIs. Registered
+ * participant servers are identified by the OIDs of foreign server and user.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every foreign server. After committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so
+ * that we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request. We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed. Before waiting for foreign transactions to be resolved, the
+ * backend enqueues itself with the timestamp at which it expects to be
+ * processed. Similarly, if resolution fails, it is enqueued again with a new
+ * timestamp (its timestamp + foreign_xact_resolution_interval).
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in the in-doubt state (aka in-doubt
+ * transactions). Foreign transactions in the in-doubt state are not resolved
+ * automatically, so they must be processed manually using the
+ * pg_resolve_fdwxact() function.
+ *
+ * The two-phase commit protocol is required if the transaction modified two
+ * or more servers including the local one. Otherwise, all foreign transactions
+ * are committed or rolled back during pre-commit.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed by an FDW, the corresponding
+ * FdwXact entry is updated. To protect the entry from concurrent
+ * removal we need to hold a lock on the entry or a lock on the entire global
+ * array. However, we don't want to hold the lock while the FDW is processing
+ * the foreign transaction, which may take an unpredictable time. To avoid this,
+ * the in-memory data of a foreign transaction follows a locking model based on
+ * four linked concepts:
+ *
+ * * A foreign transaction's status variable is switched using the LWLock
+ *   FdwXactLock, which needs to be held in exclusive mode when updating the
+ *   status, while readers need to hold it in shared mode when looking at the
+ *   status.
+ * * A process that is going to update an FdwXact entry cannot process a
+ *   foreign transaction that is being resolved.
+ * * So setting the status to FDWXACT_STATUS_PREPARING,
+ *   FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING, which put the foreign
+ *   transaction into an in-progress state, means owning the FdwXact entry, which
+ *   protects it from being updated or removed by concurrent writers.
+ * * Individual fields are protected by a mutex, and only the backend owning
+ *   the foreign transaction is authorized to update the fields of its own
+ *   entry.
+ *
+ * Therefore, before doing PREPARE, COMMIT PREPARED or ROLLBACK PREPARED, a
+ * process that is going to call transaction callback functions needs to change
+ * the status to the corresponding status above while holding FdwXactLock in
+ * exclusive mode, and call the callback function after releasing the lock.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk. FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon. We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts. If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Atomic commit is enabled by configuration */
+#define IsForeignTwophaseCommitEnabled() \
+	(max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	(IsForeignTwophaseCommitEnabled() && \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED))
+
+/* Check whether the FdwXactParticipant is capable of two-phase commit */
+#define IsSeverCapableOfTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Check whether the FdwXact is being resolved */
+#define FdwXactIsBeingResolved(fx) \
+	(((((FdwXact)(fx))->status) == FDWXACT_STATUS_PREPARING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_COMMITTING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_ABORTING))
+
+/*
+ * Structure to bundle the foreign transaction participant. This struct
+ * is created at the beginning of execution for each foreign server and
+ * is used until the end of the transaction, where we cannot look at syscaches.
+ * Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry
+	 * is not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char			*fdwxact_id;
+
+	/* true if modified the data on the server */
+	bool			modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function	prepare_foreign_xact_fn;
+	CommitForeignTransaction_function	commit_foreign_xact_fn;
+	RollbackForeignTransaction_function	rollback_foreign_xact_fn;
+	GetPrepareId_function				get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit. This list
+ * has only foreign servers that provide transaction management callbacks,
+ * that is CommitForeignTransaction and RollbackForeignTransaction.
+ */
+static List *FdwXactParticipants = NIL;
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * Name of foreign prepared transaction file is the database oid, xid,
+ * foreign server oid and user oid, each as 8 hex digits, separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure
+ * uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* GUC parameters */
+int	max_prepared_foreign_xacts = 0;
+int	max_foreign_xact_resolvers = 0;
+int foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Keep track of whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static void FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+												 bool for_commit);
+static void FdwXactResolveForeignTransaction(FdwXact fdwxact,
+											 FdwXactRslvState *state,
+											 FdwXactStatus fallback_status);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid,	void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static void FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock);
+static bool is_foreign_twophase_commit_required(void);
+static void register_fdwxact(Oid serverid, Oid userid, bool modified);
+static List *get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						  bool including_indoubts, bool include_in_progress,
+						  bool need_lock);
+static FdwXact get_all_fdwxacts(int *num_p);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXact get_fdwxact_to_resolve(Oid dbid, TransactionId xid);
+static FdwXactRslvState *create_fdwxact_state(void);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Remember accessed foreign transaction. Both RegisterFdwXactByRelId and
+ * RegisterFdwXactByServerId are called by executor during initialization.
+ */
+void
+RegisterFdwXactByRelId(Oid relid, bool modified)
+{
+	Relation		rel;
+	Oid				serverid;
+	Oid				userid;
+
+	rel = relation_open(relid, NoLock);
+	serverid = GetForeignServerIdByRelId(relid);
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	relation_close(rel, NoLock);
+
+	register_fdwxact(serverid, userid, modified);
+}
+
+void
+RegisterFdwXactByServerId(Oid serverid, bool modified)
+{
+	register_fdwxact(serverid, GetUserId(), modified);
+}
+
+/*
+ * Register the foreign transaction identified by the given arguments as
+ * a participant of the transaction.
+ *
+ * The foreign transaction is identified by the given server id and user id.
+ * Registered foreign transactions are managed by the global transaction
+ * manager until the end of the transaction.
+ */
+static void
+register_fdwxact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant	*fdw_part;
+	ForeignServer 		*foreign_server;
+	UserMapping			*user_mapping;
+	MemoryContext		old_ctx;
+	FdwRoutine			*routine;
+	ListCell	   		*lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	/*
+	 * Participant's information is also needed at the end of a transaction,
+	 * where system caches are not available. Save it in TopTransactionContext
+	 * so that it can live until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Don't register the foreign server if it doesn't provide both commit and
+	 * rollback callbacks; restore the caller's memory context before returning.
+	 */
+	if (!routine->CommitForeignTransaction ||
+		!routine->RollbackForeignTransaction)
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		pfree(routine);
+		MemoryContextSwitchTo(old_ctx);
+		return;
+	}
+
+	/*
+	 * Remember we touched the foreign server that is not capable of two-phase
+	 * commit.
+	 */
+	if (!routine->PrepareForeignTransaction)
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact = NULL;
+	fdw_part->modified = modified;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Switch back to the caller's memory context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in the definition of the
+ * FdwXactCtlData structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&(fdwxacts[cnt].mutex));
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Prepare all foreign transactions if foreign twophase commit is required.
+ * In that case, the behavior depends on the value of
+ * foreign_twophase_commit: when 'required', we strictly require all
+ * foreign servers' FDWs to support the two-phase commit protocol and ask them
+ * to prepare foreign transactions; when 'prefer', we ask only foreign servers
+ * that are capable of two-phase commit to prepare foreign transactions and ask
+ * the other servers to commit; and when 'disabled', we ask all foreign servers
+ * to commit foreign transactions in one phase. If we fail to commit any of
+ * them, we change to aborting.
+ *
+ * Note that non-modified foreign servers always can be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	bool		need_twophase_commit;
+	ListCell	*lc = NULL;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * In 'required' mode, every modified server must be capable of the
+	 * two-phase commit protocol.
+	 */
+	if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));
+
+	/*
+	 * Check if we need to use foreign twophase commit. It's always false
+	 * if foreign twophase commit is disabled.
+	 */
+	need_twophase_commit = is_foreign_twophase_commit_required();
+
+	/*
+	 * First, consider committing foreign transactions in one phase.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		bool	commit = false;
+
+		/* Can commit in one-phase if two-phase commit is not required */
+		if (!need_twophase_commit)
+			commit = true;
+
+		/* Non-modified foreign transaction always can be committed in one-phase */
+		if (!fdw_part->modified)
+			commit = true;
+
+		/*
+		 * In 'prefer' case, non-twophase-commit capable server can be
+		 * committed in one-phase.
+		 */
+		if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER &&
+			!IsSeverCapableOfTwophaseCommit(fdw_part))
+			commit = true;
+
+		if (commit)
+		{
+			/* Commit the foreign transaction in one-phase */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, true);
+
+			/* Delete it from the participant list */
+			FdwXactParticipants = foreach_delete_current(FdwXactParticipants,
+														 lc);
+			continue;
+		}
+	}
+
+	/* All done if we committed all foreign transactions */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Second, if only one transaction remains in the participant list
+	 * and we didn't modify any local data, we can commit it without
+	 * preparation.
+	 */
+	if (list_length(FdwXactParticipants) == 1 &&
+		(MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) == 0)
+	{
+		/* Commit the foreign transaction in one-phase */
+		FdwXactOnePhaseEndForeignTransaction(linitial(FdwXactParticipants),
+											 true);
+
+		/* All foreign transactions have now been committed */
+		list_free(FdwXactParticipants);
+		return;
+	}
+
+	/*
+	 * Finally, prepare foreign transactions. Note that we keep
+	 * FdwXactParticipants until the end of transaction.
+	 */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * an FdwXact entry we call the get_prepareid callback to get a transaction
+ * identifier from the FDW.
+ *
+ * We can still change to rollback here. If any error occurs, we roll back
+ * non-prepared foreign transactions and leave the others to the resolver.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	ListCell		*lcell;
+	TransactionId	xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	xid = GetTopTransactionId();
+
+	/* Loop over the foreign connections */
+	foreach(lcell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+		FdwXactRslvState 	*state;
+		FdwXact		fdwxact;
+
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the FDWXACT_STATUS_PREPARING
+		 * status. Registration persists this information to disk and to WAL
+		 * (thereby relaying it to standbys). Thus if we lose connectivity
+		 * to the foreign server or crash ourselves, we will remember that we
+		 * might have a prepared transaction on the foreign server and try to
+		 * resolve it when connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before persisting
+		 * the information to disk and crashed between these two steps,
+		 * we would forget that we prepared the transaction on the foreign server
+		 * and would not be able to resolve it after the crash. Hence persist
+		 * first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		state = create_fdwxact_state();
+		state->server = fdw_part->server;
+		state->usermapping = fdw_part->usermapping;
+		state->fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+
+		/* Update the status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		Assert(fdwxact->status == FDWXACT_STATUS_INITIAL);
+		fdwxact->status = FDWXACT_STATUS_PREPARING;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the moment this backend
+		 * receives an acknowledgement from the foreign server, the backend may
+		 * abort the local transaction (say, because of a signal).
+		 *
+		 * During abort processing, we might try to resolve a never-prepared
+		 * transaction, and get an error. This is fine as long as the FDW
+		 * provides us unique prepared transaction identifiers.
+		 */
+		PG_TRY();
+		{
+			fdw_part->prepare_foreign_xact_fn(state);
+		}
+		PG_CATCH();
+		{
+			/* failed, back to the initial state */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fdwxact->status = FDWXACT_STATUS_INITIAL;
+			LWLockRelease(FdwXactLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		/* succeeded, update status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * One-phase commit or rollback the given foreign transaction participant.
+ */
+static void
+FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+									 bool for_commit)
+{
+	FdwXactRslvState *state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state = create_fdwxact_state();
+	state->server = fdw_part->server;
+	state->usermapping = fdw_part->usermapping;
+	state->flags = FDWXACT_FLAG_ONEPHASE;
+
+	/*
+	 * Commit or roll back the foreign transaction in one phase. Since we
+	 * didn't insert an FdwXact entry for this transaction we don't need to
+	 * care about failures. On failure we change to rollback.
+	 */
+	if (for_commit)
+		fdw_part->commit_foreign_xact_fn(state);
+	else
+		fdw_part->rollback_foreign_xact_fn(state);
+}
+
+/*
+ * This function is used to create a new foreign transaction entry before an
+ * FDW prepares and commits or rolls back. The function adds the entry to WAL,
+ * and it is persisted to disk under the pg_fdwxact directory at checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact				fdwxact;
+	FdwXactOnDiskData	*fdwxact_file_data;
+	MemoryContext		old_context;
+	int					data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							fdw_part->usermapping->userid,
+							fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->status = FDWXACT_STATUS_INITIAL;
+	fdwxact->held_by = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				Oid umid, char *fdwxact_id)
+{
+	int i;
+	FdwXact fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
+								   xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->dbid = dbid;
+	fdwxact->local_xid = xid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	fdwxact->indoubt = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (FdwXactIsBeingResolved(fdwxact))
+		elog(ERROR, "cannot remove fdwxact entry that is being resolved");
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->indoubt = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the commit critical section: a
+		 * checkpoint starting after this will certainly see the fdwxact entry
+		 * as a candidate for fsyncing.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true and set ForeignTwophaseCommitIsRequired to true if the current
+ * transaction modified data on two or more servers, counting the servers in
+ * FdwXactParticipants and the local server itself.
+ */
+static bool
+is_foreign_twophase_commit_required(void)
+{
+	ListCell*	lc;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->modified)
+			nserverswritten++;
+	}
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	/*
+	 * Atomic commit is required if we modified data on two or more
+	 * participants.
+	 */
+	if (nserverswritten <= 1)
+		return false;
+
+	ForeignTwophaseCommitIsRequired = true;
+	return true;
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int	i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * Mark my foreign transaction participants as in-doubt and clear
+ * the FdwXactParticipants list.
+ *
+ * If we leave any foreign transactions behind, update the oldest xmin of
+ * unresolved transactions so that the local transaction id of an in-doubt
+ * transaction is not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell *cell;
+	int		n_lefts = 0;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant	*fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if no FdwXact entry has been registered yet */
+		if (!fdw_part->fdwxact)
+			continue;
+
+		/*
+		 * There is a race condition: if the resolver process removes an
+		 * FdwXact entry in FdwXactParticipants and another backend reuses it
+		 * before we forget it, the entry no longer belongs to us. So we need
+		 * to check that each entry is still associated with this
+		 * transaction.
+		 */
+		SpinLockAcquire(&fdwxact->mutex);
+		if (fdwxact->held_by == MyBackendId)
+		{
+			fdwxact->held_by = InvalidBackendId;
+			fdwxact->indoubt = true;
+			n_lefts++;
+		}
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction of
+	 * unresolved distributed transactions and hand the entries over to the
+	 * foreign transaction resolver.
+	 */
+	if (n_lefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions in in-doubt status", n_lefts);
+		FdwXactComputeRequiredXmin();
+	}
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * Wait for the foreign transaction to be resolved.
+ *
+ * Initially backends start in state FDWXACT_NOT_WAITING and then change
+ * that state to FDWXACT_WAITING before adding ourselves to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * This backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char		*new_status = NULL;
+	const char	*old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsForeignTwophaseCommitRequested())
+		return;
+
+	/*
+	 * Also exit if the transaction itself has no foreign transaction
+	 * participants.
+	 */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if none is running yet, or wake one up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0';	/* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDWXACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The latter
+		 * would lead the client to believe that the distributed transaction
+		 * aborted, which is not true: it's already committed locally. The
+		 * former is no good either: the client has requested committing a
+		 * distributed transaction, and is entitled to assume that an
+		 * acknowledged commit is also committed on all foreign servers, which
+		 * might not be true. So in this case we issue a WARNING (which some
+		 * clients may be able to interpret) and shut off further output. We do
+		 * NOT reset ProcDiePending, so that the process will die after the
+		 * commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign xact resolver can pick them up and try to resolve them
+		 * later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Return true if there is at least one backend connected to the given
+ * database in the wait queue. The caller must hold FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter to the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * AtEOXact_FdwXacts
+ */
+extern void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		foreach (lcell, FdwXactParticipants)
+		{
+			FdwXactParticipant	*fdw_part = lfirst(lcell);
+
+			/*
+			 * If the foreign transaction has an FdwXact entry we might have
+			 * prepared it. Skip an already-prepared foreign transaction
+			 * because it has closed its transaction. But we cannot tell
+			 * whether a foreign transaction with status ==
+			 * FDWXACT_STATUS_PREPARING has been prepared or not, so we call
+			 * the rollback API to close its transaction for safety. Any
+			 * prepared foreign transaction that we might have left will be
+			 * resolved by the foreign transaction resolver.
+			 */
+			if (fdw_part->fdwxact)
+			{
+				bool is_prepared;
+
+				LWLockAcquire(FdwXactLock, LW_SHARED);
+				is_prepared = fdw_part->fdwxact &&
+					fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED;
+				LWLockRelease(FdwXactLock);
+
+				if (is_prepared)
+					continue;
+			}
+
+			/* One-phase rollback foreign transaction */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, false);
+		}
+	}
+
+	/*
+	 * In commit cases, we have already prepared foreign transactions during
+	 * pre-commit phase. And these prepared transactions will be resolved by
+	 * the resolver process.
+	 */
+
+	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Prepare foreign transactions.
+ *
+ * Note that it's possible that the transaction aborts after we prepared some
+ * of the participants. In this case we change to rollback and roll back all
+ * foreign transactions.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	/*
+	 * We cannot prepare if any foreign server of participants isn't capable
+	 * of two-phase commit.
+	 */
+	if (is_foreign_twophase_commit_required() &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in transaction can not prepare the transaction")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Return one backend that connects to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p)
+{
+	PGPROC *proc;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+			break;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+	{
+		*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+		*waitXid_p = proc->fdwXactWaitXid;
+	}
+	else
+	{
+		*nextResolutionTs_p = -1;
+		*waitXid_p = InvalidTransactionId;
+	}
+
+	LWLockRelease(FdwXactResolutionLock);
+
+	return proc;
+}
+
+/*
+ * Get one FdwXact entry to resolve. This function is intended to be used when
+ * a resolver process fetches FdwXact entries to resolve, so the search
+ * excludes in-doubt transactions and in-progress transactions.
+ */
+static FdwXact
+get_fdwxact_to_resolve(Oid dbid, TransactionId xid)
+{
+	List *fdwxacts = NIL;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Exclude both in-doubt transactions and in-progress transactions */
+	fdwxacts = get_fdwxacts(dbid, xid, InvalidOid, InvalidOid,
+							false, false, false);
+
+	return fdwxacts == NIL ? NULL : (FdwXact) linitial(fdwxacts);
+}
+
+/*
+ * Resolve one distributed transaction on the given database. The target
+ * distributed transaction is fetched from the waiting queue and its transaction
+ * participants are fetched from the global array.
+ *
+ * Release the waiter after we have resolved all of the foreign transaction
+ * participants. On failure, we re-enqueue the waiting backend after
+ * incrementing the next resolution time.
+ */
+void
+FdwXactResolveTransaction(Oid dbid, TransactionId xid, PGPROC *waiter)
+{
+	FdwXact	fdwxact;
+
+	Assert(TransactionIdIsValid(xid));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	while ((fdwxact = get_fdwxact_to_resolve(MyDatabaseId, xid)) != NULL)
+	{
+		FdwXactRslvState *state;
+		ForeignServer *server;
+		UserMapping	*usermapping;
+
+		CHECK_FOR_INTERRUPTS();
+
+		server = GetForeignServer(fdwxact->serverid);
+		usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+
+		state = create_fdwxact_state();
+		SpinLockAcquire(&fdwxact->mutex);
+		state->server = server;
+		state->usermapping = usermapping;
+		state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+		SpinLockRelease(&fdwxact->mutex);
+
+		FdwXactDetermineTransactionFate(fdwxact, false);
+
+		/* Do not hold the lock during foreign transaction resolution */
+		LWLockRelease(FdwXactLock);
+
+		PG_TRY();
+		{
+			/*
+			 * Resolve the foreign transaction. When committing or aborting
+			 * prepared foreign transactions the previous status is always
+			 * FDWXACT_STATUS_PREPARED.
+			 */
+			FdwXactResolveForeignTransaction(fdwxact, state,
+											 FDWXACT_STATUS_PREPARED);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. Re-insert the waiter to the tail of retry
+			 * queue if the waiter is still waiting.
+			 */
+			LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+			if (waiter->fdwXactState == FDWXACT_WAITING)
+			{
+				SHMQueueDelete(&(waiter->fdwXactLinks));
+				pg_write_barrier();
+				waiter->fdwXactNextResolutionTs =
+					TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+												foreign_xact_resolution_retry_interval);
+				FdwXactQueueInsert(waiter);
+			}
+			LWLockRelease(FdwXactResolutionLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		elog(DEBUG2, "resolved one foreign transaction xid %u, serverid %u, userid %u",
+			 fdwxact->local_xid, fdwxact->serverid, fdwxact->userid);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+
+	/*
+	 * Remove waiter from shmem queue, if not detached yet. The waiter
+	 * could already be detached if the user cancelled the wait before
+	 * resolution.
+	 */
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId	wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/* Wake up the waiter only when we have set state and removed from queue */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had already been detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Determine whether the given foreign transaction should be committed or
+ * rolled back according to the result of the local transaction. This function
+ * changes fdwxact->status so the caller must either hold FdwXactLock in
+ * exclusive mode or pass need_lock as true.
+ */
+static void
+FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock)
+{
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/*
+	 * The transaction being resolved must either have been cancelled and
+	 * marked as in-doubt, or have been prepared.
+	 */
+	Assert(fdwxact->indoubt ||
+		   fdwxact->status == FDWXACT_STATUS_PREPARED);
+
+	/*
+	 * If the local transaction is already committed, commit prepared
+	 * foreign transaction.
+	 */
+	if (TransactionIdDidCommit(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared
+	 * foreign transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+
+
+	/*
+	 * The local transaction is not in progress but the foreign
+	 * transaction is not prepared on the foreign server. This
+	 * can happen when the transaction failed after registering this
+	 * entry but before actually preparing on the foreign server.
+	 * So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted. This should not happen except for one
+	 * case where the local transaction is prepared and this foreign transaction
+	 * is being resolved manually using pg_resolve_foreign_xact(). Raise an
+	 * error anyway since we cannot determine the fate of this foreign
+	 * transaction according to the local transaction whose fate is also not
+	 * determined.
+	 */
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve the foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid),
+				 errhint("The local transaction with xid %u might be prepared.",
+						 fdwxact->local_xid)));
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * callback function. The 'state' is passed to the callback function. The fate of
+ * the foreign transaction must already be determined. If it is resolved
+ * successfully, remove the FdwXact entry from the shared memory and also
+ * remove the corresponding on-disk file. On failure, the status of the FdwXact
+ * entry is reset to 'fallback_status' before erroring out.
+ */
+static void
+FdwXactResolveForeignTransaction(FdwXact fdwxact, FdwXactRslvState *state,
+								 FdwXactStatus fallback_status)
+{
+	ForeignServer		*server;
+	ForeignDataWrapper	*fdw;
+	FdwRoutine			*fdw_routine;
+	bool				is_commit;
+
+	Assert(state != NULL);
+	Assert(state->server && state->usermapping && state->fdwxact_id);
+	Assert(fdwxact != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+		elog(ERROR, "cannot resolve foreign transaction whose fate is not determined");
+
+	is_commit = fdwxact->status == FDWXACT_STATUS_COMMITTING;
+	LWLockRelease(FdwXactLock);
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+
+	PG_TRY();
+	{
+		if (is_commit)
+			fdw_routine->CommitForeignTransaction(state);
+		else
+			fdw_routine->RollbackForeignTransaction(state);
+	}
+	PG_CATCH();
+	{
+		/* Back to the fallback status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = fallback_status;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	/* Resolution was a success, remove the entry */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	elog(DEBUG1, "successfully %s the foreign transaction with xid %u db %u server %u user %u",
+		 is_commit ? "committed" : "rolled back",
+		 fdwxact->local_xid, fdwxact->dbid, fdwxact->serverid,
+		 fdwxact->userid);
+
+	fdwxact->status = FDWXACT_STATUS_RESOLVED;
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdwxact(fdwxact);
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Return palloc'd and initialized FdwXactRslvState.
+ */
+static FdwXactRslvState *
+create_fdwxact_state(void)
+{
+	FdwXactRslvState *state;
+
+	state = palloc(sizeof(FdwXactRslvState));
+	state->server = NULL;
+	state->usermapping = NULL;
+	state->fdwxact_id = NULL;
+	state->flags = 0;
+
+	return state;
+}
+
+/*
+ * Return the FdwXact entry that matches the given arguments, otherwise
+ * return NULL. All arguments must be valid values so that the search
+ * matches exactly one (or no) entry. Note that this function is intended to
+ * be used for modifying the returned FdwXact entry, so the caller must hold
+ * FdwXactLock in exclusive mode. In-progress FdwXact entries are not
+ * included.
+ */
+static FdwXact
+get_one_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	*fdwxact_list;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	/* Include in-doubt transactions but don't include in-progress ones */
+	fdwxact_list = get_fdwxacts(dbid, xid, serverid, userid,
+								true, false, false);
+
+	/* Must be at most one entry since we search by the unique key */
+	Assert(list_length(fdwxact_list) <= 1);
+
+	/* Could not find entry */
+	if (fdwxact_list == NIL)
+		return NULL;
+
+	return (FdwXact) linitial(fdwxact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+fdwxact_exists(Oid dbid, Oid serverid, Oid userid)
+{
+	List	*fdwxact_list;
+
+	/* Find entries from all FdwXact entries */
+	fdwxact_list = get_fdwxacts(dbid, InvalidTransactionId, serverid,
+								userid, true, true, true);
+
+	return fdwxact_list != NIL;
+}
+
+/*
+ * Returns an array of all foreign prepared transactions for the user-level
+ * function pg_foreign_xacts, and stores the number of entries in *num_p.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if it doesn't
+ * want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdwxacts(int *num_p)
+{
+	List		*all_fdwxacts;
+	ListCell	*lc;
+	FdwXact		fdwxacts;
+	int			num_fdwxacts = 0;
+
+	Assert(num_p != NULL);
+
+	/* Get all entries */
+	all_fdwxacts = get_fdwxacts(InvalidOid, InvalidTransactionId,
+								InvalidOid, InvalidOid, true,
+								true, true);
+
+	if (all_fdwxacts == NIL)
+	{
+		*num_p = 0;
+		return NULL;
+	}
+
+	fdwxacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdwxacts));
+	*num_p = list_length(all_fdwxacts);
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdwxacts)
+	{
+		FdwXact fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdwxacts + num_fdwxacts, fx,
+			   sizeof(FdwXactData));
+		num_fdwxacts++;
+	}
+
+	list_free(all_fdwxacts);
+
+	return fdwxacts;
+}
+
+/*
+ * Return a list of FdwXact entries matching the given arguments, or NIL if
+ * there are none. Only arguments with valid values for their respective
+ * datatypes are used as search conditions. 'include_indoubt' and
+ * 'include_in_progress' control whether the result includes in-doubt
+ * transactions and in-progress transactions, respectively.
+ */
+static List*
+get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			 bool include_indoubt, bool include_in_progress, bool need_lock)
+{
+	int i;
+	List	*fdwxact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact	fdwxact = FdwXactCtl->fdwxacts[i];
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* include in-doubt transaction? */
+		if (!include_indoubt && fdwxact->indoubt)
+			continue;
+
+		/* include in-progress transaction? */
+		if (!include_in_progress && FdwXactIsBeingResolved(fdwxact))
+			continue;
+
+		/* Append it if matched */
+		fdwxact_list = lappend(fdwxact_list, fdwxact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdwxact_list;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record
+		 * in FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete the FdwXact entry and its file, if any */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides a getPrepareId callback we return the identifier
+ * returned from it. Otherwise we generate a unique identifier in the
+ * form of "fx_<random number>_<xid>_<serverid>_<userid>" whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transaction identifier unique, we should ideally use
+ * something like UUID, which gives unique ids with high probability, but that
+ * may be expensive here and the UUID extension which provides the function to
+ * generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	*id;
+	int		id_len = 0;
+
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		/*
+		 * FDW doesn't provide the callback function, generate a unique
+		 * identifier.
+		 */
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len > FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an inserted LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;						/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimization for the future, if somebody
+	 * spots that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be an FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+			  (errmsg_plural("%u foreign transaction state file was written "
+							 "for long-running prepared transactions",
+							 "%u foreign transaction state files were written "
+							 "for long-running prepared transactions",
+							 serialized_fdwxacts,
+							 serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									&read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+		   errdetail("Failed while allocating an XLog reading processor.")));
+
+	record = XLogReadRecord(xlogreader, lsn, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not read foreign transaction state from xlog at %X/%X",
+			   (uint32) (lsn >> 32),
+			   (uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+		errmsg("could not recreate foreign transaction state file \"%s\": %m",
+			   path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			  errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId	origNextXid =
+		XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	char	*buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid size and CRC and matches the given identifiers),
+ * return the palloc'd contents of the file. An error is raised when corrupted
+ * data is found. This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+			   errmsg("could not open FDW transaction state file \"%s\": %m",
+					  path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents are the expected data */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextFullXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue causes harm to the
+ * system, and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextFullXid);
+	TransactionId result = origNextXid;
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+		char *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char		*buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The
+	 * entry's status is filled in below, since we do not know the
+	 * exact status right now. The resolver will set it later based on
+	 * the status of the local transaction which prepared this
+	 * foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							  fdwxact_data->serverid, fdwxact_data->userid,
+							  fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED and as in-doubt, since we do not know
+	 * the xact status right now. Resolver will set it later based on
+	 * the status of local transaction that prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;	/* added in redo */
+	fdwxact->indoubt = true;
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove
+ * FdwXact file if a foreign transaction was saved via an earlier checkpoint.
+ * We might not find the FdwXact entry when crash recovery starts from a
+ * point after the entry was added but before it was removed.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact	fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdwxact(dbid, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+		return;
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+		char	*buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+
+		/*
+		 * If the foreign transaction is part of the prepared local
+		 * transaction, it's not in-doubt. The future COMMIT/ROLLBACK
+		 * PREPARED can determine the fate of this foreign transaction.
+		 */
+		if (TwoPhaseExists(fdwxact->local_xid))
+		{
+			ereport(DEBUG2,
+					(errmsg("clearing in-doubt flag of foreign transaction %u, server %u, user %u because the corresponding local prepared transaction was found",
+							fdwxact->local_xid, fdwxact->serverid,
+							fdwxact->userid)));
+			fdwxact->indoubt = false;
+		}
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(int *newval, void **extra, GucSource source)
+{
+	ForeignTwophaseCommitLevel newForeignTwophaseCommitLevel = *newval;
+
+	/* Parameter check */
+	if (newForeignTwophaseCommitLevel > FOREIGN_TWOPHASE_COMMIT_DISABLED &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_transactions\" or \"max_foreign_transaction_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}	WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	7
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdwxacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdwxacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(PG_PREPARED_FDWXACTS_COLS);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "indoubt",
+						   BOOLOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdwxacts = get_all_fdwxacts(&num_fdwxacts);
+		status->num_xacts = num_fdwxacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdwxact = &status->fdwxacts[status->cur_xact++];
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdwxact->dbid);
+		values[1] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[2] = ObjectIdGetDatum(fdwxact->serverid);
+		values[3] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (fdwxact->status)
+		{
+			case FDWXACT_STATUS_INITIAL:
+				xact_status = "initial";
+				break;
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			case FDWXACT_STATUS_RESOLVED:
+				xact_status = "resolved";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		values[5] = BoolGetDatum(fdwxact->indoubt);
+		values[6] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+	FdwXact			fdwxact;
+	FdwXactRslvState	*state;
+	FdwXactStatus		prev_status;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	server = GetForeignServer(serverid);
+	usermapping = GetUserMapping(userid, serverid);
+	state = create_fdwxact_state();
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+	{
+		LWLockRelease(FdwXactLock);
+		PG_RETURN_BOOL(false);
+	}
+
+	state->server = server;
+	state->usermapping = usermapping;
+	state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+
+	SpinLockAcquire(&fdwxact->mutex);
+	prev_status = fdwxact->status;
+	SpinLockRelease(&fdwxact->mutex);
+
+	FdwXactDetermineTransactionFate(fdwxact, false);
+
+	LWLockRelease(FdwXactLock);
+
+	FdwXactResolveForeignTransaction(fdwxact, state, prev_status);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it. This gives a way to forget about such a prepared transaction
+ * when, for example, the foreign server where it was prepared is no longer
+ * available, or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId	xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid				serverid = PG_GETARG_OID(1);
+	Oid				userid = PG_GETARG_OID(2);
+	FdwXact			fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
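+	/* Remove the entry if it exists; the lock is released on every path */
+	if (fdwxact != NULL)
+		remove_fdwxact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	/* Only the pointer value is tested here; the entry is already removed */
+	PG_RETURN_BOOL(fdwxact != NULL);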
+}
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..041ac3871f
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,644 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to retry resolution.
+ */
+void
+FdwXactLauncherRequestToLaunchForRetry(void)
+{
+	if (FdwXactRslvCtl->launcher_pid > 0)
+		SetLatch(FdwXactRslvCtl->launcher_latch);
+}
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid > 0)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int	slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			resolver->last_resolved_time = 0;
+			resolver->latch = NULL;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int	save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz	last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+	/* launcher_pid is 0 at first start and InvalidPid after a relaunch */
+	Assert(FdwXactRslvCtl->launcher_pid <= 0);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz	now;
+		long	wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int		rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Retry at most once per foreign_xact_resolution_retry_interval,
+		 * but always launch immediately when a backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested
+			 * but not running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started. This usually means a resolver crashed, so retry
+			 * after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool	found = false;
+	int		i;
+
+	/*
+	 * Look for a resolver process that is running and working on the
+	 * same database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. In that case we do nothing, since
+		 * the resolver will soon find the FdwXact entry we inserted.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int unused_slot = -1;
+	int i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to wait
+	 * until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on each database that has
+ * at least one FdwXact entry but no resolver running on it.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	*resolver_dbs;	/* DBs resolver's running on */
+	HTAB	*fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL	ctl;
+	HASH_SEQ_STATUS status;
+	Oid		*entry;
+	bool	launched = false;
+	int		i;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one non-in-doubt FdwXact entry */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->indoubt)
+			continue;
+
+		hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There are no such FdwXact entries, so no resolver needs to be launched */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+		return false;
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolvers are running and launch new one on them */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	return launched;
+}
+
+/*
+ * FdwXactLauncherRegister
+ *		Register a background worker running the foreign transaction
+ *      launcher.
+ */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int i;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! (slots whose worker has not attached yet have no valid pid) */
+		if (resolver->in_use && resolver->pid != InvalidPid && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+						WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Returns activity of all foreign transaction resolvers.
+ */
+Datum
+pg_stat_get_foreign_xact(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver	*resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t	pid;
+		Oid		dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..2c41e58b9e
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,343 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. The foreign
+ * transaction launcher starts one resolver process for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database for which a backend process is waiting.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, as for any backend. The resolver process also terminates on
+ * timeout, but only if no foreign transactions on the database are waiting
+ * to be resolved.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int foreign_xact_resolution_retry_interval;
+int foreign_xact_resolver_timeout = 60 * 1000;
+bool foreign_xact_resolve_indoubt_xacts;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	MyFdwXactResolver->last_resolved_time = 0;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sane value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		PGPROC			*waiter = NULL;
+		TransactionId	waitXid = InvalidTransactionId;
+		TimestampTz		resolutionTs = -1;
+		int			rc;
+		TimestampTz	now;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiters until either the queue becomes empty or we reach a
+		 * waiter whose resolution time is in the future.
+		 */
+		while ((waiter = FdwXactGetWaiter(&resolutionTs, &waitXid)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+			Assert(TransactionIdIsValid(waitXid));
+
+			if (resolutionTs > now)
+				break;
+
+			elog(DEBUG2, "resolver got one waiter with xid %u", waitXid);
+
+			/* Resolve the waiting distributed transaction */
+			StartTransactionCommand();
+			FdwXactResolveTransaction(MyDatabaseId, waitXid, waiter);
+			CommitTransactionCommand();
+
+			/* Update my stats */
+			SpinLockAcquire(&(MyFdwXactResolver->mutex));
+			MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+			SpinLockRelease(&(MyFdwXactResolver->mutex));
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz last_resolved_time;
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	last_resolved_time = MyFdwXactResolver->last_resolved_time;
+	timeout = TimestampTzPlusMilliseconds(last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until the slot is detached. This
+		 * is necessary to prevent a race condition in which a waiter enqueues
+		 * itself after we checked FdwXactWaiterExists.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but does not exit because the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long we should sleep before the next cycle. We can sleep until
+ * the timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long	sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long	sec_to_timeout;
+		int		microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long	sec_to_timeout;
+		int		microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..fe0cef9472
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 1cd97852e8..ea045174e0 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..200cf9d067 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 6f7ee0c947..80d0972209 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -850,6 +851,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
@@ -2262,6 +2292,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2321,6 +2357,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 017f03b6d8..2722d532d7 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1218,6 +1219,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1226,6 +1228,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1264,12 +1267,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1427,6 +1431,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transactions to be resolved, if required.
+	 * We only want to wait if we prepared foreign transactions in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2086,6 +2098,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2246,6 +2261,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2333,6 +2349,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2527,6 +2545,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2732,6 +2751,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7f4f784c0e..96ad310765 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -5245,6 +5246,7 @@ BootStrapXLOG(void)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6184,6 +6186,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_wal_senders",
 									 max_wal_senders,
 									 ControlFile->max_wal_senders);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
@@ -6724,14 +6729,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -6923,7 +6929,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7419,6 +7428,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7749,6 +7759,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9024,6 +9037,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9457,8 +9471,10 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9476,6 +9492,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9492,6 +9509,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9697,6 +9715,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -9896,6 +9915,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c9e75f4370..d6b0159128 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -332,6 +332,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
@@ -824,6 +827,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_foreign_xact AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_foreign_xact() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 40a8ec1abd..408a5085f0 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2858,8 +2858,14 @@ CopyFrom(CopyState cstate)
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc),
+							   true);
+
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index f197869752..767dbcb3db 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1101,6 +1103,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdwxact_exists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1419,6 +1433,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (fdwxact_exists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
@@ -1572,6 +1595,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
 				 errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA",
 						fdw->fdwname)));
 
+	/*
+	 * Remember that the transaction accesses a foreign server. Normally
+	 * ImportForeignSchema does not modify data on foreign servers, so register
+	 * the server as not modified.
+	 */
+	RegisterFdwXactByServerId(server->serverid, false);
+
 	/* Call FDW to get a list of commands */
 	cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c13b1d3501..1dc61fbdea 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/table.h"
 #include "access/tableam.h"
 #include "catalog/partition.h"
@@ -937,7 +938,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		Relation		child = partRelInfo->ri_RelationDesc;
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(child), true);
+
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 513471ab9b..29f376e48c 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,9 +226,31 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		RangeTblEntry	*rte;
+
+		rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex,
+							estate);
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(rte->relid, true);
+
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
+	{
+		RangeTblEntry	*rte;
+		int rtindex = (scanrelid > 0) ?
+			scanrelid :
+			bms_next_member(node->fs_relids, -1);
+
+		rte = exec_rt_fetch(rtindex, estate);
+
+		/* Remember that the transaction accesses a foreign server */
+		RegisterFdwXactByRelId(rte->relid, false);
+
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c0a15c3412..a9d223a534 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -37,6 +37,7 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
@@ -47,6 +48,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "rewrite/rewriteHandler.h"
@@ -549,6 +551,10 @@ ExecInsert(ModifyTableState *mtstate,
 										   NULL,
 										   specToken);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
 												   &specConflict,
@@ -777,6 +783,10 @@ ldelete:;
 									&tmfd,
 									changingPart);
 
+		/* Make note that we've written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case TM_SelfModified:
@@ -1323,6 +1333,10 @@ lreplace:;
 									true /* wait for commit */ ,
 									&tmfd, &lockmode, &update_indexes);
 
+		/* Make note that we've written to a non-temporary relation */
+		if (RelationNeedsWAL(resultRelationDesc))
+			MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 		switch (result)
 		{
 			case TM_SelfModified:
@@ -2382,6 +2396,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+			/* Remember that the transaction modifies data on a foreign server */
+			RegisterFdwXactByRelId(relid, true);
 
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 															 resultRelInfo,
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 61e48ca3f8..15ec09e3ba 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping OID.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,20 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction &&
+		 !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction &&
+		 routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		(!routine->CommitForeignTransaction ||
+		 !routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 75fc0d5d33..c8b38142a5 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -12,6 +12,8 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 51c486bebd..616d96215d 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3645,6 +3645,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3848,6 +3854,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDWXACT:
+			event_name = "FdwXact";
+			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -4066,6 +4078,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 7a92dac525..8f63bf6999 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -909,6 +911,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -984,12 +990,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and the foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e1dc8a651..c77ca40e1c 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -151,6 +151,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..55609eed81 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index c3adb2e6c0..ebe35c4788 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -93,6 +93,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -248,6 +250,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1312,6 +1315,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1377,6 +1381,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1426,6 +1431,15 @@ GetOldestXmin(Relation rel, int flags)
 		NormalTransactionIdPrecedes(replication_slot_xmin, result))
 		result = replication_slot_xmin;
 
+	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDWXACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
 	/*
 	 * After locks have been released and vacuum_defer_cleanup_age has been
 	 * applied, check whether we need to back up further to make logical
@@ -3128,6 +3142,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits to future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions still needed to
+ * resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843229..adb276370c 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,6 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+FdwXactLock							45
+FdwXactResolverLock					46
+FdwXactResolutionLock				47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 32df8c85a1..623c87bb6a 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -421,6 +422,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* Initialize fields for fdw xact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -822,6 +827,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0a6f80963b..9b40134aa4 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3030,6 +3032,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9f179a9129..4a0fcf397c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -404,6 +405,25 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required", "prefer", and "disabled" are documented,
+ * we accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
@@ -731,6 +751,12 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
+	/* FDWXACT_RESOLVER */
+	gettext_noop("Foreign Transaction Management / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2413,6 +2439,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FDWXACT_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FDWXACT_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4456,6 +4528,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e1048c0047..859f32eb95 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -125,6 +125,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -342,6 +344,20 @@
 #max_sync_workers_per_subscription = 2	# taken from max_logical_replication_workers
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#foreign_twophase_commit = disabled	# disabled, prefer, or required
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index a0b0458108..8701c5f005 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 7f1534aebb..2195fb6e90 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -208,6 +208,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 19e21ab491..9ae3bfe4dd 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -301,6 +301,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index f9cfeae264..a5f2aa1a09 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 120000
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..5519bd908f
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,164 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* fdwXactState */
+#define	FDWXACT_NOT_WAITING		0
+#define	FDWXACT_WAITING			1
+#define	FDWXACT_WAIT_COMPLETE	2
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											   without preparation */
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_PREFER,		/* use twophase commit where available */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										   twophase commit */
+} ForeignTwophaseCommitLevel;
+
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID,
+	FDWXACT_STATUS_INITIAL,
+	FDWXACT_STATUS_PREPARING,		/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,		/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,		/* foreign prepared transaction is to
+									 * be committed */
+	FDWXACT_STATUS_ABORTING,		/* foreign prepared transaction is to be
+									 * aborted */
+	FDWXACT_STATUS_RESOLVED
+} FdwXactStatus;
+
+typedef struct FdwXactData *FdwXact;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData
+{
+	FdwXact			fdwxact_free_next;	/* Next free FdwXact entry */
+
+	Oid				dbid;			/* database oid where to find foreign server
+									 * and user mapping */
+	TransactionId	local_xid;		/* XID of local transaction */
+	Oid				serverid;		/* foreign server where transaction takes
+									 * place */
+	Oid				userid;			/* user who initiated the foreign
+									 * transaction */
+	Oid				umid;
+	bool			indoubt;		/* Is an in-doubt transaction? */
+	slock_t			mutex;			/* Protect the above fields */
+
+	/* The status of the foreign transaction, protected by FdwXactLock */
+	FdwXactStatus 	status;
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;		/* XLOG offset where this entry's record starts */
+	XLogRecPtr	insert_end_lsn;		/* XLOG offset where this entry's record ends */
+
+	bool		valid;			/* has the entry been completed and written to file? */
+	BackendId	held_by;		/* backend currently holding this entry */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN];		/* prepared transaction identifier */
+} FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing FdwXact entry needs to hold FdwXactLock in exclusive mode,
+ * and iterating fdwXacts needs that in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];		/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	/* Foreign transaction information */
+	char	*fdwxact_id;
+
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	int		flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+/* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern bool fdwxact_exists(Oid dboid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwTwoPhaseNeeded(void);
+extern void PreCommit_FdwXacts(void);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon);
+extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void FdwXactResolveTransaction(Oid dbid, TransactionId xid, PGPROC *waiter);
+extern bool FdwXactResolveInDoubtTransactions(Oid dbid);
+extern PGPROC *FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p);
+extern void FdwXactCleanupAtProcExit(void);
+extern void RegisterFdwXactByRelId(Oid relid, bool modified);
+extern void RegisterFdwXactByServerId(Oid serverid, bool modified);
+extern void FdwXactMarkForeignServerAccessed(Oid relid, bool modified);
+extern bool check_foreign_twophase_commit(int *newval, void **extra,
+										  GucSource source);
+extern bool FdwXactWaiterExists(Oid dbid);
+
+#endif   /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..dd0f5d16ff
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,29 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLauncherRequestToLaunchForRetry(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif	/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..2607654024
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int foreign_xact_resolver_timeout;
+
+#endif		/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..39ca66beef
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif	/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..55fc970b69
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,66 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t	pid;	/* this resolver's PID, or 0 if not active */
+	Oid		dbid;	/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool	in_use;
+
+	/* Stats */
+	TimestampTz	last_resolved_time;
+
+	/* Protect shared variables shown above */
+	slock_t	mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	*latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch		*launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif	/* RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index c88dccfb8d..254a663b4d 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Foreign Transactions", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index a04fc70326..6f1f336e31 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -108,6 +108,13 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
+/*
+ * XACT_FLAGS_FDWNOPREPARE - set when the transaction wrote data on a
+ * foreign table whose server isn't capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 3)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 087918d41d..ed4d08b4af 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -232,6 +232,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index de5670e538..9884f5f8e7 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fcf2a1214c..db4fa89699 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5191,6 +5191,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '9705', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_foreign_xact', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,timestamptz}',
+  proargmodes => '{o,o,o}',
+  proargnames => '{pid,dbid,last_resolved_time}',
+  prosrc => 'pg_stat_get_foreign_xact' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5904,6 +5911,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o,o}',
+  proargnames => '{dbid,xid,serverid,userid,status,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6022,6 +6047,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..8d046cc4e4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5e0cf533fb..5596ee591c 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -69,6 +69,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 											   bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 														 bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index aecb6013f0..1c6cd15652 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -776,6 +776,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -853,7 +855,9 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDWXACT,
+	WAIT_EVENT_FDWXACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -934,6 +938,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index d21780108b..35ffbbca93 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/xlogdefs.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
@@ -152,6 +153,16 @@ struct PGPROC
 	int			syncRepState;	/* wait state for sync rep */
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
+	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
 	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index a5c7d0c064..0f73b64937 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDWXACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -125,4 +127,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 454c2df487..6010dbcdee 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,9 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
+	FDWXACT_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 70e1e2f78d..dbb7f18e8c 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1341,6 +1341,14 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.dbid,
+    f.xid,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(dbid, xid, serverid, userid, status, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
@@ -1846,6 +1854,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_foreign_xact| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_foreign_xact() r(pid, dbid, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
-- 
2.23.0

v27-0003-Documentation-update.patch (application/octet-stream)
From ccf30999cad0ddef2a06437890d0be1c3cdf6059 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Dec 2019 17:01:08 +0900
Subject: [PATCH v27 3/5] Documentation update.

Original Author: Masahiko Sawada <sawada.mshk@gmail.com>
---
 doc/src/sgml/catalogs.sgml                | 145 +++++++++++++
 doc/src/sgml/config.sgml                  | 146 ++++++++++++-
 doc/src/sgml/distributed-transaction.sgml | 158 +++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 236 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  89 ++++++++
 doc/src/sgml/monitoring.sgml              |  60 ++++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 841 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 85ac79f07e..95695e0374 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8267,6 +8267,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>open cursors</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-file-settings"><structname>pg_file_settings</structname></link></entry>
       <entry>summary of configuration file contents</entry>
@@ -9717,6 +9722,146 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database in which the foreign transaction resides
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>initial</literal> : Initial status.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>resolved</literal> : This foreign transaction has been resolved.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in in-doubt status and
+       needs to be resolved by calling the <function>pg_resolve_fdwxact</function>
+       function.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e07dc01e80..7aebd8adf9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4398,7 +4398,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
 
      </variablelist>
     </sect2>
-
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -8863,6 +8862,151 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether transaction commit will wait for all involved foreign
+         transactions to be resolved before the command returns a "success"
+         indication to the client. Valid values are <literal>required</literal>,
+         <literal>prefer</literal> and <literal>disabled</literal>. The default
+         setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used
+         to commit or roll back distributed transactions. When set to
+         <literal>required</literal>, the distributed transaction strictly
+         requires that all written servers can use the two-phase commit protocol.
+         That is, the distributed transaction cannot commit if even one server
+         does not support the transaction management callback routines
+         (described in <xref linkend="fdw-callbacks-transaction-managements"/>).
+         When set to <literal>prefer</literal>, the distributed transaction uses
+         the two-phase commit protocol only on servers where it is available and
+         performs a plain commit on the others. Note that with
+         <literal>disabled</literal> or <literal>prefer</literal>, the servers
+         involved in the distributed transaction can become inconsistent with each
+         other if a foreign server crashes while the transaction is being committed.
+        </para>
+
+        <para>
+         Both <varname>max_prepared_foreign_transactions</varname> and
+         <varname>max_foreign_transaction_resolvers</varname> must be set to
+         non-zero values before this parameter can be set to either
+         <literal>required</literal> or <literal>prefer</literal>.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If <literal>N</literal> local transactions each
+         span <literal>K</literal> foreign servers, this value needs to be set to
+         <literal>N * K</literal>, not just <literal>N</literal>.
+         This parameter can only be set at server start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign
+         transaction resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminate foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning a resolver stays connected
+         to its database until it is stopped manually. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..350b1afe68
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers end in either commit or rollback using the
+   transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers automatically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).
+    The <productname>PostgreSQL</productname> server that receives the SQL is called the
+    <firstterm>coordinator node</firstterm> and is responsible for coordinating
+    all the participating transactions. Using the two-phase commit protocol, the commit
+    sequence of a distributed transaction proceeds through the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    At the first step, the <productname>PostgreSQL</productname> distributed
+    transaction manager prepares all transactions on the foreign servers if
+    two-phase commit is required. Two-phase commit is required when the
+    transaction modifies data on two or more servers including the local server
+    itself and <xref linkend="guc-foreign-twophase-commit"/> is
+    <literal>required</literal> or <literal>prefer</literal>. If all preparations
+    on foreign servers succeed, processing goes to the next step. If any failure
+    happens in this step, <productname>PostgreSQL</productname> switches to rollback
+    and rolls back all transactions on both local and foreign servers.
+   </para>
+
+   <para>
+    At the local commit step, <productname>PostgreSQL</productname> commits the
+    transaction locally. If any failure happens in this step,
+    <productname>PostgreSQL</productname> switches to rollback and rolls back all
+    transactions on both local and foreign servers.
+   </para>
+
+   <para>
+    At the final step, prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolution">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for foreign transaction resolution. They commit or roll back all
+    prepared transactions on foreign servers if the coordinator received agreement
+    messages from all foreign servers during the first step.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolutions
+    on one database of the coordinator side. On failure during resolution, it
+    retries the resolution at intervals of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped. To drop the database, first call the
+     <function>pg_stop_foreign_xact_resolver</function> function and then drop
+     the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>Manual Resolution of In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back using the two-phase commit protocol. However, distributed transactions
+    become <firstterm>in-doubt</firstterm> in three cases: when a foreign
+    server crashes or the connection to it is lost while preparing the foreign
+    transaction, when the coordinator node crashes while either preparing or
+    resolving the distributed transaction, and when the user cancels the query. You can
+    check in-doubt transactions in the <xref linkend="view-pg-foreign-xacts"/>
+    view. These foreign transactions need to be resolved by using the
+    <function>pg_resolve_foreign_xact</function> function.
+    <productname>PostgreSQL</productname> doesn't have facilities to automatically
+    resolve in-doubt transactions. This behavior might change in a future release.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-monitoring">
+   <title>Monitoring</title>
+   <para>
+    Monitoring information about foreign transaction resolvers is visible in the
+    <link linkend="pg-stat-foreign-xact-view"><literal>pg_stat_foreign_xact</literal></link>
+    view. This view contains one row for every foreign transaction resolver worker.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be adjusted to
+    accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6587678af2..dd0358ef22 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1415,6 +1415,127 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back and
+     prepare the foreign transaction. If an FDW wishes that its foreign
+     transaction is managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+void
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase, or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flag</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when the <function>pg_resolve_fdwxact</function>
+     SQL function is called, this callback function is executed by foreign
+     transaction resolver processes.
+    </para>
+    <para>
+<programlisting>
+void
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flag</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when the <function>pg_resolve_fdwxact</function>
+     SQL function is called, this callback function is executed by foreign
+     transaction resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    with its length stored in <varname>*prep_id_len</varname>.
+    This optional function is called during executor startup, once per
+    foreign server. Note that the transaction identifier must be a string literal,
+    less than <symbol>NAMEDATALEN</symbol> bytes long, and should not be the same
+    as any other concurrent prepared transaction id. If this callback routine
+    is not supported, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Therefore, you cannot
+      use functions that access the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1894,4 +2015,119 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the workings of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name,
+    server OID, user and user mapping. The <literal>flags</literal> field contains
+    flag bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback of a Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <literal>CommitForeignTransaction</literal>
+     function in the pre-commit phase; during transaction rollback, it calls the
+     <literal>RollbackForeignTransaction</literal> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback of a Distributed Transaction</title>
+    <para>
+     In addition to simply committing and rolling back foreign transactions as
+     described in <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to atomically commit and roll back among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    Here, ft1 and ft2 are foreign tables on different foreign servers, which may
+    be using different Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose
+     FDW supports the transaction management callback routines is registered as a
+     participant. During registration, <function>GetPrepareId</function> is called,
+     if provided, to generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction
+     manager persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, the <literal>CommitForeignTransaction</literal> and
+     <literal>RollbackForeignTransaction</literal> functions should commit and
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve the transaction, the resolver process exits with an error message.
+     The foreign transaction launcher will launch the resolver process again at
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..80a87fa5d1 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6c4359dc7b..35b6fbb319 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21766,6 +21766,95 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction that is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful to call before
+    dropping the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0bfd6151c4..455d423d9c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -376,6 +376,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_foreign_xact</structname><indexterm><primary>pg_stat_foreign_xact</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1244,6 +1252,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>CheckpointerMain</literal></entry>
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
+        <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
+         <entry><literal>LogicalLauncherMain</literal></entry>
+         <entry>Waiting in main loop of logical launcher process.</entry>
+        </row>
         <row>
          <entry><literal>LogicalApplyMain</literal></entry>
          <entry>Waiting in main loop of logical apply process.</entry>
@@ -1467,6 +1487,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>SyncRep</literal></entry>
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
+        <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
         <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
@@ -2371,6 +2395,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-foreign-xact-view" xreflabel="pg_stat_foreign_xact">
+   <title><structname>pg_stat_foreign_xact</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_foreign_xact</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index e59cba7997..dee3f72f7e 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -163,6 +163,7 @@
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 1c19e863d2..3f4c806ed1 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.23.0
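
For readers following the fdwhandler.sgml text in the patch above, the atomic commit sequence it describes (prepare every participant, commit locally only if all prepares succeed, otherwise roll everything back) can be sketched in plain C. This is only an illustrative sketch under stated assumptions, not code from the patch: SketchState, SketchParticipant and sketch_atomic_commit are hypothetical names, and the outcome of each remote PREPARE is simulated by a boolean instead of calling the real FDW callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-participant state, for illustration only. */
typedef enum
{
	SKETCH_ACTIVE,
	SKETCH_PREPARED,
	SKETCH_COMMITTED,
	SKETCH_ROLLED_BACK
} SketchState;

typedef struct
{
	bool		prepare_ok;		/* simulated result of the remote PREPARE */
	SketchState state;
} SketchParticipant;

/*
 * Simulate two-phase commit over n participants.  Returns true if the
 * distributed transaction committed, false if it was rolled back.
 */
static bool
sketch_atomic_commit(SketchParticipant *parts, size_t n)
{
	size_t		i;
	bool		all_prepared = true;

	assert(parts != NULL || n == 0);

	/* Phase 1: prepare every participant, stopping at the first failure. */
	for (i = 0; i < n; i++)
	{
		if (!parts[i].prepare_ok)
		{
			all_prepared = false;
			break;
		}
		parts[i].state = SKETCH_PREPARED;
	}

	if (!all_prepared)
	{
		/* Switch to rollback: prepared and still-active participants alike. */
		for (i = 0; i < n; i++)
			parts[i].state = SKETCH_ROLLED_BACK;
		return false;
	}

	/* The local commit would happen here, before resolution. */

	/* Phase 2: commit each prepared foreign transaction. */
	for (i = 0; i < n; i++)
		parts[i].state = SKETCH_COMMITTED;
	return true;
}
```

In the patch itself the second phase is carried out by the foreign transaction resolver process rather than inline, and a failed resolution is retried later; the sketch collapses that into a simple loop.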

Attachment: v27-0004-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From d30099d21190763a9f546a595d4aba338a8dc3aa Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Dec 2019 17:01:15 +0900
Subject: [PATCH v27 4/5] postgres_fdw supports atomic commit APIs.

Original Author: Masahiko Sawada <sawada.mshk@gmail.com>
---
 contrib/postgres_fdw/Makefile                 |   7 +-
 contrib/postgres_fdw/connection.c             | 603 +++++++++++-------
 .../postgres_fdw/expected/postgres_fdw.out    | 265 +++++++-
 contrib/postgres_fdw/fdwxact.conf             |   3 +
 contrib/postgres_fdw/postgres_fdw.c           |  21 +-
 contrib/postgres_fdw/postgres_fdw.h           |   7 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 120 +++-
 doc/src/sgml/postgres-fdw.sgml                |  45 ++
 8 files changed, 822 insertions(+), 249 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..91fa6e39fc 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -29,3 +29,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 29c811a80b..30b9815ecc 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2020, PostgreSQL Global Development Group
  *
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
@@ -55,6 +56,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -68,17 +70,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -91,23 +89,26 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
+ * Get connection cache entry. Unlike GetConnectionState function, this function
+ * doesn't establish a new connection even if none exists yet.
  */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -127,7 +128,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -135,12 +135,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -154,6 +148,21 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * This function gets the connection cache entry and establishes connection
+ * to the foreign server if there is no connection and starts a new transaction
+ * if 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -181,6 +190,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping	*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -189,6 +199,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -199,6 +210,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -206,11 +226,39 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
@@ -438,7 +486,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -665,193 +713,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -868,10 +729,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -882,6 +739,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Skip entries whose connection was not used in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1216,3 +1077,309 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   state->server->servername, state->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 state->server->servername, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on foreign server. If
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can commit the
+ * foreign transaction without preparation, otherwise commit the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has already
+		 * been prepared and closed, so we might not have a connection to it.
+		 * We get a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In simple commit case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   state->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Rollback a transaction on foreign server. As with the commit case, if
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can rollback the
+ * foreign transaction without preparation, otherwise rollback the prepared
+ * transaction. This function must tolerate being called recursively as an
+ * error can happen during aborting.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has already
+		 * been prepared and closed, so we might not have a connection to it.
+		 * We get a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Cleanup connection entry transaction if transaction fails before
+	 * establishing a connection or starting transaction.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction or is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 84fd3ad2e0..15cb1d1ca7 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -191,15 +210,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8930,16 +8951,226 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
+
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
 BEGIN;
-SELECT count(*) FROM ft1;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
  count 
 -------
-   822
+     0
 (1 row)
 
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
-ROLLBACK;
-WARNING:  there is no transaction in progress
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000000..3fdbf93cdb
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2175dff824..0873d1d4b7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include <limits.h>
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -504,7 +505,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -558,6 +558,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1434,7 +1439,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2372,7 +2377,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2746,7 +2751,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3566,7 +3571,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4441,7 +4446,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4527,7 +4532,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4755,7 +4760,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..43ffd4f73f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -129,7 +130,7 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -137,6 +138,9 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *state);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *state);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *state);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
@@ -203,6 +207,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 									bool is_subquery,
 									List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index acd7275c72..562b59056e 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2608,9 +2631,98 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
 BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
+INSERT INTO ft7_2pc VALUES(1);
 ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 94992be427..3f52daa11e 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -477,6 +477,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted
+    independently. Some of these transactions may fail to commit on their
+    remote server while others commit successfully. This may be overridden
+    using the following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is
+       allowed to use two-phase commit when a transaction commits. This
+       option can only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to
+       more than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
@@ -504,6 +541,14 @@
    managed by creating corresponding remote savepoints.
   </para>
 
+  <para>
+   <filename>postgres_fdw</filename> uses the two-phase commit protocol when
+   a transaction commits or aborts if atomic commit of the distributed
+   transaction (see <xref linkend="atomic-commit"/>) is required. The remote
+   server should therefore set <xref linkend="guc-max-prepared-transactions"/>
+   to more than one so that it can prepare the remote transaction.
+  </para>
+
   <para>
    The remote transaction uses <literal>SERIALIZABLE</literal>
    isolation level when the local transaction has <literal>SERIALIZABLE</literal>
-- 
2.23.0
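
A side note for readers of pgfdw_end_prepared_xact() in the patch above: the function packs the five-character SQLSTATE returned by the remote server into an integer and treats "undefined object" (42704) as harmless, since the prepared transaction may already have been resolved on the foreign server. The following is only a minimal standalone sketch of that check, not code from the patch. The two macros are written to mirror the packing scheme of PostgreSQL's utils/elog.h; the helper names sketch_sqlstate() and sketch_can_ignore() are hypothetical, and the length check is an extra guard that the patch itself does not have.

```c
#include <stddef.h>
#include <string.h>

/* Six-bit packing, written to mirror PostgreSQL's utils/elog.h. */
#define PGSIXBIT(ch)	(((ch) - '0') & 0x3F)
#define MAKE_SQLSTATE(c1, c2, c3, c4, c5) \
	(PGSIXBIT(c1) + (PGSIXBIT(c2) << 6) + (PGSIXBIT(c3) << 12) + \
	 (PGSIXBIT(c4) << 18) + (PGSIXBIT(c5) << 24))

/* SQLSTATE 42704 (undefined_object) and 08006 (connection_failure). */
#define ERRCODE_UNDEFINED_OBJECT	MAKE_SQLSTATE('4', '2', '7', '0', '4')
#define ERRCODE_CONNECTION_FAILURE	MAKE_SQLSTATE('0', '8', '0', '0', '6')

/*
 * Hypothetical helper: convert the SQLSTATE diagnostic field into the packed
 * integer form. A missing field is mapped to connection failure, as in the
 * patch. The length check is an assumption added here for safety.
 */
static int
sketch_sqlstate(const char *diag_sqlstate)
{
	if (diag_sqlstate == NULL || strlen(diag_sqlstate) != 5)
		return ERRCODE_CONNECTION_FAILURE;

	return MAKE_SQLSTATE(diag_sqlstate[0], diag_sqlstate[1],
						 diag_sqlstate[2], diag_sqlstate[3],
						 diag_sqlstate[4]);
}

/*
 * Hypothetical helper: returns 1 if a failed COMMIT PREPARED or ROLLBACK
 * PREPARED can be ignored because the prepared transaction no longer exists
 * on the foreign server, and 0 if the error must be reported.
 */
static int
sketch_can_ignore(const char *diag_sqlstate)
{
	return sketch_sqlstate(diag_sqlstate) == ERRCODE_UNDEFINED_OBJECT;
}
```

With this check, resolving the same foreign transaction twice is harmless: the second COMMIT PREPARED fails with 42704 and is skipped, while any other SQLSTATE (or a missing one) is still reported as an error.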

#29amul sul
sulamul@gmail.com
In reply to: Masahiko Sawada (#28)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jan 24, 2020 at 11:31 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 6 Dec 2019 at 17:33, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

Hello.

This is the rebased (and slightly fixed) version of the patch. It
applies on the master HEAD and passes all provided tests.

I took over this work from Sawada-san. I'll begin with reviewing the
current patch.

The previous patch set no longer applies cleanly to the current
HEAD. I've updated and slightly modified the code.

This patch set has been marked as Waiting on Author for a long time,
but the correct status now is Needs Review. The patch was actually
updated to incorporate all review comments, but it was not actively
rebased.

The mail[1] I posted before should help in understanding the current
patch design, and there is a README in the patch and a wiki page[2].

I've marked this as Needs Review.

Hi Sawada san,

I just had a quick look at the 0001 and 0002 patches; here are a few suggestions.

patch: v27-0001:

Typo: s/non-temprary/non-temporary
----

patch: v27-0002: (Note: the left-hand number is the line number in the v27-0002 patch):

138 +PostgreSQL's the global transaction manager (GTM), as a distributed transaction
139 +participant The registered foreign transactions are tracked until the end of

A full stop "." is missing after "participant".

174 +API Contract With Transaction Management Callback Functions

Can we just say "Transaction Management Callback Functions"?
TBH, I am not sure that I understand this title.

203 +processing foreign transaction (i.g. preparing, committing or aborting) the

Do you mean "i.e." instead of "i.g."?

269 + * RollbackForeignTransactionAPI. Registered participant servers are identified

Add a space between RollbackForeignTransaction and API.

292 + * automatically so must be processed manually using by pg_resovle_fdwxact()

Do you mean pg_resolve_foreign_xact() here?

320 + * the foreign transaction is authorized to update the fields from its own
321 + * one.
322 +
323 + * Therefore, before doing PREPARE, COMMIT PREPARED or ROLLBACK PREPARED a

Please add an asterisk '*' on line #322.

816 +static void
817 +FdwXactPrepareForeignTransactions(void)
818 +{
819 + ListCell *lcell;

Let's have this variable name as "lc" like elsewhere.

1036 + ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
1037 + errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
1038 + xid, serverid, userid)));
1039 + }

Incorrect formatting.

1166 +/*
1167 + * Return true and set FdwXactAtomicCommitReady to true if the current transaction

Do you mean ForeignTwophaseCommitIsRequired instead of FdwXactAtomicCommitReady?

3529 +
3530 +/*
3531 + * FdwXactLauncherRegister
3532 + * Register a background worker running the foreign transaction
3533 + * launcher.
3534 + */

This prolog style is not consistent with the other functions in the file.

And here are a few typos:

s/conssitent/consistent
s/consisnts/consist
s/Foriegn/Foreign
s/tranascation/transaction
s/itselft/itself
s/rolbacked/rollbacked
s/trasaction/transaction
s/transactio/transaction
s/automically/automatically
s/CommitForeignTransaciton/CommitForeignTransaction
s/Similary/Similarly
s/FDWACT_/FDWXACT_
s/dink/disk
s/requried/required
s/trasactions/transactions
s/prepread/prepared
s/preapred/prepared
s/beging/being
s/gxact/xact
s/in-dbout/in-doubt
s/respecitively/respectively
s/transction/transaction
s/idenetifier/identifier
s/identifer/identifier
s/checkpoint'S/checkpoint's
s/fo/of
s/transcation/transaction
s/trasanction/transaction
s/non-temprary/non-temporary
s/resovler_internal.h/resolver_internal.h

Regards,
Amul

#30Muhammad Usama
m.usama@gmail.com
In reply to: Michael Paquier (#26)

Hi Sawada San,

I have a couple of comments on "v27-0002-Support-atomic-commit-among-multiple-foreign-ser.patch"

1- As part of the XLogReadRecord refactoring commit, the signature of XLogReadRecord was changed,
so the call to XLogReadRecord() needs a small adjustment.

i.e. In function XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
...
-       record = XLogReadRecord(xlogreader, lsn, &errormsg);
+       XLogBeginRead(xlogreader, lsn);
+       record = XLogReadRecord(xlogreader, &errormsg);

2- In the register_fdwxact(..) function you set the XACT_FLAGS_FDWNOPREPARE transaction flag
whenever a register request comes in for a foreign server that does not support two-phase commit,
regardless of the value of the 'bool modified' argument. Later, PreCommit_FdwXacts() errors out when
"foreign_twophase_commit" is set to 'required' just by looking at the XACT_FLAGS_FDWNOPREPARE flag,
which I think is not correct.
The transaction might have only read from the foreign servers that are not capable of handling
transactions or two-phase commit, while all the other servers where we need an atomic commit
are capable of doing so.
If I am not missing something obvious here, then IMHO the XACT_FLAGS_FDWNOPREPARE flag should only
be set when the transaction management/two-phase functionality is not available and the "modified" argument is
true in register_fdwxact()
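
To make the suggestion in point 2 concrete, here is a minimal, self-contained sketch in C. It is a toy model and not code from the patch: MyXactFlags is a local stand-in variable, and the "_sketch" function names, the bit value used for XACT_FLAGS_FDWNOPREPARE and the two-value enum are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit value; the real definition lives in the patch. */
#define XACT_FLAGS_FDWNOPREPARE (1U << 0)

/* Hypothetical, simplified model of the foreign_twophase_commit setting. */
typedef enum
{
	FOREIGN_TWOPHASE_COMMIT_DISABLED,
	FOREIGN_TWOPHASE_COMMIT_REQUIRED
} ForeignTwophaseCommitLevel;

/* Local stand-in for the per-transaction flags word. */
unsigned int MyXactFlags = 0;

/* Clear the flags, as would happen at the start of a new transaction. */
void
reset_xact_flags_sketch(void)
{
	MyXactFlags = 0;
}

/*
 * Register a foreign server as a participant.  The flag is raised only
 * when the server cannot do two-phase commit AND the transaction wrote
 * to it; a read-only participant never blocks atomic commit.
 */
void
register_fdwxact_sketch(bool twophase_capable, bool modified)
{
	if (!twophase_capable && modified)
		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
}

/*
 * Pre-commit check: returns false (the real code would raise an error)
 * when two-phase commit is required but some modified participant
 * cannot prepare.
 */
bool
precommit_fdwxacts_sketch(ForeignTwophaseCommitLevel level)
{
	assert(level == FOREIGN_TWOPHASE_COMMIT_DISABLED ||
		   level == FOREIGN_TWOPHASE_COMMIT_REQUIRED);

	if (level == FOREIGN_TWOPHASE_COMMIT_REQUIRED &&
		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
		return false;

	return true;
}
```

With this rule, a transaction that only reads from a server without two-phase support and writes to capable servers still passes the check under 'required'; it is rejected only once a server that cannot prepare has actually been modified.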

Thanks

Best regards
Muhammad Usama
Highgo Software (Canada/China/Pakistan)

The new status of this patch is: Waiting on Author

#31Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: amul sul (#29)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 11 Feb 2020 at 12:42, amul sul <sulamul@gmail.com> wrote:

Thank you for reviewing the patch! I've incorporated all comments in
my local branch.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#32Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Masahiko Sawada (#31)
5 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, 19 Feb 2020 at 07:55, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

Thank you for reviewing the patch! I've incorporated all comments in
my local branch.

Attached is the updated version of the patch set, which incorporates
the review comments from Amul and Muhammad.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v18-0004-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From 355e6c7f38108b24c42c0bb4420e5f8a37c0f24e Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 5 Dec 2019 17:01:15 +0900
Subject: [PATCH v18 4/5] postgres_fdw supports atomic commit APIs.

Authors: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/Makefile                 |   7 +-
 contrib/postgres_fdw/connection.c             | 603 +++++++++++-------
 .../postgres_fdw/expected/postgres_fdw.out    | 265 +++++++-
 contrib/postgres_fdw/fdwxact.conf             |   3 +
 contrib/postgres_fdw/postgres_fdw.c           |  21 +-
 contrib/postgres_fdw/postgres_fdw.h           |   7 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 120 +++-
 doc/src/sgml/postgres-fdw.sgml                |  45 ++
 8 files changed, 822 insertions(+), 249 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..91fa6e39fc 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -29,3 +29,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 29c811a80b..30b9815ecc 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2020, PostgreSQL Global Development Group
  *
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
@@ -55,6 +56,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -68,17 +70,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -91,23 +89,26 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
+ * Get the connection cache entry. Unlike GetConnectionState(), this function
+ * doesn't establish a new connection if one doesn't exist yet.
  */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -127,7 +128,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -135,12 +135,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -154,6 +148,21 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * This function gets the connection cache entry and establishes connection
+ * to the foreign server if there is no connection and starts a new transaction
+ * if 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -181,6 +190,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping	*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -189,6 +199,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -199,6 +210,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -206,11 +226,39 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
@@ -438,7 +486,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -665,193 +713,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -868,10 +729,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -882,6 +739,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Quick exit if no connections were touched in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1216,3 +1077,309 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   state->server->servername, state->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 state->server->servername, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on foreign server. If
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can commit the
+ * foreign transaction without preparation, otherwise commit the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared and
+		 * closed, so we might not have a connection to it. We get a connection
+		 * but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In simple commit case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   state->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Rollback a transaction on foreign server. As with commit case, if state->flags
+ * contains FDWXACT_FLAG_ONEPHASE this function can rollback the foreign
+ * transaction without preparation, otherwise rollback the prepared transaction.
+ * This function must tolerate being called recursively as an error can happen
+ * during aborting.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	bool		is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool		abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has been
+		 * prepared and closed, so we might not have a connection to it. We get
+		 * a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before
+	 * establishing a connection or starting a remote transaction.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction or is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager notes, the given foreign
+		 * transaction might not exist on the foreign server. So we accept
+		 * an UNDEFINED_OBJECT error here.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 62c2697920..cd871fe314 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -191,15 +210,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8961,16 +8982,226 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
+
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
 BEGIN;
-SELECT count(*) FROM ft1;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
  count 
 -------
-   822
+     0
 (1 row)
 
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
-ROLLBACK;
-WARNING:  there is no transaction in progress
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000000..3fdbf93cdb
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2175dff824..0873d1d4b7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include <limits.h>
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -504,7 +505,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -558,6 +558,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1434,7 +1439,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2372,7 +2377,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2746,7 +2751,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3566,7 +3571,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4441,7 +4446,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4527,7 +4532,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4755,7 +4760,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..43ffd4f73f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -129,7 +130,7 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -137,6 +138,9 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *state);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *state);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *state);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
@@ -203,6 +207,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 									bool is_subquery,
 									List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..ce5785c27a 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2628,9 +2651,98 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
 BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
+INSERT INTO ft7_2pc VALUES(1);
 ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 94992be427..3f52daa11e 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -477,6 +477,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted independently.
+    Some transactions may fail to commit on their remote server while others
+    commit successfully. This may be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is allowed
+       to use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
@@ -504,6 +541,14 @@
    managed by creating corresponding remote savepoints.
   </para>
 
+  <para>
+   <filename>postgres_fdw</filename> uses the two-phase commit protocol when
+   a transaction commits or aborts and atomic commit of the distributed
+   transaction (see <xref linkend="atomic-commit"/>) is required. The remote
+   server should therefore set <xref linkend="guc-max-prepared-transactions"/> to more
+   than one so that it can prepare the remote transaction.
+  </para>
+
   <para>
    The remote transaction uses <literal>SERIALIZABLE</literal>
    isolation level when the local transaction has <literal>SERIALIZABLE</literal>
-- 
2.23.0
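
For readers who want to experiment with the error handling in pgfdw_end_prepared_xact() outside the backend, here is a minimal standalone sketch. It is not part of the patch. It assumes the same SQLSTATE packing that PostgreSQL's MAKE_SQLSTATE macro uses, and the helper names (build_end_prepared_command, classify_sqlstate, resolution_error_is_fatal) are hypothetical, chosen only for illustration. Like the patch, it interpolates the transaction identifier directly, so it assumes the identifier contains no single quotes. The length check on the SQLSTATE string is an extra guard that the patch does not have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Same six-bit packing as PGSIXBIT/MAKE_SQLSTATE in PostgreSQL's utils/elog.h. */
#define SIXBIT(ch) (((ch) - '0') & 0x3F)
#define PACK_SQLSTATE(c1, c2, c3, c4, c5) \
	(SIXBIT(c1) + (SIXBIT(c2) << 6) + (SIXBIT(c3) << 12) + \
	 (SIXBIT(c4) << 18) + (SIXBIT(c5) << 24))

/* 42704 = undefined_object, 08006 = connection_failure */
#define SQLSTATE_UNDEFINED_OBJECT	PACK_SQLSTATE('4', '2', '7', '0', '4')
#define SQLSTATE_CONNECTION_FAILURE	PACK_SQLSTATE('0', '8', '0', '0', '6')

/*
 * Hypothetical helper: format "COMMIT PREPARED 'id'" or
 * "ROLLBACK PREPARED 'id'" into buf. Returns false if the command
 * did not fit. Assumes id contains no single quotes.
 */
static bool
build_end_prepared_command(char *buf, size_t buflen,
						   const char *id, bool is_commit)
{
	int			n;

	assert(buf != NULL && id != NULL);
	n = snprintf(buf, buflen, "%s PREPARED '%s'",
				 is_commit ? "COMMIT" : "ROLLBACK", id);
	return n >= 0 && (size_t) n < buflen;
}

/*
 * Hypothetical helper: pack a five-character SQLSTATE string. A missing
 * (or malformed) SQLSTATE is treated as a connection failure, mirroring
 * the fallback used when PQresultErrorField() returns NULL.
 */
static int
classify_sqlstate(const char *diag)
{
	if (diag == NULL || strlen(diag) != 5)
		return SQLSTATE_CONNECTION_FAILURE;
	return PACK_SQLSTATE(diag[0], diag[1], diag[2], diag[3], diag[4]);
}

/*
 * Resolution may run after the remote prepared transaction has already
 * been resolved, so "undefined object" is the one error we tolerate.
 */
static bool
resolution_error_is_fatal(const char *diag)
{
	return classify_sqlstate(diag) != SQLSTATE_UNDEFINED_OBJECT;
}
```

With these helpers, a caller would build the command, send it, and raise an error only when resolution_error_is_fatal() returns true for the SQLSTATE reported by the remote server.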

Attachment: v18-0005-Add-regression-tests-for-atomic-commit.patch (application/octet-stream)
From 4515ea6d6d128658ce845410a7baa1d258254675 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 5 Dec 2019 17:01:26 +0900
Subject: [PATCH v18 5/5] Add regression tests for atomic commit.

Authors: Masahiko Sawada, Ashutosh Bapat
---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up its shared
+# memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 9a4e52bc7b..4ab217dc98 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2319,9 +2319,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2336,7 +2339,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.23.0
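
For readers following the recovery tests in the patch above: they rely on the
commit sequence this series implements (prepare on every foreign server, commit
locally, then resolve the prepared foreign transactions). Below is a minimal C
sketch of that control flow only. It is not code from the patch. The names
Participant, FxStatus and commit_distributed are hypothetical, and the
prepare_ok field merely simulates whether PREPARE would succeed on a server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one foreign server's transaction; not FdwXactRslvState. */
typedef enum { FX_INITIAL, FX_PREPARED, FX_COMMITTED, FX_ABORTED } FxStatus;

typedef struct
{
    bool     prepare_ok;   /* simulated outcome of PREPARE on this server */
    FxStatus status;
} Participant;

/*
 * Simulate the three-step commit sequence.
 * Returns true if the distributed transaction commits, false if it is
 * rolled back because some participant failed to prepare.
 */
bool
commit_distributed(Participant *parts, size_t n)
{
    size_t i;
    bool   all_prepared = true;

    assert(parts != NULL || n == 0);

    /* Step 1: prepare on every foreign server, stopping at the first failure. */
    for (i = 0; i < n; i++)
    {
        if (!parts[i].prepare_ok)
        {
            all_prepared = false;
            break;
        }
        parts[i].status = FX_PREPARED;
    }

    if (!all_prepared)
    {
        /* Any failure switches to rollback on all participants. */
        for (i = 0; i < n; i++)
            parts[i].status = FX_ABORTED;
        return false;
    }

    /* Step 2: the local commit would happen here. */

    /* Step 3: resolve (commit) every prepared foreign transaction. */
    for (i = 0; i < n; i++)
        parts[i].status = FX_COMMITTED;
    return true;
}
```

In the real patch, step 3 is performed asynchronously by resolver processes and
every state change is persisted for crash recovery; the sketch leaves both out.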

v18-0003-Documentation-update.patch (application/octet-stream)
From d3b077ca06ce6a56a1eea16d08f615e53f25d64e Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 5 Dec 2019 17:01:08 +0900
Subject: [PATCH v18 3/5] Documentation update.

Authors: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 145 +++++++++++++
 doc/src/sgml/config.sgml                  | 146 ++++++++++++-
 doc/src/sgml/distributed-transaction.sgml | 158 +++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 236 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  89 ++++++++
 doc/src/sgml/monitoring.sgml              |  60 ++++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 841 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a10b66569b..23570e9f2c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8155,6 +8155,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>open cursors</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-file-settings"><structname>pg_file_settings</structname></link></entry>
       <entry>summary of configuration file contents</entry>
@@ -9613,6 +9618,146 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>initial</literal> : Initial status.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>resolved</literal> : This foreign transaction has been resolved.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in in-doubt status and
+       needs to be resolved by calling the <function>pg_resolve_fdwxact</function>
+       function.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c1128f89ec..2dde3185f6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4403,7 +4403,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
 
      </variablelist>
     </sect2>
-
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -8868,6 +8867,151 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether transaction commit will wait for all involved foreign
+         transactions to be resolved before the command returns a "success"
+         indication to the client. Valid values are <literal>required</literal>,
+         <literal>prefer</literal> and <literal>disabled</literal>. The default
+         setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used to
+         commit or roll back distributed transactions. When set to
+         <literal>required</literal>, the distributed transaction strictly
+         requires that all written servers can use the two-phase commit protocol.
+         That is, the distributed transaction cannot commit if even one server
+         does not support the transaction management callback routines
+         (described in <xref linkend="fdw-callbacks-transaction-managements"/>).
+         When set to <literal>prefer</literal>, the distributed transaction uses
+         the two-phase commit protocol only on servers where it is available and
+         commits normally on the others. Note that with <literal>disabled</literal> or
+         <literal>prefer</literal> there is a risk of inconsistency
+         among the servers involved in the distributed transaction if some
+         foreign server crashes while the distributed transaction is committing.
+        </para>
+
+        <para>
+         Both <varname>max_prepared_foreign_transactions</varname> and
+         <varname>max_foreign_transaction_resolvers</varname> must be set to non-zero
+         values in order to set this parameter to either <literal>required</literal> or
+         <literal>prefer</literal>.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If <literal>N</literal> local transactions each
+         span <literal>K</literal> foreign servers, this value needs to be set to
+         <literal>N * K</literal>, not just <literal>N</literal>.
+         This parameter can only be set at server start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning the resolver stays
+         connected to its database until it is stopped manually. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..350b1afe68
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers end in either commit or rollback using the
+   transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers automatically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).
+    The <productname>PostgreSQL</productname> server that received the SQL is called
+    the <firstterm>coordinator node</firstterm> and is responsible for coordinating
+    all the participating transactions. Using the two-phase commit protocol, the commit
+    sequence of a distributed transaction proceeds in the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    At the first step, the <productname>PostgreSQL</productname> distributed
+    transaction manager prepares all transactions on the foreign servers if
+    two-phase commit is required. Two-phase commit is required when the
+    transaction modifies data on two or more servers including the local server
+    itself and <xref linkend="guc-foreign-twophase-commit"/> is
+    <literal>required</literal> or <literal>prefer</literal>. If all preparations
+    on foreign servers succeed, the commit proceeds to the next step. If any failure
+    happens in this step, <productname>PostgreSQL</productname> switches to rollback
+    and rolls back all transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    At the local commit step, <productname>PostgreSQL</productname> commits the
+    transaction locally. If any failure happens in this step,
+    <productname>PostgreSQL</productname> switches to rollback and rolls back all
+    transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    At the final step, prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolution">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for foreign transaction resolution. They commit or roll back all
+    prepared transactions on foreign servers if the coordinator received agreement
+    messages from all foreign servers during the first step.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolutions
+    on one database of the coordinator side. On failure during resolution, it
+    retries the resolution at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped. To drop the database, call the
+     <function>pg_stop_foreign_xact_resolver</function> function before dropping
+     the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>Manual Resolution of In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back using the two-phase commit protocol. However, distributed transactions
+    become <firstterm>in-doubt</firstterm> in three cases: when a foreign
+    server crashes or the connection to it is lost while preparing the foreign
+    transaction, when the coordinator node crashes while either preparing or
+    resolving the distributed transaction, and when the user cancels the query. You can
+    check in-doubt transactions in the <xref linkend="pg-stat-foreign-xact-view"/>
+    view. These foreign transactions need to be resolved by using the
+    <function>pg_resolve_foreign_xact</function> function.
+    <productname>PostgreSQL</productname> doesn't have facilities to automatically
+    resolve in-doubt transactions. This behavior might change in a future release.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-monitoring">
+   <title>Monitoring</title>
+   <para>
+    The monitoring information about foreign transaction resolvers is visible in
+    <link linkend="pg-stat-foreign-xact-view"><literal>pg_stat_foreign_xact</literal></link>
+    view. This view contains one row for every foreign transaction resolver worker.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be adjusted to
+    accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6587678af2..dd0358ef22 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1415,6 +1415,127 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back and
+     prepare the foreign transaction. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase, or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flag</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when invoked through the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flag</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when invoked through the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    along with its length in <varname>*prep_id_len</varname>.
+    This optional function is called during executor startup, once per
+    foreign server. Note that the transaction identifier must be a string literal,
+    less than <symbol>NAMEDATALEN</symbol> bytes long, and should not be the same
+    as any other concurrent prepared transaction id. If this callback routine
+    is not supported, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Therefore, you cannot
+      use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1894,4 +2015,119 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of the
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name, OID of
+    the server, user and user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit And Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and rollback the foreign transaction. During transaction commit, the core
+     transaction manager calls <literal>CommitForeignTransaction</literal> function
+     in the pre-commit phase and calls
+     <literal>RollbackForeignTransaction</literal> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit And Rollback Distributed Transaction</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to atomically commit and roll back among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to have the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be
+    using different foreign data wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a
+     participant. During registration, <function>GetPrepareId</function> is called,
+     if provided, to generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction
+     manager persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, both the <literal>CommitForeignTransaction</literal> function and the
+     <literal>RollbackForeignTransaction</literal> function should commit or
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve, the resolver process exits with an error message. The foreign
+     transaction launcher will launch the resolver process again at
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..80a87fa5d1 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ceda48e0fc..eaf800621f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21803,6 +21803,95 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction that is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before dropping
+    the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 87586a7b06..e107c9599e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -376,6 +376,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_foreign_xact</structname><indexterm><primary>pg_stat_foreign_xact</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1256,6 +1264,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>CheckpointerMain</literal></entry>
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
+        <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
+         <entry><literal>LogicalLauncherMain</literal></entry>
+         <entry>Waiting in main loop of logical launcher process.</entry>
+        </row>
         <row>
          <entry><literal>LogicalApplyMain</literal></entry>
          <entry>Waiting in main loop of logical apply process.</entry>
@@ -1483,6 +1503,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>SyncRep</literal></entry>
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
         </row>
+        <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
         <row>
          <entry morerows="2"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>
@@ -2391,6 +2415,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-foreign-xact-view" xreflabel="pg_stat_foreign_xact">
+   <title><structname>pg_stat_foreign_xact</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_foreign_xact</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index e59cba7997..dee3f72f7e 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -163,6 +163,7 @@
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 1c19e863d2..3f4c806ed1 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.23.0
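
For readers following the commit sequence documented in the patch above, here is a minimal, self-contained C sketch of the control flow: prepare every participant, switch to rollback on any failure, otherwise commit locally and then resolve the prepared transactions. The Participant struct and the functions prepare_participant() and commit_distributed() are hypothetical names made up for illustration and are not part of the patch. The sketch also simplifies two things: it performs the resolution step inline rather than handing it to a resolver process, and on failure it marks every participant as rolled back instead of distinguishing prepared from unprepared ones.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical participant record; NOT the patch's FdwXact structure. */
typedef struct Participant
{
    const char *server;       /* illustrative server name */
    bool        fail_prepare; /* set to simulate a PREPARE failure */
    bool        prepared;
    bool        committed;
    bool        rolled_back;
} Participant;

/* Stand-in for an FDW's PrepareForeignTransaction callback. */
static bool
prepare_participant(Participant *p)
{
    if (p->fail_prepare)
        return false;
    p->prepared = true;
    return true;
}

/*
 * Returns true if the distributed transaction committed everywhere,
 * false if it was rolled back.
 */
static bool
commit_distributed(Participant *parts, size_t n)
{
    size_t      i;

    /* Phase 1: prepare every participant before committing locally. */
    for (i = 0; i < n; i++)
    {
        if (!prepare_participant(&parts[i]))
        {
            /* Any failure switches the whole transaction to rollback. */
            for (size_t j = 0; j < n; j++)
            {
                parts[j].prepared = false;
                parts[j].rolled_back = true;
            }
            return false;
        }
    }

    /* The local commit would happen here. */

    /* Phase 2: resolve (commit) every prepared participant. */
    for (i = 0; i < n; i++)
    {
        parts[i].prepared = false;
        parts[i].committed = true;
    }
    return true;
}
```

Calling commit_distributed() on an array in which one entry has fail_prepare set leaves no participant committed, which is the all-or-nothing property the documentation describes.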

v18-0001-Keep-track-of-writing-on-non-temporary-relation.patchapplication/octet-stream; name=v18-0001-Keep-track-of-writing-on-non-temporary-relation.patchDownload
From 7b8bad2a26a0d5b02b10a1540b4de2c32fdc2047 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 5 Dec 2019 16:59:47 +0900
Subject: [PATCH v18 1/5] Keep track of writing on non-temporary relation

Authors: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/executor/nodeModifyTable.c | 16 ++++++++++++++++
 src/include/access/xact.h              |  6 ++++++
 2 files changed, 22 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d71c0a4322..870a7428f1 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -574,6 +574,10 @@ ExecInsert(ModifyTableState *mtstate,
 										   NULL,
 										   specToken);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
 												   &specConflict,
@@ -612,6 +616,10 @@ ExecInsert(ModifyTableState *mtstate,
 							   estate->es_output_cid,
 							   0, NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
@@ -963,6 +971,10 @@ ldelete:;
 	if (tupleDeleted)
 		*tupleDeleted = true;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/*
 	 * If this delete is the result of a partition key update that moved the
 	 * tuple to a new partition, put this row into the transition OLD TABLE,
@@ -1475,6 +1487,10 @@ lreplace:;
 	if (canSetTag)
 		(estate->es_processed)++;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/* AFTER ROW UPDATE Triggers */
 	ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple, slot,
 						 recheckIndexes,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7ee04babc2..a04fc70326 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -102,6 +102,12 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data to a non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
-- 
2.23.0
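
As a rough illustration of how the flag added in the patch above could feed the "two or more servers including the local node" rule stated in the documentation and README, here is a small C sketch. The helper needs_two_phase_commit() and its modified_foreign_servers parameter are hypothetical and do not appear in the patch. Only the flag value (1U << 2) is taken from it. The sketch also ignores the foreign_twophase_commit setting, which the documentation says is consulted as well.

```c
#include <stdbool.h>

/* Flag value copied from the patch to src/include/access/xact.h. */
#define XACT_FLAGS_WROTENONTEMPREL (1U << 2)

/*
 * Hypothetical helper: decide whether two-phase commit is needed, given the
 * transaction flags and the number of foreign servers that were modified.
 * The local node counts as one writer when the flag is set.
 */
static bool
needs_two_phase_commit(unsigned int xact_flags, int modified_foreign_servers)
{
    int         writers = modified_foreign_servers;

    if (xact_flags & XACT_FLAGS_WROTENONTEMPREL)
        writers++;

    return writers >= 2;
}
```

Under these assumptions, a transaction that wrote locally and modified one foreign server needs two-phase commit, while a transaction that only modified a single foreign server does not.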

v18-0002-Support-atomic-commit-among-multiple-foreign-ser.patchapplication/octet-stream; name=v18-0002-Support-atomic-commit-among-multiple-foreign-ser.patchDownload
From 311fd76e036e7927310567730b108ad1d25425b7 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 5 Dec 2019 17:00:50 +0900
Subject: [PATCH v18 2/5] Support atomic commit among multiple foreign servers.

Authors: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  130 +
 src/backend/access/fdwxact/fdwxact.c          | 2832 +++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  641 ++++
 src/backend/access/fdwxact/resolver.c         |  343 ++
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/copy.c                   |    6 +
 src/backend/commands/foreigncmds.c            |   30 +
 src/backend/executor/execPartition.c          |    8 +
 src/backend/executor/nodeForeignscan.c        |   24 +
 src/backend/executor/nodeModifyTable.c        |    6 +
 src/backend/foreign/foreign.c                 |   55 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   20 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   82 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/fdwxactdesc.c              |    1 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  167 +
 src/include/access/fdwxact_launcher.h         |   29 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/resolver_internal.h        |   66 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   29 +
 src/include/foreign/fdwapi.h                  |   12 +
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    9 +-
 src/include/storage/proc.h                    |   11 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    3 +
 src/test/regress/expected/rules.out           |   13 +
 55 files changed, 4916 insertions(+), 18 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 120000 src/bin/pg_waldump/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h
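
The status transitions described in this patch's README (section "Foreign Transactions Status") can be written down as a small transition check. This is only a sketch of the diagram: the SketchStatus enum, its ST_ names, and transition_allowed() are hypothetical and are not the identifiers used in fdwxact.h. The "step back" edges are inferred from the README's statement that the status changes back to the previous status on failure.

```c
#include <stdbool.h>

/* Hypothetical mirror of the statuses named in the README diagram. */
typedef enum SketchStatus
{
    ST_INVALID,
    ST_INITIAL,
    ST_PREPARING,
    ST_PREPARED,
    ST_COMMITTING,
    ST_ABORTING,
    ST_RESOLVED
} SketchStatus;

/* Return true if the README diagram (plus failure fallback) allows from -> to. */
static bool
transition_allowed(SketchStatus from, SketchStatus to)
{
    switch (from)
    {
        case ST_INVALID:
            /* Normal start, or a recovered entry going straight to PREPARED. */
            return to == ST_INITIAL || to == ST_PREPARED;
        case ST_INITIAL:
            return to == ST_PREPARING;
        case ST_PREPARING:
            /* Success, or fall back to the previous status on failure. */
            return to == ST_PREPARED || to == ST_INITIAL;
        case ST_PREPARED:
            return to == ST_COMMITTING || to == ST_ABORTING;
        case ST_COMMITTING:
        case ST_ABORTING:
            /* Success, or fall back to PREPARED on failure. */
            return to == ST_RESOLVED || to == ST_PREPARED;
        case ST_RESOLVED:
            /* Terminal: the entry is removed afterwards. */
            return false;
    }
    return false;
}
```

A table-driven form would work equally well; the switch is used here because it reads in the same order as the diagram.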

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..49480dd039 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..0207a66fb4
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000000..c20570022c
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,130 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back a transaction on
+either all foreign servers or none. This ensures that the data is always left
+in a consistent state in terms of the federated database.
+
+
+Commit Sequence of Global Transactions
+---------------------------------------
+
+We employ the two-phase commit protocol to achieve commit among all foreign
+servers atomically. The sequence of distributed transaction commit consists
+of the following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+to the list FdwXactAtomicCommitParticipants, which is maintained by
+PostgreSQL's global transaction manager (GTM), as distributed transaction
+participants. The registered foreign transactions are tracked until the end of
+the transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We record the corresponding WAL indicating that the foreign server is involved
+with the current transaction before doing PREPARE on all foreign transactions.
+Thus in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers including the local node. Otherwise, we can commit them at this
+step by calling the CommitForeignTransaction() API; no further operation is needed.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If we fail on any of them we switch to
+rollback, so at this point some participants might be prepared whereas
+others are not. The former foreign transactions need to be resolved
+manually using pg_resolve_foreign_xact(), and the latter end their transactions
+in one phase by calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction but
+this resolution step (commit or rollback) is done by the foreign transaction
+resolver process. The backend process inserts itself into the wait queue, and
+then wakes up the resolver process (or requests to launch a new one if necessary).
+The resolver process dequeues the waiter and fetches the distributed transaction
+information that the backend is waiting for. Once all foreign transactions are
+committed or rolled back, the resolver process wakes up the waiter.
+
+
+Foreign Data Wrapper Callbacks for Transaction Management
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+transaction management callback functions according to their status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for PREPARE, COMMIT and
+ROLLBACK of the transaction on the foreign server, respectively.
+FdwXactRslvState->flags could contain FDWXACT_FLAG_ONEPHASE, meaning the FDW can
+commit or roll back the foreign transaction in one phase. On failure while
+processing a foreign transaction, the FDW needs to raise an error. However, the FDW
+must accept an ERRCODE_UNDEFINED_OBJECT error while committing or rolling back a
+foreign transaction, because there is a race condition in which the coordinator
+could crash between completing the resolution and writing the WAL record
+that removes the FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_INITIAL
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and it changes to
+FDWXACT_STATUS_PREPARING, FDWXACT_STATUS_COMMITTING and FDWXACT_STATUS_ABORTING
+before the foreign transaction is prepared, committed and aborted by FDW
+callback functions, respectively(*1). The status then changes to
+FDWXACT_STATUS_RESOLVED once the foreign transaction is resolved, and then
+the corresponding FdwXact entry is removed with WAL logging. If processing
+the foreign transaction (e.g., preparing, committing or aborting) fails, the
+status changes back to the previous status. Therefore the status
+FDWXACT_STATUS_xxxING appears only while the foreign transaction is being
+processed by an FDW callback function.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. The initial
+status is FDWXACT_STATUS_PREPARED(*2). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                      INVALID                       |
+ +----------------------------------------------------+
+    |                      |                       |
+    |                      v                       |
+    |           +---------------------+            |
+    |           |       INITIAL       |            |
+    |           +---------------------+            |
+   (*2)                    |                      (*2)
+    |                      v                       |
+    |           +---------------------+            |
+    |           |    PREPARING(*1)    |            |
+    |           +---------------------+            |
+    |                      |                       |
+    v                      v                       v
+ +----------------------------------------------------+
+ |                      PREPARED                      |
+ +----------------------------------------------------+
+           |                               |
+           v                               v
+ +--------------------+          +--------------------+
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |
+ +--------------------+          +--------------------+
+           |                               |
+           v                               v
+ +----------------------------------------------------+
+ |                      RESOLVED                      |
+ +----------------------------------------------------+
+
+(*1) Statuses that appear only while being processed by the FDW
+(*2) Paths for recovered FdwXact entries
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..db2e9a4fe3
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2832 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * During executor node initialization, the executor can register the foreign
+ * server by calling either RegisterFdwXactByRelId() or RegisterFdwXactByServerId()
+ * to add it to the group for global commit. The foreign servers are
+ * registered if the FDW has both the CommitForeignTransaction API and the
+ * RollbackForeignTransaction API. Registered participant servers are identified
+ * by the OIDs of the foreign server and user.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every foreign server. And after committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so
+ * that we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request. We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed. Before waiting for foreign transactions to be resolved, the
+ * backend enqueues itself with the timestamp at which it expects to be processed.
+ * Similarly, if it fails to resolve them, it enqueues again with a new timestamp
+ * (its timestamp + foreign_xact_resolution_interval).
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in an in-doubt state (aka. in-doubt
+ * transactions). Foreign transactions in an in-doubt state are not resolved
+ * automatically, so they must be processed manually using the
+ * pg_resolve_foreign_xact() function.
+ *
+ * The two-phase commit protocol is required if the transaction modified two or
+ * more servers including itself. Otherwise, all foreign transactions are
+ * committed or rolled back during pre-commit.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed by an FDW, the corresponding
+ * FdwXact entry is updated. In order to protect the entry from concurrent
+ * removal we need to hold a lock on the entry or a lock for the entire global
+ * array. However, we don't want to hold the lock while the FDW is processing the
+ * foreign transaction, which may take an unpredictable time. To avoid this, the
+ * in-memory data of a foreign transaction follows a locking model based on
+ * four linked concepts:
+ *
+ * * A foreign transaction's status variable is switched using the LWLock
+ *   FdwXactLock, which needs to be held in exclusive mode when updating the
+ *   status, while readers need to hold it in shared mode when looking at the
+ *   status.
+ * * A process that is going to update an FdwXact entry cannot process a foreign
+ *   transaction that is being resolved.
+ * * So setting the status to FDWXACT_STATUS_PREPARING,
+ *   FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING, which put the foreign
+ *   transaction in an in-progress state, means owning the FdwXact entry, which
+ *   protects it from being updated or removed by concurrent writers.
+ * * Individual fields are protected by a mutex, where only the backend owning
+ *   the foreign transaction is authorized to update the fields from its own
+ *   one.
+ *
+ * Therefore, before doing PREPARE, COMMIT PREPARED or ROLLBACK PREPARED, a
+ * process that is going to call transaction callback functions needs to change
+ * the status to the corresponding status above while holding FdwXactLock in
+ * exclusive mode, and call the callback function after releasing the lock.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxacts is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk. FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon. We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts. If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Atomic commit is enabled by configuration */
+#define IsForeignTwophaseCommitEnabled() \
+	(max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	(IsForeignTwophaseCommitEnabled() && \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED))
+
+/* Check the FdwXactParticipant is capable of two-phase commit  */
+#define IsSeverCapableOfTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Check the FdwXact is being resolved */
+#define FdwXactIsBeingResolved(fx) \
+	(((((FdwXact)(fx))->status) == FDWXACT_STATUS_PREPARING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_COMMITTING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_ABORTING))
+
+/*
+ * Structure to bundle the foreign transaction participant. This struct
+ * is created at the beginning of execution for each foreign server and
+ * is used until the end of transaction, where we cannot look at syscaches.
+ * Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
+	/* true if modified the data on the server */
+	bool		modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit. This list
+ * has only foreign servers that provide transaction management callbacks,
+ * that is, CommitForeignTransaction and RollbackForeignTransaction.
+ */
+static List *FdwXactParticipants = NIL;
+
+/*
+ * True if the current transaction needs to be committed together with
+ * foreign servers.
+ */
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * Name of foreign prepared transaction file is the database oid, xid,
+ * foreign server oid and user oid, each written as 8 hexadecimal digits and
+ * separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure
+ * uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* GUC parameters */
+int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Keep track of whether the process exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static void FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+												 bool for_commit);
+static void FdwXactResolveForeignTransaction(FdwXact fdwxact,
+											 FdwXactRslvState *state,
+											 FdwXactStatus fallback_status);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static void FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock);
+static bool is_foreign_twophase_commit_required(bool *local_modified);
+static void register_fdwxact(Oid serverid, Oid userid, bool modified);
+static List *get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						  bool including_indoubts, bool include_in_progress,
+						  bool need_lock);
+static FdwXact get_all_fdwxacts(int *num_p);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXact get_fdwxact_to_resolve(Oid dbid, TransactionId xid);
+static FdwXactRslvState *create_fdwxact_state(void);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Remember accessed foreign transaction. Both RegisterFdwXactByRelId and
+ * RegisterFdwXactByServerId are called by executor during initialization.
+ */
+void
+RegisterFdwXactByRelId(Oid relid, bool modified)
+{
+	Relation	rel;
+	Oid			serverid;
+	Oid			userid;
+
+	rel = relation_open(relid, NoLock);
+	serverid = GetForeignServerIdByRelId(relid);
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	relation_close(rel, NoLock);
+
+	register_fdwxact(serverid, userid, modified);
+}
+
+void
+RegisterFdwXactByServerId(Oid serverid, bool modified)
+{
+	register_fdwxact(serverid, GetUserId(), modified);
+}
+
+/*
+ * Register given foreign transaction identified by given arguments as
+ * a participant of the transaction.
+ *
+ * The foreign transaction is identified by the given server id and user id.
+ * Registered foreign transactions are managed by the global transaction
+ * manager until the end of the transaction.
+ */
+static void
+register_fdwxact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	/*
+	 * Participant's information is also needed at the end of a transaction,
+	 * where the system caches are not available. Save it in
+	 * TopTransactionContext so that it can live until the end of transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Don't register foreign server if it doesn't provide both commit and
+	 * rollback transaction management callbacks and is modified.
+	 */
+	if (!routine->CommitForeignTransaction &&
+		!routine->RollbackForeignTransaction &&
+		modified)
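+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		pfree(routine);
+
+		/* Restore the caller's memory context before the early return */
+		MemoryContextSwitchTo(old_ctx);
+		return;
+	}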
+
+	/*
+	 * Remember we touched the foreign server that is not capable of two-phase
+	 * commit.
+	 */
+	if (!routine->PrepareForeignTransaction)
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact = NULL;
+	fdw_part->modified = modified;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Switch back to the previous memory context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&(fdwxacts[cnt].mutex));
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Prepare all foreign transactions if foreign twophase commit is required.
+ * If foreign twophase commit is required, the behavior depends on the value
+ * of foreign_twophase_commit: with 'required' we strictly require all
+ * foreign servers' FDWs to support the two-phase commit protocol and ask them
+ * to prepare foreign transactions; with 'prefer' we ask only foreign servers
+ * that are capable of two-phase commit to prepare foreign transactions and ask
+ * the other servers to commit; and with 'disabled' we ask all foreign servers
+ * to commit foreign transactions in one-phase. If we fail to commit any of
+ * them, we change to aborting.
+ *
+ * Note that non-modified foreign servers always can be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	bool		need_twophase_commit;
+	bool		local_modified = false;
+	ListCell   *lc = NULL;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * We require all modified servers to be capable of the two-phase commit
+	 * protocol.
+	 */
+	if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));
+
+	/*
+	 * Check if we need to use foreign twophase commit. It's always false if
+	 * foreign twophase commit is disabled.
+	 */
+	need_twophase_commit = is_foreign_twophase_commit_required(&local_modified);
+
+	/* Attempt to commit foreign transactions in one-phase */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		bool		commit = false;
+
+		if (!need_twophase_commit)
+		{
+			/* Can commit in one-phase if two-phase commit is not required */
+			commit = true;
+		}
+		else if (!fdw_part->modified)
+		{
+			/*
+			 * Non-modified foreign transaction always can be committed in
+			 * one-phase regardless of two-phase commit support.
+			 */
+			commit = true;
+		}
+		else if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER &&
+				 !IsSeverCapableOfTwophaseCommit(fdw_part))
+		{
+			/*
+			 * In 'prefer' case, non-twophase-commit capable server can be
+			 * committed in one-phase.
+			 */
+			commit = true;
+		}
+
+		if (commit)
+		{
+			/* Commit the foreign transaction in one-phase */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, true);
+
+			/* Delete it from the participant list */
+			FdwXactParticipants = foreach_delete_current(FdwXactParticipants, lc);
+		}
+	}
+
+	/* All done if we have committed all foreign transactions in one-phase */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(foreign_twophase_commit != FOREIGN_TWOPHASE_COMMIT_DISABLED);
+
+	/*
+	 * We now have only servers in the list that are capable of two-phase
+	 * commit. If the list has only one server and we didn't modify the local
+	 * data, we can commit it in one-phase.
+	 */
+	if (list_length(FdwXactParticipants) == 1 && !local_modified)
+	{
+		/* Commit the foreign transaction in one-phase */
+		FdwXactOnePhaseEndForeignTransaction(linitial(FdwXactParticipants),
+											 true);
+
+		/* All foreign transactions have been committed; forget the list */
+		list_free(FdwXactParticipants);
+		FdwXactParticipants = NIL;
+		return;
+	}
+
+	/*
+	 * Finally, prepare foreign transactions. Note that we keep
+	 * FdwXactParticipants until the end of transaction.
+	 */
+	FdwXactPrepareForeignTransactions();
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * an FdwXact entry we call the GetPrepareId callback to get a transaction
+ * identifier from the FDW.
+ *
+ * We still can change to rollback here. If any error occurs, we rollback
+ * non-prepared foreign transactions and leave others to the resolver.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	ListCell   *lc;
+	TransactionId xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	xid = GetTopTransactionId();
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState *state;
+		FdwXact		fdwxact;
+
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry, which is then moved to the
+		 * FDWXACT_STATUS_PREPARING status below. Registration persists this
+		 * information to the disk and logs it (thereby relaying it to the
+		 * standby). Thus in case we lose connectivity to the foreign server or
+		 * crash ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepare the transaction on the foreign server before
+		 * persisting the information to the disk and crash in-between these
+		 * two steps, we will forget that we prepared the transaction on the
+		 * foreign server and will not be able to resolve it after the crash.
+		 * Hence persist first then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		state = create_fdwxact_state();
+		state->server = fdw_part->server;
+		state->usermapping = fdw_part->usermapping;
+		state->fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+
+		/* Update the status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		Assert(fdwxact->status == FDWXACT_STATUS_INITIAL);
+		fdwxact->status = FDWXACT_STATUS_PREPARING;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the moment this
+		 * backend receives an acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 *
+		 * During abort processing, we might try to resolve a never-prepared
+		 * transaction, and get an error. This is fine as long as the FDW
+		 * provides us unique prepared transaction identifiers.
+		 */
+		PG_TRY();
+		{
+			fdw_part->prepare_foreign_xact_fn(state);
+		}
+		PG_CATCH();
+		{
+			/* failed, back to the initial state */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fdwxact->status = FDWXACT_STATUS_INITIAL;
+			LWLockRelease(FdwXactLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		/* succeeded, update status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * One-phase commit or rollback the given foreign transaction participant.
+ */
+static void
+FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+									 bool for_commit)
+{
+	FdwXactRslvState *state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state = create_fdwxact_state();
+	state->server = fdw_part->server;
+	state->usermapping = fdw_part->usermapping;
+	state->flags = FDWXACT_FLAG_ONEPHASE;
+
+	/*
+	 * Commit or rollback foreign transaction in one-phase. Since we didn't
+	 * insert an FdwXact entry for this transaction we don't need to care
+	 * about failures. On failure we change to rollback.
+	 */
+	if (for_commit)
+		fdw_part->commit_foreign_xact_fn(state);
+	else
+		fdw_part->rollback_foreign_xact_fn(state);
+}
+
+/*
+ * This function is used to create a new foreign transaction entry before an
+ * FDW prepares and commits or rolls back. The function adds the entry to WAL,
+ * and it will be persisted to disk under the pg_fdwxact directory at the next
+ * checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->status = FDWXACT_STATUS_INITIAL;
+	fdwxact->held_by = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	int			i;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->dbid = dbid;
+	fdwxact->local_xid = xid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	fdwxact->indoubt = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (FdwXactIsBeingResolved(fdwxact))
+		elog(ERROR, "cannot remove fdwxact entry that is being resolved");
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->indoubt = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true and set ForeignTwophaseCommitIsRequired to true if the current
+ * transaction modifies data on two or more servers, counting the servers in
+ * FdwXactParticipants and the local server itself. Also set *local_modified
+ * to true if the transaction modified the local data.
+ */
+static bool
+is_foreign_twophase_commit_required(bool *local_modified)
+{
+	ListCell   *lc;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->modified)
+			nserverswritten++;
+	}
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+	{
+		++nserverswritten;
+		if (local_modified)
+			*local_modified = true;
+	}
+
+	/*
+	 * Atomic commit is required if we modified data on two or more
+	 * participants.
+	 */
+	if (nserverswritten <= 1)
+		return false;
+
+	ForeignTwophaseCommitIsRequired = true;
+	return true;
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int			i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * Mark my foreign transaction participants as in-doubt and clear
+ * the FdwXactParticipants list.
+ *
+ * If we leave any foreign transaction, update the oldest xmin of unresolved
+ * transactions so that the local transaction id of an in-doubt transaction is
+ * not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell   *cell;
+	int			n_lefts = 0;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if we didn't register an FdwXact entry yet */
+		if (!fdw_part->fdwxact)
+			continue;
+
+		/*
+		 * There is a race condition: the resolver process could remove an
+		 * FdwXact entry in FdwXactParticipants and another backend could
+		 * reuse it before we forget it. So we need to check if the entries
+		 * are still associated with this transaction.
+		 */
+		SpinLockAcquire(&fdwxact->mutex);
+		if (fdwxact->held_by == MyBackendId)
+		{
+			fdwxact->held_by = InvalidBackendId;
+			fdwxact->indoubt = true;
+			n_lefts++;
+		}
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction of
+	 * unresolved distributed transactions and hand them over to the foreign
+	 * transaction resolver.
+	 */
+	if (n_lefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions in in-doubt status", n_lefts);
+		FdwXactComputeRequiredXmin();
+	}
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * Wait for the foreign transaction to be resolved.
+ *
+ * Initially backends start in state FDWXACT_NOT_WAITING and then change
+ * that state to FDWXACT_WAITING before adding ourselves to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * This backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char	   *new_status = NULL;
+	const char *old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsForeignTwophaseCommitRequested())
+		return;
+
+	/*
+	 * Also exit if the transaction itself has no foreign transaction
+	 * participants.
+	 */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if not running yet, otherwise wake it up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction resolution.
+	 */
+	if (update_process_title)
+	{
+		int			len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 34 + 1);	/* 24 + 10-digit xid */
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0'; /* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDWXACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The
+		 * latter would lead the client to believe that the distributed
+		 * transaction aborted, which is not true: it's already committed
+		 * locally. The former is no good either: the client has requested
+		 * committing a distributed transaction, and is entitled to assume
+		 * that an acknowledged commit is also committed on all foreign
+		 * servers, which might not be true. So in this case we issue a
+		 * WARNING (which some clients may be able to interpret) and shut off
+		 * further output. We do NOT reset ProcDiePending, so that the
+		 * process will die after the commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions may be left
+		 * unresolved, but the foreign transaction resolver will pick them up
+		 * and try to resolve them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get a completion
+		 * notification, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Return true if at least one backend connected to the given database is in
+ * the wait queue. The caller must hold FdwXactResolutionLock in shared mode.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC	   *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * AtEOXact_FdwXacts
+ */
+extern void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lc;
+
+	if (!is_commit)
+	{
+		foreach(lc, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = lfirst(lc);
+
+			/*
+			 * If the foreign transaction has an FdwXact entry, we might have
+			 * prepared it. Skip a foreign transaction that is already
+			 * prepared, because its transaction is closed. For an entry
+			 * with status == FDWXACT_STATUS_PREPARING we cannot tell whether
+			 * the prepare completed, so we call the rollback API to close
+			 * its transaction for safety. Any prepared foreign transaction
+			 * left behind will be resolved by the foreign transaction
+			 * resolver.
+			 */
+			if (fdw_part->fdwxact)
+			{
+				bool		is_prepared;
+
+				LWLockAcquire(FdwXactLock, LW_SHARED);
+				is_prepared = fdw_part->fdwxact &&
+					fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED;
+				LWLockRelease(FdwXactLock);
+
+				if (is_prepared)
+					continue;
+			}
+
+			/* One-phase rollback foreign transaction */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, false);
+		}
+	}
+
+	/*
+	 * In commit cases, we have already prepared foreign transactions during
+	 * pre-commit phase. And these prepared transactions will be resolved by
+	 * the resolver process.
+	 */
+
+	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Prepare foreign transactions.
+ *
+ * Note that the transaction can abort after we have prepared only some of
+ * the participants. In that case we switch to rollback and roll back all
+ * foreign transactions.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'")));
+
+	/*
+	 * We cannot prepare if any foreign server of participants isn't capable
+	 * of two-phase commit.
+	 */
+	if (is_foreign_twophase_commit_required(NULL) &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in the transaction cannot prepare it")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Return one backend that connects to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p)
+{
+	PGPROC	   *proc;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+			break;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+	{
+		*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+		*waitXid_p = proc->fdwXactWaitXid;
+	}
+	else
+	{
+		*nextResolutionTs_p = -1;
+		*waitXid_p = InvalidTransactionId;
+	}
+
+	LWLockRelease(FdwXactResolutionLock);
+
+	return proc;
+}
+
+/*
+ * Get one FdwXact entry to resolve. This function is intended to be used by
+ * a resolver process to fetch FdwXact entries to resolve, so the search
+ * excludes in-doubt transactions and in-progress transactions.
+ */
+static FdwXact
+get_fdwxact_to_resolve(Oid dbid, TransactionId xid)
+{
+	List	   *fdwxacts = NIL;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Exclude both in-doubt transactions and in-progress transactions */
+	fdwxacts = get_fdwxacts(dbid, xid, InvalidOid, InvalidOid,
+							false, false, false);
+
+	return fdwxacts == NIL ? NULL : (FdwXact) linitial(fdwxacts);
+}
+
+/*
+ * Resolve one distributed transaction on the given database. The target
+ * distributed transaction is fetched from the waiting queue and its transaction
+ * participants are fetched from the global array.
+ *
+ * Release the waiter after all of the foreign transaction participants have
+ * been resolved. On failure, re-enqueue the waiting backend after advancing
+ * its next resolution time, and re-throw the error.
+ */
+void
+FdwXactResolveTransaction(Oid dbid, TransactionId xid, PGPROC *waiter)
+{
+	FdwXact		fdwxact;
+
+	Assert(TransactionIdIsValid(xid));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	while ((fdwxact = get_fdwxact_to_resolve(dbid, xid)) != NULL)
+	{
+		FdwXactRslvState *state;
+		ForeignServer *server;
+		UserMapping *usermapping;
+
+		CHECK_FOR_INTERRUPTS();
+
+		server = GetForeignServer(fdwxact->serverid);
+		usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+
+		state = create_fdwxact_state();
+		SpinLockAcquire(&fdwxact->mutex);
+		state->server = server;
+		state->usermapping = usermapping;
+		state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+		SpinLockRelease(&fdwxact->mutex);
+
+		FdwXactDetermineTransactionFate(fdwxact, false);
+
+		/* Do not hold during foreign transaction resolution */
+		LWLockRelease(FdwXactLock);
+
+		PG_TRY();
+		{
+			/*
+			 * Resolve the foreign transaction. When committing or aborting
+			 * prepared foreign transactions the previous status is always
+			 * FDWXACT_STATUS_PREPARED.
+			 */
+			FdwXactResolveForeignTransaction(fdwxact, state,
+											 FDWXACT_STATUS_PREPARED);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. Re-insert the waiter into the queue with a
+			 * later resolution time if the waiter is still waiting.
+			 */
+			LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+			if (waiter->fdwXactState == FDWXACT_WAITING)
+			{
+				SHMQueueDelete(&(waiter->fdwXactLinks));
+				pg_write_barrier();
+				waiter->fdwXactNextResolutionTs =
+					TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+												foreign_xact_resolution_retry_interval);
+				FdwXactQueueInsert(waiter);
+			}
+			LWLockRelease(FdwXactResolutionLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		elog(DEBUG2, "resolved one foreign transaction xid %u, serverid %u, userid %u",
+			 fdwxact->local_xid, fdwxact->serverid, fdwxact->userid);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+
+	/*
+	 * Remove waiter from shmem queue, if not detached yet. The waiter could
+	 * already be detached if the user cancelled the wait before resolution.
+	 */
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/*
+		 * Wake up the waiter only when we have set state and removed from
+		 * queue
+		 */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had been already detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Determine whether the given foreign transaction should be committed or
+ * rolled back according to the result of the local transaction. This function
+ * changes fdwxact->status, so the caller must either hold FdwXactLock in
+ * exclusive mode or pass need_lock as true.
+ */
+static void
+FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock)
+{
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/*
+	 * The transaction being resolved must either have been cancelled and
+	 * marked as in-doubt, or have been prepared.
+	 */
+	Assert(fdwxact->indoubt ||
+		   fdwxact->status == FDWXACT_STATUS_PREPARED);
+
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on
+	 * the foreign server. So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted. This should not happen except for
+	 * one case where the local transaction is prepared and this foreign
+	 * transaction is being resolved manually using
+	 * pg_resolve_foreign_xact(). Raise an error anyway since we cannot
+	 * determine the fate of this foreign transaction according to the local
+	 * transaction whose fate is also not determined.
+	 */
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve the foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid),
+				 errhint("The local transaction with xid %u might be prepared.",
+						 fdwxact->local_xid)));
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * callback function. The 'state' is passed to the callback function. The fate of
+ * foreign transaction must be determined. If foreign transaction is resolved
+ * successfully, remove the FdwXact entry from the shared memory and also
+ * remove the corresponding on-disk file. If failed, the status of FdwXact
+ * entry changes to 'fallback_status' before erroring out.
+ */
+static void
+FdwXactResolveForeignTransaction(FdwXact fdwxact, FdwXactRslvState *state,
+								 FdwXactStatus fallback_status)
+{
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *fdw_routine;
+	bool		is_commit;
+
+	Assert(state != NULL);
+	Assert(state->server && state->usermapping && state->fdwxact_id);
+	Assert(fdwxact != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+		elog(ERROR, "cannot resolve foreign transaction whose fate is not determined");
+
+	is_commit = fdwxact->status == FDWXACT_STATUS_COMMITTING;
+	LWLockRelease(FdwXactLock);
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+
+	PG_TRY();
+	{
+		if (is_commit)
+			fdw_routine->CommitForeignTransaction(state);
+		else
+			fdw_routine->RollbackForeignTransaction(state);
+	}
+	PG_CATCH();
+	{
+		/* Back to the fallback status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = fallback_status;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	/* Resolution was a success, remove the entry */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	elog(DEBUG1, "successfully %s the foreign transaction with xid %u db %u server %u user %u",
+		 is_commit ? "committed" : "rolled back",
+		 fdwxact->local_xid, fdwxact->dbid, fdwxact->serverid,
+		 fdwxact->userid);
+
+	fdwxact->status = FDWXACT_STATUS_RESOLVED;
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdwxact(fdwxact);
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Return palloc'd and initialized FdwXactRslvState.
+ */
+static FdwXactRslvState *
+create_fdwxact_state(void)
+{
+	FdwXactRslvState *state;
+
+	state = palloc(sizeof(FdwXactRslvState));
+	state->server = NULL;
+	state->usermapping = NULL;
+	state->fdwxact_id = NULL;
+	state->flags = 0;
+
+	return state;
+}
+
+/*
+ * Return the FdwXact entry that matches the given arguments, or NULL if
+ * there is none. All arguments must be valid values so that the search
+ * identifies at most one entry. Note that this function is intended to be
+ * used for modifying the returned FdwXact entry, so the caller must hold
+ * FdwXactLock in exclusive mode. In-progress FdwXact entries are not
+ * returned.
+ */
+static FdwXact
+get_one_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	   *fdwxact_list;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	/* Include in-doubt transactions but don't include in-progress ones */
+	fdwxact_list = get_fdwxacts(dbid, xid, serverid, userid,
+								true, false, false);
+
+	/* At most one entry since we search by the unique key */
+	Assert(list_length(fdwxact_list) <= 1);
+
+	/* Could not find entry */
+	if (fdwxact_list == NIL)
+		return NULL;
+
+	return (FdwXact) linitial(fdwxact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	List	   *fdwxact_list;
+	bool		ret = false;
+
+	/* Find entries from all FdwXact entries */
+	fdwxact_list = get_fdwxacts(dbid, InvalidTransactionId, serverid,
+								userid, true, true, true);
+
+	if (fdwxact_list != NIL)
+		ret = true;
+
+	list_free(fdwxact_list);
+	return ret;
+}
+
+/*
+ * Returns an array of all foreign prepared transactions for the user-level
+ * function pg_foreign_xacts, and the number of entries to num_p.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if it doesn't
+ * want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdwxacts(int *num_p)
+{
+	List	   *all_fdwxacts;
+	ListCell   *lc;
+	FdwXact		fdwxacts;
+	int			num_fdwxacts = 0;
+
+	Assert(num_p != NULL);
+
+	/* Get all entries */
+	all_fdwxacts = get_fdwxacts(InvalidOid, InvalidTransactionId,
+								InvalidOid, InvalidOid, true,
+								true, true);
+
+	if (all_fdwxacts == NIL)
+	{
+		*num_p = 0;
+		return NULL;
+	}
+
+	fdwxacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdwxacts));
+	*num_p = list_length(all_fdwxacts);
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdwxacts)
+	{
+		FdwXact		fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdwxacts + num_fdwxacts, fx,
+			   sizeof(FdwXactData));
+		num_fdwxacts++;
+	}
+
+	list_free(all_fdwxacts);
+
+	return fdwxacts;
+}
+
+/*
+ * Return a list of FdwXact entries matching the given arguments, or NIL if
+ * none match. An argument restricts the search only if it is a valid value
+ * for its datatype. 'include_indoubt' and 'include_in_progress' control
+ * whether in-doubt and in-progress transactions are included. If 'need_lock'
+ * is true, FdwXactLock is held in shared mode during the search.
+ */
+static List *
+get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			 bool include_indoubt, bool include_in_progress, bool need_lock)
+{
+	int			i;
+	List	   *fdwxact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* include in-doubt transaction? */
+		if (!include_indoubt && fdwxact->indoubt)
+			continue;
+
+		/* include in-progress transaction? */
+		if (!include_in_progress && FdwXactIsBeingResolved(fdwxact))
+			continue;
+
+		/* Append it if matched */
+		fdwxact_list = lappend(fdwxact_list, fdwxact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdwxact_list;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides getPrepareId callback we return the identifier
+ * returned from it. Otherwise we generate a unique identifier in the form
+ * "fx_<random number>_<xid>_<serverid>_<userid>", whose length is less than
+ * FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transactionid unique, we should ideally use something
+ * like UUID, which gives unique ids with high probability, but that may be
+ * expensive here and UUID extension which provides the function to generate
+ * UUID is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	   *id;
+	int			id_len = 0;
+
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		/*
+		 * FDW doesn't provide the callback function, generate a unique
+		 * identifier.
+		 */
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN - 1] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	/* Copy id_len bytes; the callback's buffer need not be NUL-terminated */
+	return pnstrdup(id, id_len);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be an FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									&read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	errno = 0;
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a database id, transaction id, server id and user id, read the
+ * foreign transaction state either from its state file on disk or, if
+ * "fromdisk" is false, from the WAL record at "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid magic number and CRC), return the palloc'd
+ * contents of the file, issuing an error when finding corrupted data.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * OK, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the identifiers we were asked for */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextFullXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system;
+ * a new backup should be rolled in instead.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextFullXid);
+	TransactionId result = origNextXid;
+	int			i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. Its status and
+	 * in-doubt flag are filled in below.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED and as in-doubt, since we do not know the xact
+	 * status right now. Resolver will set it later based on the status of
+	 * local transaction that prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->indoubt = true;
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXact file if the foreign transaction was saved by an earlier checkpoint.
+ * The entry may be missing if crash recovery starts from a point after the
+ * entry was added but before it was removed; in that case we do nothing.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdwxact(dbid, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+		return;
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int			i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+
+		/*
+		 * If the foreign transaction is part of a prepared local
+		 * transaction, it is not in doubt. A future COMMIT PREPARED or
+		 * ROLLBACK PREPARED will determine the fate of this foreign
+		 * transaction.
+		 */
+		if (TwoPhaseExists(fdwxact->local_xid))
+		{
+			ereport(DEBUG2,
+					(errmsg("clearing in-doubt flag of foreign transaction %u, server %u, user %u because the corresponding local prepared transaction was found",
+							fdwxact->local_xid, fdwxact->serverid,
+							fdwxact->userid)));
+			fdwxact->indoubt = false;
+		}
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(int *newval, void **extra, GucSource source)
+{
+	ForeignTwophaseCommitLevel newForeignTwophaseCommitLevel = *newval;
+
+	/* Parameter check */
+	if (newForeignTwophaseCommitLevel > FOREIGN_TWOPHASE_COMMIT_DISABLED &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_transactions\" or \"max_foreign_transaction_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+} WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	7
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdwxacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdwxacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(PG_PREPARED_FDWXACTS_COLS);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "indoubt",
+						   BOOLOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdwxacts = get_all_fdwxacts(&num_fdwxacts);
+		status->num_xacts = num_fdwxacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdwxact = &status->fdwxacts[status->cur_xact++];
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdwxact->dbid);
+		values[1] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[2] = ObjectIdGetDatum(fdwxact->serverid);
+		values[3] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (fdwxact->status)
+		{
+			case FDWXACT_STATUS_INITIAL:
+				xact_status = "initial";
+				break;
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			case FDWXACT_STATUS_RESOLVED:
+				xact_status = "resolved";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		values[5] = BoolGetDatum(fdwxact->indoubt);
+		values[6] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	ForeignServer *server;
+	UserMapping *usermapping;
+	FdwXact		fdwxact;
+	FdwXactRslvState *state;
+	FdwXactStatus prev_status;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	server = GetForeignServer(serverid);
+	usermapping = GetUserMapping(userid, serverid);
+	state = create_fdwxact_state();
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+	{
+		LWLockRelease(FdwXactLock);
+		PG_RETURN_BOOL(false);
+	}
+
+	state->server = server;
+	state->usermapping = usermapping;
+	state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+
+	SpinLockAcquire(&fdwxact->mutex);
+	prev_status = fdwxact->status;
+	SpinLockRelease(&fdwxact->mutex);
+
+	FdwXactDetermineTransactionFate(fdwxact, false);
+
+	LWLockRelease(FdwXactLock);
+
+	FdwXactResolveForeignTransaction(fdwxact, state, prev_status);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it. This gives a way to forget about such a prepared transaction
+ * when, for example, the foreign server where it was prepared is no longer
+ * available, or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	FdwXact		fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
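+	if (fdwxact == NULL)
+	{
+		/* Nothing to remove; do not return with the lock still held */
+		LWLockRelease(FdwXactLock);
+		PG_RETURN_BOOL(false);
+	}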
+
+	remove_fdwxact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..e293d13562
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,641 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to retry resolution.
+ */
+void
+FdwXactLauncherRequestToLaunchForRetry(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		SetLatch(FdwXactRslvCtl->launcher_latch);
+}
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			resolver->last_resolved_time = 0;
+			resolver->latch = NULL;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
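+	/* Zero before the first start, InvalidPid after a previous launcher exit */
+	Assert(FdwXactRslvCtl->launcher_pid == 0 ||
+		   FdwXactRslvCtl->launcher_pid == InvalidPid);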
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit launch retries to once per
+		 * foreign_xact_resolution_retry_interval, but always launch
+		 * immediately when a backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started. This usually means the resolver crashed, so retry
+			 * after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+	int			i;
+
+	/*
+	 * Look for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting
+		 * up and hasn't attached to its slot yet. Since the resolver will soon
+		 * find the FdwXact entry we inserted, we don't need to do anything.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, clean up the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on each database that
+ * has at least one FdwXact entry but no resolver running on it.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *resolver_dbs;	/* DBs resolver's running on */
+	HTAB	   *fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+	int			i;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect OIDs of databases having at least one non-in-doubt FdwXact entry */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->indoubt)
+			continue;
+
+		hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no FdwXact entry, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+		return false;
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolvers are running and launch new one on them */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be super user */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Returns activity of all foreign transaction resolvers.
+ */
+Datum
+pg_stat_get_foreign_xact(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int			i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t		pid;
+		Oid			dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
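
For readers following the launcher logic above: the decision about where to
start a resolver is a set difference, namely the databases that have pending
foreign transactions minus the databases that already have a resolver. The
following is only a minimal standalone sketch of that idea, assuming plain
arrays in place of the dynahash tables the patch uses. The names
dbs_needing_resolver and contains are hypothetical, and Oid is a local
stand-in typedef, not the PostgreSQL definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;	/* local stand-in for PostgreSQL's Oid */

/* Return true if x appears in set[0..n-1]. */
static bool
contains(const Oid *set, size_t n, Oid x)
{
	for (size_t i = 0; i < n; i++)
	{
		if (set[i] == x)
			return true;
	}
	return false;
}

/*
 * Hypothetical helper: copy into out[] every database in pending[] that has
 * no entry in running[], and return how many were copied.  out[] must have
 * room for npending entries.
 */
static size_t
dbs_needing_resolver(const Oid *pending, size_t npending,
					 const Oid *running, size_t nrunning,
					 Oid *out)
{
	size_t		nout = 0;

	assert(out != NULL);

	for (size_t i = 0; i < npending; i++)
	{
		if (!contains(running, nrunning, pending[i]))
			out[nout++] = pending[i];
	}
	return nout;
}
```

The patch itself gets the same result with two hash tables and a sequential
scan, which avoids the quadratic lookup of this sketch when many databases are
involved.
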
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..4843aeacc9
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,343 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database for which backend processes are waiting.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if no foreign transactions on the database are waiting to be
+ * resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int			foreign_xact_resolution_retry_interval;
+int			foreign_xact_resolver_timeout = 60 * 1000;
+bool		foreign_xact_resolve_indoubt_xacts;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	MyFdwXactResolver->last_resolved_time = 0;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sanish value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		PGPROC	   *waiter = NULL;
+		TransactionId waitXid = InvalidTransactionId;
+		TimestampTz resolutionTs = -1;
+		int			rc;
+		TimestampTz now;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiters until either the queue becomes empty or we reach a
+		 * waiter whose resolution time is in the future.
+		 */
+		while ((waiter = FdwXactGetWaiter(&resolutionTs, &waitXid)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+			Assert(TransactionIdIsValid(waitXid));
+
+			if (resolutionTs > now)
+				break;
+
+			elog(DEBUG2, "resolver got one waiter with xid %u", waitXid);
+
+			/* Resolve the waiting distributed transaction */
+			StartTransactionCommand();
+			FdwXactResolveTransaction(MyDatabaseId, waitXid, waiter);
+			CommitTransactionCommand();
+
+			/* Update my stats */
+			SpinLockAcquire(&(MyFdwXactResolver->mutex));
+			MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+			SpinLockRelease(&(MyFdwXactResolver->mutex));
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Shut down the resolver if it has resolved no foreign transaction within
+ * foreign_xact_resolver_timeout and no backend is waiting on this database.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz last_resolved_time;
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	last_resolved_time = MyFdwXactResolver->last_resolved_time;
+	timeout = TimestampTzPlusMilliseconds(last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until the slot is detached.
+		 * This prevents a race condition in which a waiter enqueues itself
+		 * after we have checked FdwXactWaiterExists.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but does not exit as the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long to sleep until the next cycle: until the resolver timeout
+ * or the next resolution time given by nextResolutionTs, whichever is sooner.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
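
To make the sleep-time rule in FXRslvComputeSleepTime easier to follow, here
is a small standalone sketch of the same idea. It is not the patch's code. It
assumes timestamps are plain microsecond counters, and compute_sleep_ms,
ms_until and NAPTIME_MS are hypothetical names chosen for illustration. The
sleep is the smallest of the default nap time, the time left until the
resolver timeout, and the time left until the next scheduled resolution, and
it is clamped to zero when a deadline has already passed.

```c
#include <assert.h>
#include <stdint.h>

#define NAPTIME_MS 180000L		/* mirrors DEFAULT_NAPTIME_PER_CYCLE (3 min) */

/* Milliseconds from now_us to target_us, or 0 if the target has passed. */
static long
ms_until(int64_t now_us, int64_t target_us)
{
	if (target_us <= now_us)
		return 0;
	return (long) ((target_us - now_us) / 1000);
}

/*
 * Hypothetical equivalent of the sleep computation.  timeout_ms == 0 means
 * the timeout is disabled; next_resolution_us <= 0 means no waiter is
 * scheduled.
 */
static long
compute_sleep_ms(int64_t now_us, int64_t last_resolved_us,
				 int timeout_ms, int64_t next_resolution_us)
{
	long		sleep_ms = NAPTIME_MS;

	assert(timeout_ms >= 0);

	if (timeout_ms > 0)
	{
		int64_t		deadline_us = last_resolved_us + (int64_t) timeout_ms * 1000;
		long		t = ms_until(now_us, deadline_us);

		if (t < sleep_ms)
			sleep_ms = t;
	}

	if (next_resolution_us > 0)
	{
		long		t = ms_until(now_us, next_resolution_us);

		if (t < sleep_ms)
			sleep_ms = t;
	}

	return sleep_ms;
}
```

With a 60 second timeout and a waiter due in two seconds, the sketch returns
the two-second wait. With no waiter scheduled it sleeps until the timeout.
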
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 1cd97852e8..ea045174e0 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..200cf9d067 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 5adf956f41..e8e6a5e2b5 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -850,6 +851,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
@@ -2263,6 +2293,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2322,6 +2358,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index e3c60f23cd..405271387d 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1218,6 +1219,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1226,6 +1228,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1264,12 +1267,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1427,6 +1431,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transaction to be resolved, if required.
+	 * We only want to wait if we prepared foreign transaction in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2086,6 +2098,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2246,6 +2261,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2333,6 +2349,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2527,6 +2545,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2732,6 +2751,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3813eadfb4..f69e572b1c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -5243,6 +5244,7 @@ BootStrapXLOG(void)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6182,6 +6184,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_wal_senders",
 									 max_wal_senders,
 									 ControlFile->max_wal_senders);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
@@ -6723,14 +6728,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -6922,7 +6928,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7431,6 +7440,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7761,6 +7771,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9037,6 +9050,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9470,8 +9484,10 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9489,6 +9505,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9505,6 +9522,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9710,6 +9728,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -9909,6 +9928,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f681aafcf9..980ddbad0a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+       SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
@@ -826,6 +829,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_foreign_xact AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_foreign_xact() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e79ede4cb8..2293d4cbfc 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2858,8 +2858,14 @@ CopyFrom(CopyState cstate)
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc),
+							   true);
+
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index f197869752..6206265424 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1101,6 +1103,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1419,6 +1433,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
@@ -1572,6 +1595,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
 				 errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA",
 						fdw->fdwname)));
 
+	/*
+	 * Remember that the transaction accesses a foreign server. Normally
+	 * ImportForeignSchema does not modify data on foreign servers, so register
+	 * the server as not modified.
+	 */
+	RegisterFdwXactByServerId(server->serverid, false);
+
 	/* Call FDW to get a list of commands */
 	cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c13b1d3501..1dc61fbdea 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/table.h"
 #include "access/tableam.h"
 #include "catalog/partition.h"
@@ -937,7 +938,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		Relation		child = partRelInfo->ri_RelationDesc;
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(child), true);
+
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 513471ab9b..29f376e48c 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,9 +226,31 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		RangeTblEntry	*rte;
+
+		rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex,
+							estate);
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(rte->relid, true);
+
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
+	{
+		RangeTblEntry	*rte;
+		int rtindex = (scanrelid > 0) ?
+			scanrelid :
+			bms_next_member(node->fs_relids, -1);
+
+		rte = exec_rt_fetch(rtindex, estate);
+
+		/* Remember that the transaction accesses a foreign server */
+		RegisterFdwXactByRelId(rte->relid, false);
+
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 870a7428f1..d82d32ecb4 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -37,6 +37,7 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
@@ -47,6 +48,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "rewrite/rewriteHandler.h"
@@ -2411,6 +2413,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+			/* Remember that the transaction modifies data on a foreign server */
+			RegisterFdwXactByRelId(relid, true);
 
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 															 resultRelInfo,
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 61e48ca3f8..8f411c0559 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,18 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction && !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction && routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		!routine->CommitForeignTransaction &&
+		!routine->RollbackForeignTransaction)
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 75fc0d5d33..c8b38142a5 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -12,6 +12,8 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 59dc4f31ab..9dce03a6e4 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3645,6 +3645,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3848,6 +3854,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDWXACT:
+			event_name = "FdwXact";
+			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -4066,6 +4078,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b3986bee75..897863a79c 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -909,6 +911,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -984,12 +990,13 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and the foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e1dc8a651..c77ca40e1c 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -151,6 +151,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..55609eed81 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4a5b26c23d..5d54acaaf5 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -93,6 +93,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -248,6 +250,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1312,6 +1315,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1377,6 +1381,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1426,6 +1431,15 @@ GetOldestXmin(Relation rel, int flags)
 		NormalTransactionIdPrecedes(replication_slot_xmin, result))
 		result = replication_slot_xmin;
 
+	/*
+	 * Check whether there is an unresolved distributed transaction
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDWXACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
 	/*
 	 * After locks have been released and vacuum_defer_cleanup_age has been
 	 * applied, check whether we need to back up further to make logical
@@ -3128,6 +3142,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon so that vacuum
+ * does not truncate clog entries for transactions that are still needed to
+ * resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843229..adb276370c 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,6 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+FdwXactLock							45
+FdwXactResolverLock					46
+FdwXactResolutionLock				47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index eb321f72ea..7c7edeeaaf 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -421,6 +422,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* Initialize fields for fdw xact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -822,6 +827,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0a6f80963b..9b40134aa4 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3030,6 +3032,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8228e1f390..9304ebbd76 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -424,6 +425,25 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required", "prefer", and "disabled" are documented,
+ * we accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
@@ -760,6 +780,12 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
+	/* FDWXACT_RESOLVER */
+	gettext_noop("Foreign Transaction Management / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2448,6 +2474,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FDWXACT_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FDWXACT_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4491,6 +4563,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e1048c0047..859f32eb95 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -125,6 +125,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -342,6 +344,20 @@
 #max_sync_workers_per_subscription = 2	# taken from max_logical_replication_workers
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#foreign_twophase_commit = disabled	# required, prefer, or disabled
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index a0b0458108..8701c5f005 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 5302973379..c448ce7373 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -208,6 +208,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 19e21ab491..9ae3bfe4dd 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -301,6 +301,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index f9cfeae264..a5f2aa1a09 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 120000
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..dd8433f42c
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,167 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* fdwXactState */
+#define	FDWXACT_NOT_WAITING		0
+#define	FDWXACT_WAITING			1
+#define	FDWXACT_WAIT_COMPLETE	2
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_PREFER, /* use twophase commit where available */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+} ForeignTwophaseCommitLevel;
+
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_INITIAL,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being
+								 * committed */
+	FDWXACT_STATUS_ABORTING,	/* foreign prepared transaction is being
+								 * aborted */
+	FDWXACT_STATUS_RESOLVED
+} FdwXactStatus;
+
+typedef struct FdwXactData *FdwXact;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	TransactionId local_xid;	/* XID of local transaction */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	bool		indoubt;		/* Is an in-doubt transaction? */
+	slock_t		mutex;			/* Protect the above fields */
+
+	/* The status of the foreign transaction, protected by FdwXactLock */
+	FdwXactStatus status;
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset of inserting this entry
+									 * start */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been completed and written to
+								 * file? */
+	BackendId	held_by;		/* backend that holds this entry */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing FdwXact entry needs to hold FdwXactLock in exclusive mode,
+ * and iterating fdwXacts needs that in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	/* Foreign transaction information */
+	char	   *fdwxact_id;
+
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+/* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void RegisterFdwXactByRelId(Oid relid, bool modified);
+extern void RegisterFdwXactByServerId(Oid serverid, bool modified);
+extern void PreCommit_FdwXacts(void);
+extern void FdwXactResolveTransaction(Oid dbid, TransactionId xid, PGPROC *waiter);
+extern bool FdwXactResolveInDoubtTransactions(Oid dbid);
+extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit);
+extern PGPROC *FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p);
+extern bool FdwXactWaiterExists(Oid dbid);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon);
+extern void RecoverFdwXacts(void);
+extern bool FdwXactExists(Oid dboid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwTwoPhaseNeeded(void);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void FdwXactMarkForeignServerAccessed(Oid relid, bool modified);
+extern bool check_foreign_twophase_commit(int *newval, void **extra,
+										  GucSource source);
+
+#endif							/* FDWXACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..c3ed1ecfaf
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,29 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLauncherRequestToLaunchForRetry(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..80691b5c07
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,66 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLaunchLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Stats */
+	TimestampTz last_resolved_time;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index c88dccfb8d..254a663b4d 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Foreign Transactions", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index a04fc70326..6f1f336e31 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -108,6 +108,13 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
+/*
+ * XACT_FLAGS_FDWNOPREPARE - set when the transaction wrote data on a
+ * foreign table (relation) whose server isn't capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 3)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 087918d41d..ed4d08b4af 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -232,6 +232,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index de5670e538..9884f5f8e7 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index eb3c1a88d1..49465a1df2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5211,6 +5211,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '9705', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_foreign_xact', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,timestamptz}',
+  proargmodes => '{o,o,o}',
+  proargnames => '{pid,dbid,last_resolved_time}',
+  prosrc => 'pg_stat_get_foreign_xact' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5924,6 +5931,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o,o}',
+  proargnames => '{dbid,xid,serverid,userid,status,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6042,6 +6067,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..8d046cc4e4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5e0cf533fb..5596ee591c 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -69,6 +69,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 											   bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 														 bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 3a65a51696..a59d7c1f60 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -776,6 +776,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -853,7 +855,9 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDWXACT,
+	WAIT_EVENT_FDWXACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -934,6 +938,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index d21780108b..35ffbbca93 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/xlogdefs.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
@@ -152,6 +153,16 @@ struct PGPROC
 	int			syncRepState;	/* wait state for sync rep */
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
+	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
 	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index a5c7d0c064..0f73b64937 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDWXACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -125,4 +127,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 454c2df487..6010dbcdee 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,9 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
+	FDWXACT_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 634f8256f7..0f3ff9742e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1342,6 +1342,14 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.dbid,
+    f.xid,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(dbid, xid, serverid, userid, status, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
@@ -1848,6 +1856,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_foreign_xact| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_foreign_xact() r(pid, dbid, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
-- 
2.23.0

#33Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Muhammad Usama (#30)

On Tue, 18 Feb 2020 at 00:40, Muhammad Usama <m.usama@gmail.com> wrote:

Hi Sawada San,

I have a couple of comments on "v27-0002-Support-atomic-commit-among-multiple-foreign-ser.patch"

1- As part of the XLogReadRecord refactoring commit the signature of XLogReadRecord was changed,
so the function call to XLogReadRecord() needs a small adjustment.

i.e. In function XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
...
-       record = XLogReadRecord(xlogreader, lsn, &errormsg);
+       XLogBeginRead(xlogreader, lsn);
+       record = XLogReadRecord(xlogreader, &errormsg);

2- In register_fdwxact(..) function you are setting the XACT_FLAGS_FDWNOPREPARE transaction flag
when the register request comes in for foreign server that does not support two-phase commit regardless
of the value of 'bool modified' argument. And later in the PreCommit_FdwXacts() you just error out when
"foreign_twophase_commit" is set to 'required' only by looking at the XACT_FLAGS_FDWNOPREPARE flag.
which I think is not correct.
Since there is a possibility that the transaction might have only read from the foreign servers (not capable of
handling transactions or two-phase commit) and all other servers where we require to do atomic commit
are capable enough of doing so.
If I am not missing something obvious here, then IMHO the XACT_FLAGS_FDWNOPREPARE flag should only
be set when the transaction management/two-phase functionality is not available and "modified" argument is
true in register_fdwxact()
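
The condition proposed in point 2 can be sketched in isolation. This is
only an illustration, not code from the patch: my_xact_flags,
sketch_register_fdwxact() and sketch_commit_allowed_when_required() are
hypothetical stand-ins, and only the flag value is taken from the xact.h
hunk of the patch.

```c
#include <stdbool.h>

/* Flag value as defined in the patch's xact.h hunk. */
#define XACT_FLAGS_FDWNOPREPARE (1U << 3)

/* Hypothetical stand-in for the backend's MyXactFlags. */
unsigned int my_xact_flags = 0;

/*
 * Hypothetical registration-time check: a server taints the transaction
 * only if it cannot prepare AND the transaction modified data on it.
 * A read-only access to a server without two-phase support is harmless.
 */
void
sketch_register_fdwxact(bool can_prepare, bool modified)
{
	if (!can_prepare && modified)
		my_xact_flags |= XACT_FLAGS_FDWNOPREPARE;
}

/*
 * Hypothetical pre-commit check for foreign_twophase_commit = 'required':
 * the commit may proceed only if no modified server lacks two-phase support.
 */
bool
sketch_commit_allowed_when_required(void)
{
	return (my_xact_flags & XACT_FLAGS_FDWNOPREPARE) == 0;
}
```

With this condition, a transaction that only reads from a server lacking
two-phase commit still passes the 'required' check, while a write to such
a server makes it fail.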

Thank you for reviewing this patch!

Your comments are incorporated in the latest patch set I recently sent[1].

[1]: /messages/by-id/CA+fd4k5ZcDvoiY_5c-mF1oDACS5nUWS7ppoiOwjCOnM+grJO-Q@mail.gmail.com

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#34Muhammad Usama
m.usama@gmail.com
In reply to: Masahiko Sawada (#32)
5 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Sat, Feb 22, 2020 at 7:15 AM Masahiko Sawada <
masahiko.sawada@2ndquadrant.com> wrote:

On Wed, 19 Feb 2020 at 07:55, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Tue, 11 Feb 2020 at 12:42, amul sul <sulamul@gmail.com> wrote:

On Fri, Jan 24, 2020 at 11:31 AM Masahiko Sawada <

masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 6 Dec 2019 at 17:33, Kyotaro Horiguchi <

horikyota.ntt@gmail.com> wrote:

Hello.

This is the rebased (and a bit fixed) version of the patch. This
applies on the master HEAD and passes all provided tests.

I took over this work from Sawada-san. I'll begin with reviewing the
current patch.

The previous patch set no longer applies cleanly to the current
HEAD. I've updated and slightly modified the code.

This patch set has been marked as Waiting on Author for a long time
but the correct status now is Needs Review. The patch actually was
updated and incorporated all review comments but it was not rebased
actively.

The mail[1] I posted before would be helpful to understand the current
patch design, and there is a README in the patch and a wiki page[2].

I've marked this as Needs Review.

Hi Sawada san,

I just had a quick look at the 0001 and 0002 patches; here are a few
suggestions.

patch: v27-0001:

Typo: s/non-temprary/non-temporary
----

patch: v27-0002: (Note: the left-hand number is the line number in the
v27-0002 patch):

138 +PostgreSQL's the global transaction manager (GTM), as a distributed transaction
139 +participant The registered foreign transactions are tracked until the end of

Full stop "." is missing after "participant"

174 +API Contract With Transaction Management Callback Functions

Can we just say "Transaction Management Callback Functions"?
TBH, I am not sure that I understand this title.

203 +processing foreign transaction (i.g. preparing, committing or aborting) the

Do you mean "i.e" instead of i.g. ?

269 + * RollbackForeignTransactionAPI. Registered participant servers are identified

Add a space between RollbackForeignTransaction and API.

292 + * automatically so must be processed manually using by pg_resovle_fdwxact()

Do you mean pg_resolve_foreign_xact() here?

320 + * the foreign transaction is authorized to update the fields from its own
321 + * one.
322 +
323 + * Therefore, before doing PREPARE, COMMIT PREPARED or ROLLBACK PREPARED a

Please add an asterisk '*' on line #322.

816 +static void
817 +FdwXactPrepareForeignTransactions(void)
818 +{
819 + ListCell *lcell;

Let's have this variable name as "lc" like elsewhere.

1036 + ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
1037 + errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
1038 + xid, serverid, userid)));
1039 + }

Incorrect formatting.

1166 +/*
1167 + * Return true and set FdwXactAtomicCommitReady to true if the current transaction

Do you mean ForeignTwophaseCommitIsRequired instead of
FdwXactAtomicCommitReady?

3529 +
3530 +/*
3531 + * FdwXactLauncherRegister
3532 + * Register a background worker running the foreign transaction
3533 + * launcher.
3534 + */

This prolog style is not consistent with the other functions in the
file.

And here are a few typos:

s/conssitent/consistent
s/consisnts/consist
s/Foriegn/Foreign
s/tranascation/transaction
s/itselft/itself
s/rolbacked/rollbacked
s/trasaction/transaction
s/transactio/transaction
s/automically/automatically
s/CommitForeignTransaciton/CommitForeignTransaction
s/Similary/Similarly
s/FDWACT_/FDWXACT_
s/dink/disk
s/requried/required
s/trasactions/transactions
s/prepread/prepared
s/preapred/prepared
s/beging/being
s/gxact/xact
s/in-dbout/in-doubt
s/respecitively/respectively
s/transction/transaction
s/idenetifier/identifier
s/identifer/identifier
s/checkpoint'S/checkpoint's
s/fo/of
s/transcation/transaction
s/trasanction/transaction
s/non-temprary/non-temporary
s/resovler_internal.h/resolver_internal.h

Thank you for reviewing the patch! I've incorporated all comments in
my local branch.

Attached is the updated patch set, which incorporates the review
comments from Amul and Muhammad.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Hi Sawada San,

I have been further reviewing and testing the transaction involving
multiple server patches.
Overall the patches are working as expected, bar a few important exceptions.
So, as discussed over the call, I have fixed the issues I found during
testing and also rebased the patches on the current head of the master
branch. Can you please have a look at the attached updated patches?

Below is the list of changes I have made on top of V18 patches.

1- In register_fdwxact(), since we are only storing the callback function
pointers from FdwRoutine in the fdw_part structure, I think we can avoid
calling GetFdwRoutineByServerId() in TopMemoryContext.
So I have moved the MemoryContextSwitch to TopMemoryContext to after the
GetFdwRoutineByServerId() call.

2- If the PrepareForeignTransaction functionality is not present in some
FDW, then during the registration process we should only set the
XACT_FLAGS_FDWNOPREPARE transaction flag if the modified flag is also set
for that server, since for a server that has not done any data
modification within the transaction we do not do a two-phase commit
anyway.

3- I have moved foreign_twophase_commit in the sample file to after
max_foreign_transaction_resolvers, because the default value of
max_foreign_transaction_resolvers is 0 and enabling
foreign_twophase_commit produces an error with the default
configuration parameter positioning in postgresql.conf.
Also, the foreign_twophase_commit entry was missing the comment
about allowed values in the sample config file.

4- The is_foreign_twophase_commit_required() function does not seem to be
the correct place to set ForeignTwophaseCommitIsRequired. The reason is
that even when is_foreign_twophase_commit_required() returns true after
setting ForeignTwophaseCommitIsRequired to true, we could still end up
not using two-phase commit in the case when some server does not support
two-phase commit and foreign_twophase_commit is set to
FOREIGN_TWOPHASE_COMMIT_PREFER mode. So I have moved the
ForeignTwophaseCommitIsRequired assignment to the PreCommit_FdwXacts()
function, after doing the prepare transaction.

6- In prefer mode, we commit the transaction in a single phase if the
server does not support two-phase commit. But instead of doing the
single-phase commit right away, IMHO the better way is to wait until all
the two-phase transactions are successfully prepared on the servers that
support two-phase commit, since an error during the "PREPARE" stage would
roll back the transaction, and in that case we would end up with
committed transactions on the server that lacks two-phase commit support.
So I have modified the flow a little bit: instead of doing a one-phase
commit right away, the servers that do not support two-phase commit are
added to another list, and that list is processed once we have
successfully prepared all the transactions on the foreign servers that
support two-phase commit. Although this technique is not bulletproof
either, it is still better than doing the one-phase commits before doing
the PREPAREs.

Also, I think we can improve on this one by throwing an error even in PREFER
mode if there is more than one server that had data modified within the
transaction
and lacks the two-phase commit support.

7- Added a pfree() and list_free_deep() in PreCommit_FdwXacts() to reclaim
the memory if fdw_part is removed from the list.

8- The function FdwXactWaitToBeResolved() was bailing out as soon as it
found (FdwXactParticipants == NIL). The problem with that was that in the
case of "COMMIT/ROLLBACK PREPARED" we always get FdwXactParticipants = NIL,
and effectively the foreign prepared transactions (if any) associated with
locally prepared transactions were never getting resolved automatically.

postgres=# BEGIN;
BEGIN
INSERT INTO test_local VALUES ( 2, 'TWO');
INSERT 0 1
INSERT INTO test_foreign_s1 VALUES ( 2, 'TWO');
INSERT 0 1
INSERT INTO test_foreign_s2 VALUES ( 2, 'TWO');
INSERT 0 1
postgres=*# PREPARE TRANSACTION 'local_prepared';
PREPARE TRANSACTION

postgres=# select * from pg_foreign_xacts ;
 dbid  | xid | serverid | userid |  status  | in_doubt |         identifier
-------+-----+----------+--------+----------+----------+----------------------------
 12929 | 515 |    16389 |     10 | prepared | f        | fx_1339567411_515_16389_10
 12929 | 515 |    16391 |     10 | prepared | f        | fx_1963224020_515_16391_10
(2 rows)

-- Now commit the prepared transaction

postgres=# COMMIT PREPARED 'local_prepared';

COMMIT PREPARED

-- Foreign prepared transactions associated with 'local_prepared' not resolved

postgres=#
postgres=# select * from pg_foreign_xacts ;
 dbid  | xid | serverid | userid |  status  | in_doubt |         identifier
-------+-----+----------+--------+----------+----------+----------------------------
 12929 | 515 |    16389 |     10 | prepared | f        | fx_1339567411_515_16389_10
 12929 | 515 |    16391 |     10 | prepared | f        | fx_1963224020_515_16391_10
(2 rows)

So to fix this, in the case of a two-phase transaction the function now
checks for the existence of associated foreign prepared transactions
before bailing out.

9- In function XlogReadFdwXactData() the XLogBeginRead() call was missing
before XLogReadRecord(), which was causing a crash during recovery.

10- Incorporated the set_ps_display() signature change.
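
The ordering described in item 6 can be sketched as a small standalone
simulation. This is a hedged illustration only, not the patch's
PreCommit_FdwXacts(): SketchParticipant and sketch_precommit_prefer() are
hypothetical names, PREPARE failures are simulated with a flag, and the
rollback of already-prepared servers after a failure is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical participant record; not the patch's fdw_part structure. */
typedef struct SketchParticipant
{
	bool		can_prepare;	/* server supports two-phase commit */
	bool		prepare_fails;	/* simulate an error during PREPARE */
	bool		prepared;		/* set once PREPARE succeeded */
	bool		committed;		/* set once a one-phase COMMIT was done */
} SketchParticipant;

/*
 * Prefer-mode sketch: prepare on every two-phase-capable server first and
 * only then do the one-phase commits.  Returns false if a PREPARE failed,
 * in which case no one-phase commit has been issued yet.
 */
bool
sketch_precommit_prefer(SketchParticipant *parts, size_t n)
{
	size_t		i;

	/* Pass 1: PREPARE on servers that support it; defer the others. */
	for (i = 0; i < n; i++)
	{
		if (!parts[i].can_prepare)
			continue;			/* handled in pass 2 */
		if (parts[i].prepare_fails)
			return false;		/* error out before any one-phase commit */
		parts[i].prepared = true;
	}

	/* Pass 2: all PREPAREs succeeded, so commit the deferred servers. */
	for (i = 0; i < n; i++)
	{
		if (!parts[i].can_prepare)
			parts[i].committed = true;
	}

	return true;
}
```

A failure in the second pass can still leave the servers inconsistent,
which is why the item calls the technique not bulletproof; the sketch only
shows that a PREPARE error can no longer follow a one-phase commit.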

Best regards,

...
Muhammad Usama
HighGo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca

Attachments:

v19-0005-Add-regression-tests-for-atomic-commit.patch (application/octet-stream)
From 0af83b7a06d6374b3c2e72afc7e460fa12fc314e Mon Sep 17 00:00:00 2001
From: Muhammad Usama <m.usama@highgo.ca>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v19 5/5] Add regression tests for atomic commit.

Authors: Muhammad Usama, Masahiko Sawada, Ashutosh Bapat
---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check that we can commit and roll back the foreign
+# transactions after normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check that we can commit and roll back the
+# foreign transactions after crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up its shared
+# memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check that the standby node can process prepared foreign transactions
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index f6a5e1b9c7..9f62e750b5 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2320,9 +2320,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2337,7 +2340,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.21.1 (Apple Git-122.3)

v19-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From 92823fb0fe93eed7c91bf2628fcd3fed8694e69d Mon Sep 17 00:00:00 2001
From: Muhammad Usama <m.usama@highgo.ca>
Date: Thu, 26 Mar 2020 21:22:17 +0500
Subject: [PATCH v19 2/5] Support atomic commit among multiple foreign servers.

Authors: Muhammad Usama, Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  130 +
 src/backend/access/fdwxact/fdwxact.c          | 2888 +++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  641 ++++
 src/backend/access/fdwxact/resolver.c         |  343 ++
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   42 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |   11 +
 src/backend/commands/copy.c                   |    6 +
 src/backend/commands/foreigncmds.c            |   30 +
 src/backend/executor/execPartition.c          |    8 +
 src/backend/executor/nodeForeignscan.c        |   24 +
 src/backend/executor/nodeModifyTable.c        |    6 +
 src/backend/foreign/foreign.c                 |   55 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   20 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   82 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/fdwxactdesc.c              |    1 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  167 +
 src/include/access/fdwxact_launcher.h         |   29 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/resolver_internal.h        |   66 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   29 +
 src/include/foreign/fdwapi.h                  |   12 +
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    9 +-
 src/include/storage/proc.h                    |   11 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    3 +
 src/test/regress/expected/rules.out           |   13 +
 55 files changed, 4972 insertions(+), 18 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/bin/pg_waldump/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..49480dd039 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..0207a66fb4
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000000..c20570022c
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,130 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back a transaction on
+either all of the involved foreign servers or none of them. This ensures that
+the data is always left in a consistent state across the federated database.
+
+
+Commit Sequence of Global Transactions
+---------------------------------------
+
+We employ the two-phase commit protocol to commit atomically on all foreign
+servers. The sequence of distributed transaction commit consists
+of the following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+in the list FdwXactAtomicCommitParticipants, which is maintained by
+PostgreSQL's global transaction manager (GTM), as distributed transaction
+participants. The registered foreign transactions are tracked until the end of
+the transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We record a WAL record indicating that the foreign server is involved with
+the current transaction before doing PREPARE on any foreign transaction.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers, including the local node. Otherwise, we can commit them at this
+step by calling the CommitForeignTransaction() API, and nothing more is needed.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If any of them fails, we switch to
+rollback; at that point some participants might be prepared whereas
+others are not. The prepared foreign transactions need to be resolved
+manually using pg_resolve_foreign_xact(), and the unprepared ones are ended
+in one phase by calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction,
+but this resolution step (commit or rollback) is done by the foreign
+transaction resolver process. The backend process inserts itself into the wait
+queue and then wakes up the resolver process (or requests the launch of a new
+one if necessary). The resolver process dequeues the waiter and fetches the
+distributed transaction information that the backend is waiting for. Once all
+foreign transactions are committed or rolled back, it wakes up the waiter.
+
+
+Foreign Data Wrapper Callbacks for Transaction Management
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+transaction management callback functions according to that status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for executing PREPARE, COMMIT and
+ROLLBACK of the transaction on the foreign server, respectively.
+FdwXactRslvState->flags may contain FDWXACT_FLAG_ONEPHASE, meaning the FDW can
+commit or roll back the foreign transaction in one phase. On failure while
+processing a foreign transaction, the FDW needs to raise an error. However, the
+FDW must tolerate an ERRCODE_UNDEFINED_OBJECT error while committing or rolling
+back a foreign transaction, because there is a race condition: the coordinator
+could crash between completing the resolution and writing the WAL record that
+removes the FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_INITIAL
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and it changes to
+FDWXACT_STATUS_PREPARING, FDWXACT_STATUS_COMMITTING and FDWXACT_STATUS_ABORTING
+before the foreign transaction is prepared, committed and aborted by the FDW
+callback functions, respectively (*1). The status then changes to
+FDWXACT_STATUS_RESOLVED once the foreign transaction is resolved, and then
+the corresponding FdwXact entry is removed with WAL logging. If processing of
+a foreign transaction (e.g., preparing, committing or aborting) fails, the
+status changes back to the previous status. Therefore the
+FDWXACT_STATUS_xxxING statuses appear only while the foreign transaction is
+being processed by an FDW callback function.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. Their initial
+status is FDWXACT_STATUS_PREPARED (*2). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                      INVALID                       |
+ +----------------------------------------------------+
+    |                      |                       |
+    |                      v                       |
+    |           +---------------------+            |
+    |           |       INITIAL       |            |
+    |           +---------------------+            |
+   (*2)                    |                      (*2)
+    |                      v                       |
+    |           +---------------------+            |
+    |           |    PREPARING(*1)    |            |
+    |           +---------------------+            |
+    |                      |                       |
+    v                      v                       v
+ +----------------------------------------------------+
+ |                      PREPARED                      |
+ +----------------------------------------------------+
+           |                               |
+           v                               v
+ +--------------------+          +--------------------+
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |
+ +--------------------+          +--------------------+
+           |                               |
+           v                               v
+ +----------------------------------------------------+
+ |                      RESOLVED                      |
+ +----------------------------------------------------+
+
+(*1) Statuses that appear only while being processed by the FDW
+(*2) Paths for recovered FdwXact entries
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..0990a4e3ed
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2888 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * During executor node initialization, the executor can register a foreign
+ * server by calling either RegisterFdwXactByRelId() or
+ * RegisterFdwXactByServerId() so that it participates in the group for global
+ * commit. A foreign server is registered if its FDW has both the
+ * CommitForeignTransaction and RollbackForeignTransaction APIs. Registered
+ * participants are identified by the OIDs of the foreign server and user.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * every foreign server. After committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so
+ * that we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request. We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed. Before waiting for foreign transactions to be resolved, the
+ * backend enqueues itself with the timestamp at which it expects to be
+ * processed. Similarly, if resolution fails, it enqueues itself again with a
+ * new timestamp (its timestamp + foreign_xact_resolution_interval).
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in an in-doubt state (aka in-doubt
+ * transactions). Foreign transactions in an in-doubt state are not resolved
+ * automatically, so they must be processed manually using the
+ * pg_resolve_foreign_xact() function.
+ *
+ * Two-phase commit is required if the transaction modified two or
+ * more servers, including itself. Otherwise, all foreign transactions are
+ * committed or rolled back during pre-commit.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed by the FDW, the corresponding
+ * FdwXact entry is updated. In order to protect the entry from concurrent
+ * removal we need to hold a lock on the entry or a lock on the entire global
+ * array. However, we don't want to hold the lock while the FDW is processing
+ * the foreign transaction, which may take an unpredictable time. To avoid this,
+ * the in-memory data of a foreign transaction follows a locking model based on
+ * four linked concepts:
+ *
+ * * A foreign transaction's status variable is switched using the LWLock
+ *   FdwXactLock, which needs to be held in exclusive mode when updating the
+ *   status, while readers need to hold it in shared mode when looking at the
+ *   status.
+ * * A process that is going to update an FdwXact entry cannot process a
+ *   foreign transaction that is being resolved.
+ * * So setting the status to FDWXACT_STATUS_PREPARING,
+ *   FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING, which are the
+ *   in-progress states of a foreign transaction, means owning the FdwXact
+ *   entry, which protects it from update/removal by concurrent writers.
+ * * Individual fields are protected by a mutex, and only the backend owning
+ *   the foreign transaction is authorized to update the fields of its own
+ *   entry.
+ *
+ * Therefore, before doing PREPARE, COMMIT PREPARED or ROLLBACK PREPARED, a
+ * process that is going to call a transaction callback function needs to set
+ * the corresponding status above while holding FdwXactLock in exclusive mode,
+ * and call the callback function after releasing the lock.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxacts is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk. FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon. We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts. If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Atomic commit is enabled by configuration */
+#define IsForeignTwophaseCommitEnabled() \
+	(max_prepared_foreign_xacts > 0 && \
+	 max_foreign_xact_resolvers > 0)
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	(IsForeignTwophaseCommitEnabled() && \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED))
+
+/* Check the FdwXactParticipant is capable of two-phase commit  */
+#define IsSeverCapableOfTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Check the FdwXact is being resolved */
+#define FdwXactIsBeingResolved(fx) \
+	(((((FdwXact)(fx))->status) == FDWXACT_STATUS_PREPARING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_COMMITTING) || \
+	 ((((FdwXact)(fx))->status) == FDWXACT_STATUS_ABORTING))
+
+/*
+ * Structure to bundle the foreign transaction participant. This struct
+ * is created at the beginning of execution for each foreign server and
+ * is used until the end of transaction where we cannot look at syscaches.
+ * Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
+	/* true if modified the data on the server */
+	bool		modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transaction participants for atomic commit. This list
+ * has only foreign servers that provide transaction management callbacks,
+ * that is CommitForeignTransaction and RollbackForeignTransaction.
+ */
+static List *FdwXactParticipants = NIL;
+
+/*
+ * True if the current transaction needs to be committed together with
+ * foreign servers.
+ */
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file is the database oid, xid,
+ * foreign server oid and user oid, each as 8 hex digits, separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure
+ * uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* GUC parameters */
+int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Keep track of whether the process exit callback is registered. */
+static bool fdwXactExitRegistered = false;
+
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactPrepareForeignTransactions(void);
+static void FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+												 bool for_commit);
+static void FdwXactResolveForeignTransaction(FdwXact fdwxact,
+											 FdwXactRslvState *state,
+											 FdwXactStatus fallback_status);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static void FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock);
+static bool is_foreign_twophase_commit_required(bool *local_modified);
+static void register_fdwxact(Oid serverid, Oid userid, bool modified);
+static List *get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						  bool including_indoubts, bool include_in_progress,
+						  bool need_lock);
+static FdwXact get_all_fdwxacts(int *num_p);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXact get_fdwxact_to_resolve(Oid dbid, TransactionId xid);
+static FdwXactRslvState *create_fdwxact_state(void);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Remember accessed foreign transaction. Both RegisterFdwXactByRelId and
+ * RegisterFdwXactByServerId are called by executor during initialization.
+ */
+void
+RegisterFdwXactByRelId(Oid relid, bool modified)
+{
+	Relation	rel;
+	Oid			serverid;
+	Oid			userid;
+
+	rel = relation_open(relid, NoLock);
+	serverid = GetForeignServerIdByRelId(relid);
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	relation_close(rel, NoLock);
+
+	register_fdwxact(serverid, userid, modified);
+}
+
+void
+RegisterFdwXactByServerId(Oid serverid, bool modified)
+{
+	register_fdwxact(serverid, GetUserId(), modified);
+}
+
+/*
+ * Register the foreign transaction identified by the given arguments as
+ * a participant of the transaction.
+ *
+ * The foreign transaction is identified by the given server id and user id.
+ * Registered foreign transactions are managed by the global transaction
+ * manager until the end of the transaction.
+ */
+static void
+register_fdwxact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Don't register the foreign server if it provides neither the commit nor
+	 * the rollback transaction management callback and is modified.
+	 */
+	if (!routine->CommitForeignTransaction &&
+		!routine->RollbackForeignTransaction &&
+		modified)
+	{
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+		pfree(routine);
+		return;
+	}
+
+	/*
+	 * Remember that we modified data on a foreign server that is not capable
+	 * of two-phase commit.
+	 */
+	if (!routine->PrepareForeignTransaction &&
+		modified)
+		MyXactFlags |= XACT_FLAGS_FDWNOPREPARE;
+
+	/*
+	 * Participant's information is also needed at the end of a transaction,
+	 * where the system caches are not available. Save it in
+	 * TopTransactionContext so that it lives until the end of transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact = NULL;
+	fdw_part->modified = modified;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Switch back to the previous memory context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined by the FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&(fdwxacts[cnt].mutex));
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Prepare all foreign transactions if foreign two-phase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit: with
+ * 'required' we strictly require all foreign servers' FDWs to support the
+ * two-phase commit protocol and ask them to prepare foreign transactions;
+ * with 'prefer' we ask only the foreign servers that are capable of two-phase
+ * commit to prepare foreign transactions and ask the other servers to commit;
+ * with 'disabled' we ask all foreign servers to commit their foreign
+ * transactions in one phase. If we fail to commit any of them we change to
+ * aborting.
+ *
+ * Note that non-modified foreign servers can always be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	bool		need_twophase_commit;
+	bool		local_modified = false;
+	ListCell   *lc = NULL;
+	List       *non_twophase_participants = NIL;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Check if we need to use foreign twophase commit. It's always false if
+	 * foreign twophase commit is disabled.
+	 */
+	need_twophase_commit = is_foreign_twophase_commit_required(&local_modified);
+
+	/*
+	 * If foreign two-phase commit is required then all foreign servers
+	 * must be capable of doing two-phase commit.
+	 */
+
+	if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0 &&
+		need_twophase_commit)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot COMMIT a distributed transaction that has operated on a foreign server that doesn't support atomic commit")));
+
+	/* Attempt to commit foreign transactions in one-phase */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		bool		commit = false;
+		if (!need_twophase_commit)
+		{
+			/* Can commit in one-phase if two-phase commit is not required */
+			commit = true;
+		}
+		else if (!fdw_part->modified)
+		{
+			/*
+			 * Non-modified foreign transaction always can be committed in
+			 * one-phase regardless of two-phase commit support.
+			 */
+			commit = true;
+		}
+		else if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER &&
+				 !IsSeverCapableOfTwophaseCommit(fdw_part))
+		{
+			/*
+			 * In 'prefer' mode, transactions on servers lacking two-phase
+			 * commit capability can be committed in one phase, but instead
+			 * of committing right away we wait until all foreign
+			 * transactions are prepared successfully on the two-phase
+			 * commit capable servers. So just add this server to the list
+			 * and process it after preparing the foreign transactions.
+			 */
+			non_twophase_participants = lappend(non_twophase_participants, fdw_part);
+			FdwXactParticipants = foreach_delete_current(FdwXactParticipants, lc);
+		}
+
+		if (commit)
+		{
+			/* Commit the foreign transaction in one-phase */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, true);
+
+			/* Delete it from the participant list */
+			FdwXactParticipants = foreach_delete_current(FdwXactParticipants, lc);
+			pfree(fdw_part);
+		}
+	}
+
+	/* All done if we have already committed all foreign transactions */
+	if (FdwXactParticipants == NIL && non_twophase_participants == NIL)
+		return;
+
+	Assert(foreign_twophase_commit != FOREIGN_TWOPHASE_COMMIT_DISABLED);
+
+	/*
+	 * If we now have only one two-phase capable server left in the list,
+	 * no server in the non_twophase_participants list, and we didn't modify
+	 * the local data, then we don't need two-phase commit.
+	 */
+	if (non_twophase_participants == NIL &&
+		list_length(FdwXactParticipants) == 1 &&
+		!local_modified)
+	{
+		/* Commit the foreign transaction in one-phase */
+		FdwXactOnePhaseEndForeignTransaction(linitial(FdwXactParticipants),
+											 true);
+
+		list_free_deep(FdwXactParticipants);
+		FdwXactParticipants = NIL;
+		return;
+	}
+
+	/*
+	 * Finally, prepare foreign transactions. Note that we keep
+	 * FdwXactParticipants until the end of transaction.
+	 */
+	if (FdwXactParticipants)
+	{
+		FdwXactPrepareForeignTransactions();
+
+		/*
+		 * Set ForeignTwophaseCommitIsRequired since we have prepared
+		 * transactions on the foreign servers.
+		 */
+		ForeignTwophaseCommitIsRequired = true;
+	}
+
+	if (non_twophase_participants == NIL)
+		return;
+	/*
+	 * Commit the transactions on servers lacking two-phase capability.
+	 */
+	foreach(lc, non_twophase_participants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		/* Commit the foreign transaction in one-phase */
+		FdwXactOnePhaseEndForeignTransaction(fdw_part, true);
+	}
+	list_free_deep(non_twophase_participants);
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * an FdwXact entry we call the GetPrepareId callback to get a transaction
+ * identifier from the FDW.
+ *
+ * We can still change to rollback here. If any error occurs, we roll back
+ * non-prepared foreign transactions and leave others to the resolver.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	ListCell   *lc;
+	TransactionId xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Parameter check */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	xid = GetTopTransactionId();
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState *state;
+		FdwXact		fdwxact;
+
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry and then mark it with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to disk and to WAL (thereby relaying it to standbys).
+		 * Thus in case we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed in between these
+		 * two steps, we would lose track of the prepared transaction on the
+		 * foreign server and would not be able to resolve it after crash
+		 * recovery. Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		state = create_fdwxact_state();
+		state->server = fdw_part->server;
+		state->usermapping = fdw_part->usermapping;
+		state->fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+
+		/* Update the status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		Assert(fdwxact->status == FDWXACT_STATUS_INITIAL);
+		fdwxact->status = FDWXACT_STATUS_PREPARING;
+		LWLockRelease(FdwXactLock);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the moment this
+		 * backend receives the acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 *
+		 * During abort processing, we might try to resolve a never-prepared
+		 * transaction, and get an error. This is fine as long as the FDW
+		 * provides us unique prepared transaction identifiers.
+		 */
+		PG_TRY();
+		{
+			fdw_part->prepare_foreign_xact_fn(state);
+		}
+		PG_CATCH();
+		{
+			/* failed, back to the initial state */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fdwxact->status = FDWXACT_STATUS_INITIAL;
+			LWLockRelease(FdwXactLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		/* succeeded, update status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * One-phase commit or rollback the given foreign transaction participant.
+ */
+static void
+FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part,
+									 bool for_commit)
+{
+	FdwXactRslvState *state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state = create_fdwxact_state();
+	state->server = fdw_part->server;
+	state->usermapping = fdw_part->usermapping;
+	state->flags = FDWXACT_FLAG_ONEPHASE;
+
+	/*
+	 * Commit or rollback foreign transaction in one-phase. Since we didn't
+	 * insert FdwXact entry for this transaction we don't need to care about
+	 * failures. On failure we change to rollback.
+	 */
+	if (for_commit)
+		fdw_part->commit_foreign_xact_fn(state);
+	else
+		fdw_part->rollback_foreign_xact_fn(state);
+}
+
+/*
+ * Create a new foreign transaction entry before an FDW prepares and commits
+ * or rolls back. The function adds the entry to WAL; it is persisted to disk
+ * under the pg_fdwxact directory at the next checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->status = FDWXACT_STATUS_INITIAL;
+	fdwxact->held_by = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyPgXact->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyPgXact->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	int			i;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->dbid = dbid;
+	fdwxact->local_xid = xid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	fdwxact->indoubt = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (FdwXactIsBeingResolved(fdwxact))
+		elog(ERROR, "cannot remove fdwxact entry that is being resolved");
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->indoubt = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true if the current transaction modifies data on two or more servers,
+ * counting the servers in FdwXactParticipants and the local server itself.
+ * Also set *local_modified to true if the transaction modified the local data.
+ */
+static bool
+is_foreign_twophase_commit_required(bool *local_modified)
+{
+	ListCell   *lc;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->modified)
+			nserverswritten++;
+	}
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+	{
+		++nserverswritten;
+		if (local_modified)
+			*local_modified = true;
+	}
+
+	/*
+	 * Atomic commit is required if we modified data on two or more
+	 * participants.
+	 */
+	if (nserverswritten <= 1)
+		return false;
+
+	return true;
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int			i;
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * Mark my foreign transaction participants as in-doubt and clear
+ * the FdwXactParticipants list.
+ *
+ * If we leave any foreign transaction behind, update the oldest xmin of
+ * unresolved transactions so that the local transaction id of an in-doubt
+ * transaction is not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell   *cell;
+	int			n_lefts = 0;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if didn't register FdwXact entry yet */
+		if (!fdw_part->fdwxact)
+			continue;
+
+		/*
+		 * There is a race condition: the resolver process could remove an
+		 * FdwXact entry listed in FdwXactParticipants and another backend
+		 * could reuse it before we forget it. So we need to check that the
+		 * entry is still held by this backend.
+		 */
+		SpinLockAcquire(&fdwxact->mutex);
+		if (fdwxact->held_by == MyBackendId)
+		{
+			fdwxact->held_by = InvalidBackendId;
+			fdwxact->indoubt = true;
+			n_lefts++;
+		}
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction of
+	 * unresolved distributed transactions and hand them over to the foreign
+	 * transaction resolver.
+	 */
+	if (n_lefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions in in-doubt status", n_lefts);
+		FdwXactComputeRequiredXmin();
+	}
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * Wait for the foreign transaction to be resolved.
+ *
+ * Initially backends start in state FDWXACT_NOT_WAITING and then change
+ * that state to FDWXACT_WAITING before adding ourselves to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * This backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char	   *new_status = NULL;
+	const char *old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsForeignTwophaseCommitRequested())
+		return;
+
+	/*
+	 * exit if the transaction itself has no foreign transaction
+	 * participants
+	 */
+	if (FdwXactParticipants == NIL)
+	{
+		/*
+		 * If we are here because of COMMIT/ROLLBACK PREPARED then the
+		 * FdwXactParticipants list would be empty. So we need to
+		 * see whether any foreign prepared transactions exist
+		 * for this prepared transaction.
+		 */
+		if (TwoPhaseExists(wait_xid))
+		{
+			List *foreign_trans = NIL;
+
+			foreign_trans = get_fdwxacts(MyDatabaseId, wait_xid, InvalidOid, InvalidOid,
+					 false, false, true);
+
+			if (foreign_trans == NIL)
+				return;
+			list_free(foreign_trans);
+		}
+	}
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if not yet, or wake up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction resolution.
+	 */
+	if (update_process_title)
+	{
+		int			len;
+
+		old_status = get_ps_display(&len);
+		/* 24 characters of fixed text plus up to 10 digits of xid, plus NUL */
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status);
+		new_status[len] = '\0'; /* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDWXACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The
+		 * latter would lead the client to believe that the distributed
+		 * transaction aborted, which is not true: it's already committed
+		 * locally. The former is no good either: the client has requested
+		 * committing a distributed transaction, and is entitled to assume
+		 * that an acknowledged commit is also committed on all foreign
+		 * servers, which might not be true. So in this case we issue a
+		 * WARNING (which some clients may be able to interpret) and shut off
+		 * further output. We do NOT reset ProcDiePending, so that the process
+		 * will die after the commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned but
+		 * the foreign xact resolver can pick them up and try to resolve
+		 * them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Return true if there is at least one backend in the wait queue for the
+ * given database. The caller must hold FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter to the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC	   *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * AtEOXact_FdwXacts
+ */
+extern void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lc;
+
+	if (!is_commit)
+	{
+		foreach(lc, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = lfirst(lc);
+
+			/*
+			 * If the foreign transaction has an FdwXact entry, we might have
+			 * prepared it. Skip a foreign transaction that is already
+			 * prepared, because its transaction has been closed. However, we
+			 * cannot tell whether a foreign transaction with status ==
+			 * FDWXACT_STATUS_PREPARING has actually been prepared, so we call
+			 * the rollback API to close its transaction for safety. Any
+			 * prepared foreign transaction left behind will be resolved by
+			 * the foreign transaction resolver.
+			 */
+			if (fdw_part->fdwxact)
+			{
+				bool		is_prepared;
+
+				LWLockAcquire(FdwXactLock, LW_SHARED);
+				is_prepared = fdw_part->fdwxact &&
+					fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED;
+				LWLockRelease(FdwXactLock);
+
+				if (is_prepared)
+					continue;
+			}
+
+			/* One-phase rollback foreign transaction */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, false);
+		}
+	}
+
+	/*
+	 * In commit cases, we have already prepared foreign transactions during
+	 * pre-commit phase. And these prepared transactions will be resolved by
+	 * the resolver process.
+	 */
+
+	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Prepare foreign transactions.
+ *
+ * Note that the transaction may abort after we have prepared some of the
+ * participants. In that case we switch to rollback and roll back all foreign
+ * transactions.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	/*
+	 * We cannot prepare if any foreign server of participants isn't capable
+	 * of two-phase commit.
+	 */
+	if (is_foreign_twophase_commit_required(NULL) &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in transaction can not prepare the transaction")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Return one backend that connects to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p)
+{
+	PGPROC	   *proc;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+			break;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+	{
+		*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+		*waitXid_p = proc->fdwXactWaitXid;
+	}
+	else
+	{
+		*nextResolutionTs_p = -1;
+		*waitXid_p = InvalidTransactionId;
+	}
+
+	LWLockRelease(FdwXactResolutionLock);
+
+	return proc;
+}
+
+/*
+ * Get one FdwXact entry to resolve. This function is intended to be used by
+ * a resolver process to fetch FdwXact entries to resolve, so the search
+ * excludes in-doubt transactions and in-progress transactions.
+ */
+static FdwXact
+get_fdwxact_to_resolve(Oid dbid, TransactionId xid)
+{
+	List	   *fdwxacts = NIL;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Exclude both in-doubt transactions and in-progress transactions */
+	fdwxacts = get_fdwxacts(dbid, xid, InvalidOid, InvalidOid,
+							false, false, false);
+
+	return fdwxacts == NIL ? NULL : (FdwXact) linitial(fdwxacts);
+}
+
+/*
+ * Resolve one distributed transaction on the given database. The target
+ * distributed transaction is fetched from the waiting queue and its transaction
+ * participants are fetched from the global array.
+ *
+ * Release the waiter after we have resolved all of the foreign transaction
+ * participants. On failure, we re-enqueue the waiting backend after advancing
+ * its next resolution time.
+ */
+void
+FdwXactResolveTransaction(Oid dbid, TransactionId xid, PGPROC *waiter)
+{
+	FdwXact		fdwxact;
+
+	Assert(TransactionIdIsValid(xid));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	while ((fdwxact = get_fdwxact_to_resolve(dbid, xid)) != NULL)
+	{
+		FdwXactRslvState *state;
+		ForeignServer *server;
+		UserMapping *usermapping;
+
+		CHECK_FOR_INTERRUPTS();
+
+		server = GetForeignServer(fdwxact->serverid);
+		usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+
+		state = create_fdwxact_state();
+		SpinLockAcquire(&fdwxact->mutex);
+		state->server = server;
+		state->usermapping = usermapping;
+		state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+		SpinLockRelease(&fdwxact->mutex);
+
+		FdwXactDetermineTransactionFate(fdwxact, false);
+
+		/* Do not hold the lock during foreign transaction resolution */
+		LWLockRelease(FdwXactLock);
+
+		PG_TRY();
+		{
+			/*
+			 * Resolve the foreign transaction. When committing or aborting
+			 * prepared foreign transactions the previous status is always
+			 * FDWXACT_STATUS_PREPARED.
+			 */
+			FdwXactResolveForeignTransaction(fdwxact, state,
+											 FDWXACT_STATUS_PREPARED);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. If the waiter is still waiting, push back
+			 * its next resolution time and re-insert it into the queue.
+			 */
+			LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+			if (waiter->fdwXactState == FDWXACT_WAITING)
+			{
+				SHMQueueDelete(&(waiter->fdwXactLinks));
+				pg_write_barrier();
+				waiter->fdwXactNextResolutionTs =
+					TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+												foreign_xact_resolution_retry_interval);
+				FdwXactQueueInsert(waiter);
+			}
+			LWLockRelease(FdwXactResolutionLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		elog(DEBUG2, "resolved one foreign transaction xid %u, serverid %u, userid %u",
+			 fdwxact->local_xid, fdwxact->serverid, fdwxact->userid);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+
+	/*
+	 * Remove waiter from shmem queue, if not detached yet. The waiter could
+	 * already be detached if the user cancelled the wait before resolution.
+	 */
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/*
+		 * Wake up the waiter only when we have set state and removed from
+		 * queue
+		 */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had been already detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Determine whether the given foreign transaction should be committed or
+ * rolled back according to the result of the local transaction. This function
+ * changes fdwxact->status, so the caller must either hold FdwXactLock in
+ * exclusive mode or pass need_lock as true.
+ */
+static void
+FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock)
+{
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/*
+	 * The transaction being resolved must either have been cancelled and
+	 * marked as in-doubt, or have been prepared.
+	 */
+	Assert(fdwxact->indoubt ||
+		   fdwxact->status == FDWXACT_STATUS_PREPARED);
+
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on
+	 * the foreign server. So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted. This should not happen except for one
+	 * case where the local transaction is prepared and this foreign
+	 * transaction is being resolved manually using
+	 * pg_resolve_foreign_xact(). Raise an error anyway since we cannot
+	 * determine the fate of this foreign transaction according to the local
+	 * transaction whose fate is also not determined.
+	 */
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve the foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid),
+				 errhint("The local transaction with xid %u might be prepared",
+						 fdwxact->local_xid)));
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Resolve the foreign transaction using the foreign data wrapper's transaction
+ * callback function. The 'state' is passed to the callback function. The fate of
+ * foreign transaction must be determined. If foreign transaction is resolved
+ * successfully, remove the FdwXact entry from the shared memory and also
+ * remove the corresponding on-disk file. If failed, the status of FdwXact
+ * entry changes to 'fallback_status' before erroring out.
+ */
+static void
+FdwXactResolveForeignTransaction(FdwXact fdwxact, FdwXactRslvState *state,
+								 FdwXactStatus fallback_status)
+{
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *fdw_routine;
+	bool		is_commit;
+
+	Assert(state != NULL);
+	Assert(state->server && state->usermapping && state->fdwxact_id);
+	Assert(fdwxact != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+		elog(ERROR, "cannot resolve foreign transaction whose fate is not determined");
+
+	is_commit = fdwxact->status == FDWXACT_STATUS_COMMITTING;
+	LWLockRelease(FdwXactLock);
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+
+	PG_TRY();
+	{
+		if (is_commit)
+			fdw_routine->CommitForeignTransaction(state);
+		else
+			fdw_routine->RollbackForeignTransaction(state);
+	}
+	PG_CATCH();
+	{
+		/* Back to the fallback status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = fallback_status;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	/* Resolution was a success, remove the entry */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	elog(DEBUG1, "successfully %s the foreign transaction with xid %u db %u server %u user %u",
+		 is_commit ? "committed" : "rolled back",
+		 fdwxact->local_xid, fdwxact->dbid, fdwxact->serverid,
+		 fdwxact->userid);
+
+	fdwxact->status = FDWXACT_STATUS_RESOLVED;
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdwxact(fdwxact);
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Return palloc'd and initialized FdwXactRslvState.
+ */
+static FdwXactRslvState *
+create_fdwxact_state(void)
+{
+	FdwXactRslvState *state;
+
+	state = palloc(sizeof(FdwXactRslvState));
+	state->server = NULL;
+	state->usermapping = NULL;
+	state->fdwxact_id = NULL;
+	state->flags = 0;
+
+	return state;
+}
+
+/*
+ * Return the FdwXact entry that matches the given arguments, or NULL if there
+ * is none. All arguments must be valid values so that the search matches at
+ * most one entry. Note that this function is intended to be used for
+ * modifying the returned FdwXact entry, so the caller must hold FdwXactLock
+ * in exclusive mode. In-progress FdwXact entries are not included.
+ */
+static FdwXact
+get_one_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	   *fdwxact_list;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	/* Include in-doubt transactions but don't include in-progress ones */
+	fdwxact_list = get_fdwxacts(dbid, xid, serverid, userid,
+								true, false, false);
+
+	/* Must be at most one entry since we search by the unique key */
+	Assert(list_length(fdwxact_list) <= 1);
+
+	/* Could not find entry */
+	if (fdwxact_list == NIL)
+		return NULL;
+
+	return (FdwXact) linitial(fdwxact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	List	   *fdwxact_list;
+	bool		ret = false;
+
+	/* Find entries from all FdwXact entries */
+	fdwxact_list = get_fdwxacts(dbid, InvalidTransactionId, serverid,
+								userid, true, true, true);
+
+	if (fdwxact_list != NIL)
+		ret = true;
+
+	list_free(fdwxact_list);
+	return ret;
+}
+
+/*
+ * Returns an array of all foreign prepared transactions for the user-level
+ * function pg_foreign_xacts, and stores the number of entries in *num_p.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled yet. The caller should filter them out if it doesn't
+ * want them.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdwxacts(int *num_p)
+{
+	List	   *all_fdwxacts;
+	ListCell   *lc;
+	FdwXact		fdwxacts;
+	int			num_fdwxacts = 0;
+
+	Assert(num_p != NULL);
+
+	/* Get all entries */
+	all_fdwxacts = get_fdwxacts(InvalidOid, InvalidTransactionId,
+								InvalidOid, InvalidOid, true,
+								true, true);
+
+	if (all_fdwxacts == NIL)
+	{
+		*num_p = 0;
+		return NULL;
+	}
+
+	fdwxacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdwxacts));
+	*num_p = list_length(all_fdwxacts);
+
+	/* Convert list to array of FdwXact */
+	foreach(lc, all_fdwxacts)
+	{
+		FdwXact		fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdwxacts + num_fdwxacts, fx,
+			   sizeof(FdwXactData));
+		num_fdwxacts++;
+	}
+
+	list_free(all_fdwxacts);
+
+	return fdwxacts;
+}
+
+/*
+ * Return a list of FdwXact entries matching the given arguments, or NIL if
+ * there is no match. Each argument with a valid value for its datatype
+ * restricts the search; an invalid value matches any entry. 'include_indoubt'
+ * and 'include_in_progress' control whether the result includes in-doubt
+ * transactions and in-progress transactions respectively. If 'need_lock' is
+ * true, FdwXactLock is acquired in shared mode for the duration of the scan.
+ */
+static List *
+get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			 bool include_indoubt, bool include_in_progress, bool need_lock)
+{
+	int			i;
+	List	   *fdwxact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* include in-doubt transaction? */
+		if (!include_indoubt && fdwxact->indoubt)
+			continue;
+
+		/* include in-progress transaction? */
+		if (!include_in_progress && FdwXactIsBeingResolved(fdwxact))
+			continue;
+
+		/* Append it if matched */
+		fdwxact_list = lappend(fdwxact_list, fdwxact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdwxact_list;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides getPrepareId callback we return the identifier
+ * returned from it. Otherwise we generate a unique identifier in the form
+ * "fx_<random number>_<xid>_<serverid>_<userid>", whose length is less than
+ * FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transactionid unique, we should ideally use something
+ * like UUID, which gives unique ids with high probability, but that may be
+ * expensive here and UUID extension which provides the function to generate
+ * UUID is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	   *id;
+	int			id_len = 0;
+
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		/*
+		 * FDW doesn't provide the callback function, generate a unique
+		 * identifier.
+		 */
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be an FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									&read_local_xlog_page, NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid CRC and the expected identifiers), return the
+ * palloc'd contents of the file; raise an error when corrupted data is found.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\": %zu bytes",
+						path, (Size) stat.st_size)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Read the whole file in a single call */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the expected identifiers */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextFullXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextFullXid);
+	TransactionId result = origNextXid;
+	int			i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions.  Its status and
+	 * in-doubt flag are filled in below, once the entry has been inserted.
+	 * The entry stays invalid until RecoverFdwXacts() validates it at the end
+	 * of recovery.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED and as in-doubt, since we do not know the xact
+	 * status right now. Resolver will set it later based on the status of
+	 * local transaction that prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->indoubt = true;
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove
+ * FdwXact file if a foreign transaction was saved via an earlier checkpoint.
+ * We might not find the FdwXact entry if crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	fdwxact = get_one_fdwxact(dbid, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+		return;
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	int			i;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+
+		/*
+		 * If the foreign transaction is part of a prepared local
+		 * transaction, it is not in doubt. A later COMMIT/ROLLBACK
+		 * PREPARED will determine the fate of this foreign transaction.
+		 */
+		if (TwoPhaseExists(fdwxact->local_xid))
+		{
+			ereport(DEBUG2,
+					(errmsg("clearing in-doubt flag of foreign transaction %u, server %u, user %u: found the corresponding local prepared transaction",
+							fdwxact->local_xid, fdwxact->serverid,
+							fdwxact->userid)));
+			fdwxact->indoubt = false;
+		}
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+check_foreign_twophase_commit(int *newval, void **extra, GucSource source)
+{
+	ForeignTwophaseCommitLevel newForeignTwophaseCommitLevel = *newval;
+
+	/* Parameter check */
+	if (newForeignTwophaseCommitLevel > FOREIGN_TWOPHASE_COMMIT_DISABLED &&
+		(max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0))
+	{
+		GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when "
+							"\"max_prepared_foreign_transactions\" or \"max_foreign_transaction_resolvers\" "
+							"is zero.");
+		return false;
+	}
+
+	return true;
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+} WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	7
+	FuncCallContext *funcctx;
+	WorkingStatus *status;
+	char	   *xact_status;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		TupleDesc	tupdesc;
+		MemoryContext oldcontext;
+		int			num_fdwxacts = 0;
+
+		/* create a function context for cross-call persistence */
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		/*
+		 * Switch to memory context appropriate for multiple function calls
+		 */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* build tupdesc for result tuples */
+		/* this had better match pg_fdwxacts view in system_views.sql */
+		tupdesc = CreateTemplateTupleDesc(PG_PREPARED_FDWXACTS_COLS);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction",
+						   XIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status",
+						   TEXTOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 6, "indoubt",
+						   BOOLOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 7, "identifier",
+						   TEXTOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+
+		/*
+		 * Collect status information that we will format and send out as a
+		 * result set.
+		 */
+		status = (WorkingStatus *) palloc(sizeof(WorkingStatus));
+		funcctx->user_fctx = (void *) status;
+
+		status->fdwxacts = get_all_fdwxacts(&num_fdwxacts);
+		status->num_xacts = num_fdwxacts;
+		status->cur_xact = 0;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	status = funcctx->user_fctx;
+
+	while (status->cur_xact < status->num_xacts)
+	{
+		FdwXact		fdwxact = &status->fdwxacts[status->cur_xact++];
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+		HeapTuple	tuple;
+		Datum		result;
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(values, 0, sizeof(values));
+		MemSet(nulls, 0, sizeof(nulls));
+
+		values[0] = ObjectIdGetDatum(fdwxact->dbid);
+		values[1] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[2] = ObjectIdGetDatum(fdwxact->serverid);
+		values[3] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (fdwxact->status)
+		{
+			case FDWXACT_STATUS_INITIAL:
+				xact_status = "initial";
+				break;
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			case FDWXACT_STATUS_RESOLVED:
+				xact_status = "resolved";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[4] = CStringGetTextDatum(xact_status);
+		values[5] = BoolGetDatum(fdwxact->indoubt);
+		values[6] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		result = HeapTupleGetDatum(tuple);
+		SRF_RETURN_NEXT(funcctx, result);
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Built-in function to resolve a prepared foreign transaction manually.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	ForeignServer *server;
+	UserMapping *usermapping;
+	FdwXact		fdwxact;
+	FdwXactRslvState *state;
+	FdwXactStatus prev_status;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	server = GetForeignServer(serverid);
+	usermapping = GetUserMapping(userid, serverid);
+	state = create_fdwxact_state();
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (fdwxact == NULL)
+	{
+		LWLockRelease(FdwXactLock);
+		PG_RETURN_BOOL(false);
+	}
+
+	state->server = server;
+	state->usermapping = usermapping;
+	state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+
+	SpinLockAcquire(&fdwxact->mutex);
+	prev_status = fdwxact->status;
+	SpinLockRelease(&fdwxact->mutex);
+
+	FdwXactDetermineTransactionFate(fdwxact, false);
+
+	ereport(LOG,
+			(errmsg("trying to %s the foreign transaction associated with transaction %u on server %u",
+					fdwxact->status == FDWXACT_STATUS_COMMITTING ? "COMMIT" : "ABORT",
+					fdwxact->local_xid, fdwxact->serverid)));
+
+	LWLockRelease(FdwXactLock);
+
+	FdwXactResolveForeignTransaction(fdwxact, state, prev_status);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it. This gives a way to forget about such a prepared transaction
+ * when, for example, the foreign server where it was prepared is no longer
+ * available, or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	FdwXact		fdwxact;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid);
+
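+	/* Remove the entry if found; either way, release the lock before returning */
+	if (fdwxact != NULL)
+		remove_fdwxact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	/* Only the pointer value is tested here; the entry is not dereferenced */
+	PG_RETURN_BOOL(fdwxact != NULL);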
+}
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..e293d13562
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,641 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to retry resolution.
+ */
+void
+FdwXactLauncherRequestToLaunchForRetry(void)
+{
+	if (FdwXactRslvCtl->launcher_pid > 0)	/* neither 0 (never started) nor InvalidPid */
+		SetLatch(FdwXactRslvCtl->launcher_latch);
+}
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid > 0)	/* never kill() pid 0 or InvalidPid */
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			resolver->pid = InvalidPid;
+			resolver->dbid = InvalidOid;
+			resolver->in_use = false;
+			resolver->last_resolved_time = 0;
+			resolver->latch = NULL;
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+	Assert(FdwXactRslvCtl->launcher_pid == 0 ||
+		   FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit launch retries to once per
+		 * foreign_xact_resolution_retry_interval, but always launch when a
+		 * backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started. This usually means a resolver crashed, so retry
+			 * after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+	int			i;
+
+	/*
+	 * Look for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. Since the resolver will soon
+		 * find the FdwXact entry we inserted, we don't need to do anything.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch a new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, clean up the slot (MyFdwXactResolver is unset here) */
+		LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+		resolver->in_use = false;
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on each database that has
+ * at least one FdwXact entry but no resolver running on it.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *resolver_dbs;	/* DBs resolver's running on */
+	HTAB	   *fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+	int			i;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database OIDs that have at least one non-in-doubt FdwXact entry */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->indoubt)
+			continue;
+
+		hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* No database has such an FdwXact entry, so there is nothing to launch */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+		return false;
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolvers are running and launch new one on them */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found an attached resolver; an unattached slot has no valid pid */
+		if (resolver->in_use && resolver->dbid == dbid && resolver->pid != InvalidPid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Returns activity of all foreign transaction resolvers.
+ */
+Datum
+pg_stat_get_foreign_xact(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	int			i;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not " \
+						"allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+		pid_t		pid;
+		Oid			dbid;
+		TimestampTz last_resolved_time;
+		Datum		values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+		bool		nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS];
+
+
+		SpinLockAcquire(&(resolver->mutex));
+		if (resolver->pid == InvalidPid)
+		{
+			SpinLockRelease(&(resolver->mutex));
+			continue;
+		}
+
+		pid = resolver->pid;
+		dbid = resolver->dbid;
+		last_resolved_time = resolver->last_resolved_time;
+		SpinLockRelease(&(resolver->mutex));
+
+		memset(nulls, 0, sizeof(nulls));
+		/* pid */
+		values[0] = Int32GetDatum(pid);
+
+		/* dbid */
+		values[1] = ObjectIdGetDatum(dbid);
+
+		/* last_resolved_time */
+		if (last_resolved_time == 0)
+			nulls[2] = true;
+		else
+			values[2] = TimestampTzGetDatum(last_resolved_time);
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..4843aeacc9
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,343 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database whose resolution a backend process is waiting for.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on
+ * timeout, but only if there are no pending foreign transactions on the
+ * database waiting to be resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int			foreign_xact_resolution_retry_interval;
+int			foreign_xact_resolver_timeout = 60 * 1000;
+bool		foreign_xact_resolve_indoubt_xacts;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and clean up the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+	MyFdwXactResolver->last_resolved_time = 0;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sanish value */
+	MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		PGPROC	   *waiter = NULL;
+		TransactionId waitXid = InvalidTransactionId;
+		TimestampTz resolutionTs = -1;
+		int			rc;
+		TimestampTz now;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiters until either the queue becomes empty or we reach a
+		 * waiter whose resolution time is still in the future.
+		 */
+		while ((waiter = FdwXactGetWaiter(&resolutionTs, &waitXid)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+			Assert(TransactionIdIsValid(waitXid));
+
+			if (resolutionTs > now)
+				break;
+
+			elog(DEBUG2, "resolver got one waiter with xid %u", waitXid);
+
+			/* Resolve the waiting distributed transaction */
+			StartTransactionCommand();
+			FdwXactResolveTransaction(MyDatabaseId, waitXid, waiter);
+			CommitTransactionCommand();
+
+			/* Update my stats */
+			SpinLockAcquire(&(MyFdwXactResolver->mutex));
+			MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp();
+			SpinLockRelease(&(MyFdwXactResolver->mutex));
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Shut down if no foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout and no backend is waiting for resolution.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz last_resolved_time;
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	last_resolved_time = MyFdwXactResolver->last_resolved_time;
+	timeout = TimestampTzPlusMilliseconds(last_resolved_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until the slot is detached.
+		 * This prevents a race in which a waiter enqueues itself after we
+		 * have checked FdwXactWaiterExists.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but does not exit as the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long to sleep until the next cycle: until the timeout or the
+ * next resolution time given by nextResolutionTs, whichever comes first.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 1cd97852e8..ea045174e0 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..200cf9d067 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 5adf956f41..e8e6a5e2b5 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -850,6 +851,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
@@ -2263,6 +2293,12 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	FdwXactWaitToBeResolved(xid, true);
 }
 
 /*
@@ -2322,6 +2358,12 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	FdwXactWaitToBeResolved(xid, false);
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index e3c60f23cd..405271387d 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1218,6 +1219,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1226,6 +1228,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1264,12 +1267,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1427,6 +1431,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transaction to be resolved, if required.
+	 * We only want to wait if we prepared foreign transaction in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitToBeResolved(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2086,6 +2098,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2246,6 +2261,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2333,6 +2349,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2527,6 +2545,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2732,6 +2751,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7621fc05e2..db116ff7ca 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4545,6 +4546,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6226,6 +6228,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_wal_senders",
 									 max_wal_senders,
 									 ControlFile->max_wal_senders);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
@@ -6768,14 +6773,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -6967,7 +6973,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7480,6 +7489,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7810,6 +7820,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9086,6 +9099,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9519,8 +9533,9 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9538,6 +9554,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9554,6 +9571,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9759,6 +9777,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -9958,6 +9977,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5a6dc61630..246c3df966 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
@@ -826,6 +829,14 @@ CREATE VIEW pg_stat_subscription AS
             LEFT JOIN pg_stat_get_subscription(NULL) st
                       ON (st.subid = su.oid);
 
+CREATE VIEW pg_stat_foreign_xact AS
+    SELECT
+            r.pid,
+            r.dbid,
+            r.last_resolved_time
+    FROM pg_stat_get_foreign_xact() r
+    WHERE r.pid IS NOT NULL;
+
 CREATE VIEW pg_stat_ssl AS
     SELECT
             S.pid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fbde9f88e7..acc9a86642 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2859,8 +2859,14 @@ CopyFrom(CopyState cstate)
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc),
+							   true);
+
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index f197869752..6206265424 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1101,6 +1103,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1419,6 +1433,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
@@ -1572,6 +1595,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
 				 errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA",
 						fdw->fdwname)));
 
+	/*
+	 * Remember that the transaction accesses a foreign server. Normally
+	 * ImportForeignSchema does not modify data on foreign servers, so register
+	 * the server as not modified.
+	 */
+	RegisterFdwXactByServerId(server->serverid, false);
+
 	/* Call FDW to get a list of commands */
 	cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index fb6ce49056..3fa8bfe09f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/table.h"
 #include "access/tableam.h"
 #include "catalog/partition.h"
@@ -939,7 +940,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		Relation		child = partRelInfo->ri_RelationDesc;
+
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(child), true);
+
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 513471ab9b..29f376e48c 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,9 +226,31 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		RangeTblEntry	*rte;
+
+		rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex,
+							estate);
+
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(rte->relid, true);
+
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
+	{
+		RangeTblEntry	*rte;
+		int rtindex = (scanrelid > 0) ?
+			scanrelid :
+			bms_next_member(node->fs_relids, -1);
+
+		rte = exec_rt_fetch(rtindex, estate);
+
+		/* Remember that the transaction accesses a foreign server */
+		RegisterFdwXactByRelId(rte->relid, false);
+
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 870a7428f1..d82d32ecb4 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -37,6 +37,7 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
@@ -47,6 +48,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "rewrite/rewriteHandler.h"
@@ -2411,6 +2413,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+			/* Remember the transaction modifies data on a foreign server */
+			RegisterFdwXactByRelId(relid, true);
 
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 															 resultRelInfo,
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 61e48ca3f8..8f411c0559 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,18 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction && !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction && routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		!routine->CommitForeignTransaction &&
+		!routine->RollbackForeignTransaction)
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 6c684b5e12..39c1b08699 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -12,6 +12,8 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 4763c24be9..6ad744db80 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3593,6 +3593,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3799,6 +3805,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDWXACT:
+			event_name = "FdwXact";
+			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -4023,6 +4035,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 73d278f3b2..0c3f999e0e 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -909,6 +911,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -973,12 +979,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and the foreign transaction launcher.
+	 * Since each registers a background worker, they need to be called
+	 * before InitializeMaxBackends(), and it's probably a good idea to call
+	 * them before any modules have a chance to take background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c2e5e3abf8..9d34817f39 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -151,6 +151,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..55609eed81 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index f45a619deb..6ff1a6758c 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -94,6 +94,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -249,6 +251,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1313,6 +1316,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1378,6 +1382,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1427,6 +1432,15 @@ GetOldestXmin(Relation rel, int flags)
 		NormalTransactionIdPrecedes(replication_slot_xmin, result))
 		result = replication_slot_xmin;
 
+	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDWXACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
 	/*
 	 * After locks have been released and vacuum_defer_cleanup_age has been
 	 * applied, check whether we need to back up further to make logical
@@ -3129,6 +3143,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon so that
+ * vacuum does not truncate clog entries of transactions that are still
+ * needed to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843229..adb276370c 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,6 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+FdwXactLock							45
+FdwXactResolverLock					46
+FdwXactResolutionLock				47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 9938cddb57..71e74e7448 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -421,6 +422,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* Initialize fields for fdw xact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -822,6 +827,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cb8c23e4b7..a4472fcb60 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3034,6 +3036,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index af876d1f01..3e1c505e28 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -425,6 +426,25 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required", "prefer", and "disabled" are documented,
+ * we accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
@@ -761,6 +781,12 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
+	/* FDWXACT_RESOLVER */
+	gettext_noop("Foreign Transaction Management / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2469,6 +2495,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FDWXACT_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FDWXACT_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve a foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4531,6 +4603,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		check_foreign_twophase_commit, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index aa44f0c9bf..c6a302e49a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -125,6 +125,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -342,6 +344,20 @@
 #max_sync_workers_per_subscription = 2	# taken from max_logical_replication_workers
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions
+					# disabled, prefer, or required
+
 #------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index a0b0458108..8701c5f005 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index a6577486ce..f520a7a235 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -208,6 +208,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index e73639df74..3041c39bc0 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 233441837f..b040202043 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 100644
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..dd8433f42c
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,167 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* fdwXactState */
+#define	FDWXACT_NOT_WAITING		0
+#define	FDWXACT_WAITING			1
+#define	FDWXACT_WAIT_COMPLETE	2
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_PREFER, /* use twophase commit where available */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+} ForeignTwophaseCommitLevel;
+
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_INITIAL,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being
+								 * committed */
+	FDWXACT_STATUS_ABORTING,	/* foreign prepared transaction is being
+								 * aborted */
+	FDWXACT_STATUS_RESOLVED
+} FdwXactStatus;
+
+typedef struct FdwXactData *FdwXact;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	TransactionId local_xid;	/* XID of local transaction */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	bool		indoubt;		/* Is an in-doubt transaction? */
+	slock_t		mutex;			/* Protect the above fields */
+
+	/* The status of the foreign transaction, protected by FdwXactLock */
+	FdwXactStatus status;
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset of inserting this entry
+									 * start */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been completed and written to
+								 * file? */
+	BackendId	held_by;		/* backend that is holding this entry */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing an FdwXact entry requires FdwXactLock in exclusive mode,
+ * and iterating over fdwxacts requires it in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	/* Foreign transaction information */
+	char	   *fdwxact_id;
+
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+/* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void RegisterFdwXactByRelId(Oid relid, bool modified);
+extern void RegisterFdwXactByServerId(Oid serverid, bool modified);
+extern void PreCommit_FdwXacts(void);
+extern void FdwXactResolveTransaction(Oid dbid, TransactionId xid, PGPROC *waiter);
+extern bool FdwXactResolveInDoubtTransactions(Oid dbid);
+extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit);
+extern PGPROC *FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p);
+extern bool FdwXactWaiterExists(Oid dbid);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon);
+extern void RecoverFdwXacts(void);
+extern bool FdwXactExists(Oid dboid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwTwoPhaseNeeded(void);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void FdwXactMarkForeignServerAccessed(Oid relid, bool modified);
+extern bool check_foreign_twophase_commit(int *newval, void **extra,
+										  GucSource source);
+
+#endif							/* FDWXACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..c3ed1ecfaf
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,29 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLauncherRequestToLaunchForRetry(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..80691b5c07
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,66 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Stats */
+	TimestampTz last_resolved_time;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 6c15df7e70..177b236f70 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Foreign Transactions", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index a04fc70326..6f1f336e31 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -108,6 +108,13 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
+/*
+ * XACT_FLAGS_FDWNOPREPARE - set when the transaction wrote data on a
+ * foreign table whose foreign server isn't capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 3)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 27ded593ab..f15a802e5c 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -232,6 +232,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index de5670e538..9884f5f8e7 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 87d25d4a4b..33c6df1375 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5214,6 +5214,13 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o}',
   proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
   prosrc => 'pg_stat_get_subscription' },
+{ oid => '9705', descr => 'statistics: information about foreign transaction resolver',
+  proname => 'pg_stat_get_foreign_xact', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,oid,timestamptz}',
+  proargmodes => '{o,o,o}',
+  proargnames => '{pid,dbid,last_resolved_time}',
+  prosrc => 'pg_stat_get_foreign_xact' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' },
@@ -5927,6 +5934,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{oid,xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o,o}',
+  proargnames => '{dbid,xid,serverid,userid,status,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6045,6 +6070,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..8d046cc4e4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5e0cf533fb..5596ee591c 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -69,6 +69,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 											   bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 														 bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a07012bf4b..48602acd7b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -758,6 +758,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -836,7 +838,9 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDWXACT,
+	WAIT_EVENT_FDWXACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -919,6 +923,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index d21780108b..35ffbbca93 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/xlogdefs.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
@@ -152,6 +153,16 @@ struct PGPROC
 	int			syncRepState;	/* wait state for sync rep */
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
+	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
 	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index a5c7d0c064..0f73b64937 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDWXACT_XMIN			0x40	/* unresolved distributed
+													 * transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -125,4 +127,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 454c2df487..6010dbcdee 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,9 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
+	FDWXACT_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a2077bbad4..3cc765a496 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1342,6 +1342,14 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.dbid,
+    f.xid,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(dbid, xid, serverid, userid, status, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
@@ -1848,6 +1856,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_foreign_xact| SELECT r.pid,
+    r.dbid,
+    r.last_resolved_time
+   FROM pg_stat_get_foreign_xact() r(pid, dbid, last_resolved_time)
+  WHERE (r.pid IS NOT NULL);
 pg_stat_gssapi| SELECT s.pid,
     s.gss_auth AS gss_authenticated,
     s.gss_princ AS principal,
-- 
2.21.1 (Apple Git-122.3)
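
A note on the GetPrepareId callback declared in fdwapi.h above: the signature takes the local transaction id, the foreign server OID and the user OID, and reports the identifier length through prep_id_len. These are the same three keys that pg_resolve_foreign_xact(xid, serverid, userid) accepts. The following is only a minimal standalone sketch of a callback with that shape. The typedefs are stand-ins for the PostgreSQL ones, the function name sketch_get_prepare_id and the "fx_<xid>_<serverid>_<userid>" layout are assumptions made for illustration, and a real FDW would allocate with palloc rather than malloc.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for PostgreSQL's typedefs so the sketch compiles on its own. */
typedef uint32_t TransactionId;
typedef uint32_t Oid;

/*
 * Hypothetical callback matching the GetPrepareId_function signature.
 * Returns a malloc'd identifier and stores its length in *prep_id_len.
 * The identifier layout is an assumption, not taken from the patch.
 */
char *
sketch_get_prepare_id(TransactionId xid, Oid serverid, Oid userid,
					  int *prep_id_len)
{
	/* "fx_" + three numbers of at most 10 digits + two '_' + NUL < 64 */
	char	   *id = malloc(64);

	if (id == NULL)
		return NULL;

	/* snprintf returns the number of characters written, excluding NUL */
	*prep_id_len = snprintf(id, 64, "fx_%u_%u_%u",
							(unsigned) xid, (unsigned) serverid,
							(unsigned) userid);
	return id;
}
```

Building the identifier from the same triple that names the entry in pg_foreign_xacts keeps it reproducible after a crash, which is one plausible reason for passing all three values to the callback.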

v19-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From 06c686d4fec361666751fb0cf9f047c26c0ef1d8 Mon Sep 17 00:00:00 2001
From: Muhammad Usama <m.usama@highgo.ca>
Date: Thu, 26 Mar 2020 14:12:17 +0500
Subject: [PATCH v19 1/5] Keep track of writing on non-temporary relation

Authors: Muhammad Usama, Masahiko Sawada, Ashutosh Bapat
---
 src/backend/executor/nodeModifyTable.c | 16 ++++++++++++++++
 src/include/access/xact.h              |  6 ++++++
 2 files changed, 22 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d71c0a4322..870a7428f1 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -574,6 +574,10 @@ ExecInsert(ModifyTableState *mtstate,
 										   NULL,
 										   specToken);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
 												   &specConflict,
@@ -612,6 +616,10 @@ ExecInsert(ModifyTableState *mtstate,
 							   estate->es_output_cid,
 							   0, NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
@@ -963,6 +971,10 @@ ldelete:;
 	if (tupleDeleted)
 		*tupleDeleted = true;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/*
 	 * If this delete is the result of a partition key update that moved the
 	 * tuple to a new partition, put this row into the transition OLD TABLE,
@@ -1475,6 +1487,10 @@ lreplace:;
 	if (canSetTag)
 		(estate->es_processed)++;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/* AFTER ROW UPDATE Triggers */
 	ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple, slot,
 						 recheckIndexes,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7ee04babc2..a04fc70326 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -102,6 +102,12 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
-- 
2.21.1 (Apple Git-122.3)
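
For readers following how the flag from patch 0001 is meant to be consumed: the documentation patch says two-phase commit is required when the transaction modified data on two or more servers, counting the local server itself, and foreign_twophase_commit is required or prefer. The sketch below is one reading of that rule, not the patch's actual code. The two flag values are copied from the xact.h hunks. The enum names and the function sketch_commit_decision are hypothetical, and treating a single written server as a plain commit even under required is an assumption.

```c
#include <stdbool.h>

#define XACT_FLAGS_WROTENONTEMPREL	(1U << 2)	/* value as in the patch */
#define XACT_FLAGS_FDWNOPREPARE		(1U << 3)	/* value as in the patch */

/* Hypothetical mirror of the foreign_twophase_commit settings. */
typedef enum
{
	SKETCH_FTC_DISABLED,
	SKETCH_FTC_PREFER,
	SKETCH_FTC_REQUIRED
} SketchTwophaseMode;

/* Hypothetical outcome of the commit-time check. */
typedef enum
{
	SKETCH_PLAIN_COMMIT,		/* no two-phase commit */
	SKETCH_USE_2PC,				/* prepare on capable servers, then resolve */
	SKETCH_ERROR				/* "required" but a written server cannot prepare */
} SketchDecision;

SketchDecision
sketch_commit_decision(unsigned xact_flags, int nwritten_foreign_servers,
					   SketchTwophaseMode mode)
{
	/* The local server counts as one participant if it was written to. */
	bool		wrote_local = (xact_flags & XACT_FLAGS_WROTENONTEMPREL) != 0;
	int			nservers = nwritten_foreign_servers + (wrote_local ? 1 : 0);

	if (mode == SKETCH_FTC_DISABLED || nservers < 2)
		return SKETCH_PLAIN_COMMIT;

	/* "required" refuses to commit if any written server cannot prepare. */
	if (mode == SKETCH_FTC_REQUIRED &&
		(xact_flags & XACT_FLAGS_FDWNOPREPARE) != 0)
		return SKETCH_ERROR;

	return SKETCH_USE_2PC;
}
```

Under this reading, a local write plus one foreign write already counts as two servers, which is why the executor has to record local writes in MyXactFlags at all.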

v19-0003-Documentation-update.patch (application/octet-stream)
From 49ece6bd9a68c49d7279f86493f6c3c2091fcfad Mon Sep 17 00:00:00 2001
From: Muhammad Usama <m.usama@highgo.ca>
Date: Thu, 26 Mar 2020 21:27:18 +0500
Subject: [PATCH v19 3/5] Documentation update.

Authors: Muhammad Usama, Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 145 +++++++++++++
 doc/src/sgml/config.sgml                  | 146 ++++++++++++-
 doc/src/sgml/distributed-transaction.sgml | 158 +++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 236 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  89 ++++++++
 doc/src/sgml/monitoring.sgml              |  60 ++++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 841 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 64614b569c..af1c2dcbcf 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8192,6 +8192,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>open cursors</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-file-settings"><structname>pg_file_settings</structname></link></entry>
       <entry>summary of configuration file contents</entry>
@@ -9650,6 +9655,146 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>initial</literal> : Initial status.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>resolved</literal> : This foreign transaction has been resolved.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in in-doubt status and
+       needs to be resolved by calling the <function>pg_resolve_foreign_xact</function>
+       function.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 355b408b0a..02849939ef 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4426,7 +4426,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
 
      </variablelist>
     </sect2>
-
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -8928,6 +8927,151 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether transaction commit will wait for all involved foreign
+         transactions to be resolved before the command returns a "success"
+         indication to the client. Valid values are <literal>required</literal>,
+         <literal>prefer</literal> and <literal>disabled</literal>. The default
+         setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used to
+         commit or roll back distributed transactions. When set to
+         <literal>required</literal>, the distributed transaction strictly
+         requires that all written servers can use the two-phase commit protocol.
+         That is, the distributed transaction cannot commit if even one server
+         does not support the transaction management callback routines
+         (described in <xref linkend="fdw-callbacks-transaction-managements"/>).
+         When set to <literal>prefer</literal>, the distributed transaction uses
+         the two-phase commit protocol only on servers where it is available and
+         commits normally on the others. Note that with <literal>disabled</literal>
+         or <literal>prefer</literal>, the servers involved in the distributed
+         transaction can become inconsistent with each other if a foreign server
+         crashes while the distributed transaction is being committed.
+        </para>
+
+        <para>
+         Both <varname>max_prepared_foreign_transactions</varname> and
+         <varname>max_foreign_transaction_resolvers</varname> must be set to
+         non-zero values before this parameter can be set to either
+         <literal>required</literal> or <literal>prefer</literal>.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If <literal>N</literal> local transactions each
+         span <literal>K</literal> foreign servers, this value needs to be set to
+         <literal>N * K</literal>, not just <literal>N</literal>.
+         This parameter can only be set at server start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign
+         transaction resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver waits after a failed resolution
+         before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning the resolver stays
+         connected to its database until it is stopped manually. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..350b1afe68
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers end in either commit or rollback using the
+   transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers automatically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).
+    The <productname>PostgreSQL</productname> server that received the SQL is called
+    the <firstterm>coordinator node</firstterm> and is responsible for coordinating
+    all the participating transactions. Using the two-phase commit protocol, the commit
+    sequence of a distributed transaction proceeds in the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    At the first step, the <productname>PostgreSQL</productname> distributed
+    transaction manager prepares all transactions on the foreign servers if
+    two-phase commit is required. Two-phase commit is required when the
+    transaction modifies data on two or more servers, including the local server
+    itself, and <xref linkend="guc-foreign-twophase-commit"/> is
+    <literal>required</literal> or <literal>prefer</literal>. If all preparations
+    on foreign servers succeed, the commit proceeds to the next step. If any failure
+    happens in this step, <productname>PostgreSQL</productname> switches to rollback
+    and rolls back all transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    At the local commit step, <productname>PostgreSQL</productname> commits the
+    transaction locally. If any failure happens in this step,
+    <productname>PostgreSQL</productname> switches to rollback and rolls back all
+    transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    At the final step, prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolution">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for foreign transaction resolution. They commit or roll back all
+    prepared transactions on foreign servers if the coordinator received agreement
+    messages from all foreign servers during the first step.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on one database of the coordinator side. On failure during resolution, it
+    retries the resolution at intervals of
+    <varname>foreign_transaction_resolution_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped. To drop the database, first call the
+     <function>pg_stop_fdwxact_resolver</function> function and then drop
+     the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>Manual Resolution of In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back using the two-phase commit protocol. However, distributed
+    transactions become <firstterm>in-doubt</firstterm> in three cases: when a foreign
+    server crashes or the connection to it is lost while preparing the foreign
+    transaction, when the coordinator node crashes while either preparing or
+    resolving the distributed transaction, and when the user cancels the query. You can
+    check in-doubt transactions in the <xref linkend="pg-stat-foreign-xact-view"/>
+    view. These foreign transactions need to be resolved by using the
+    <function>pg_resolve_foreign_xact</function> function.
+    <productname>PostgreSQL</productname> doesn't have facilities to automatically
+    resolve in-doubt transactions. This behavior might change in a future release.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-monitoring">
+   <title>Monitoring</title>
+   <para>
+    Monitoring information about foreign transaction resolvers is visible in the
+    <link linkend="pg-stat-foreign-xact-view"><literal>pg_stat_foreign_xact</literal></link>
+    view. This view contains one row for every foreign transaction resolver worker.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be raised to
+    accommodate the foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6587678af2..dd0358ef22 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1415,6 +1415,127 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back and
+     prepare the foreign transaction. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase, or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flags</literal> has
+    the flag <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except when the <function>pg_resolve_foreign_xact</function>
+     SQL function is called, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flags</literal> has the flag
+    <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except when the <function>pg_resolve_foreign_xact</function>
+     SQL function is called, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    with its length stored in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal,
+    less than <symbol>NAMEDATALEN</symbol> bytes long, and must not be the same
+    as any other concurrent prepared transaction identifier. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Therefore, you cannot
+      use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1894,4 +2015,119 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of the
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name and the
+    OIDs of the server, user and user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit And Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <literal>CommitForeignTransaction</literal> function
+     in the pre-commit phase; during rollback, it calls the
+     <literal>RollbackForeignTransaction</literal> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit And Rollback Distributed Transaction</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit and roll back atomically among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be using different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, the <literal>CommitForeignTransaction</literal> function and the
+     <literal>RollbackForeignTransaction</literal> function should commit and
+     roll back directly, respectively, rather than processing prepared transactions.
+     This can happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve the transaction, the resolver process exits with an error message. The
+     foreign transaction launcher then launches the resolver process again at the
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..80a87fa5d1 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7a0bb0c70a..aed898248a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21849,6 +21849,95 @@ SELECT (pg_stat_file('filename')).modification;
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction which is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolution.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful to call before
+    dropping the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 270178d57e..cbc57d3c12 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -384,6 +384,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_foreign_xact</structname><indexterm><primary>pg_stat_foreign_xact</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1264,6 +1272,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>CheckpointerMain</literal></entry>
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
+        <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
+         <entry><literal>LogicalLauncherMain</literal></entry>
+         <entry>Waiting in main loop of logical launcher process.</entry>
+        </row>
         <row>
          <entry><literal>LogicalApplyMain</literal></entry>
          <entry>Waiting in main loop of logical apply process.</entry>
@@ -1491,6 +1511,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>SafeSnapshot</literal></entry>
          <entry>Waiting for a snapshot for a <literal>READ ONLY DEFERRABLE</literal> transaction.</entry>
         </row>
+        <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
         <row>
          <entry><literal>SyncRep</literal></entry>
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
@@ -2415,6 +2439,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-foreign-xact-view" xreflabel="pg_stat_foreign_xact">
+   <title><structname>pg_stat_foreign_xact</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_foreign_xact</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index e59cba7997..dee3f72f7e 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -163,6 +163,7 @@
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 1c19e863d2..3f4c806ed1 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.21.1 (Apple Git-122.3)
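
For readers following the documentation added in the patch above, the three-step commit sequence (prepare on all foreign servers, commit locally, resolve the prepared transactions) can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not code from the patch: the names Participant, PartState and commit_distributed are hypothetical, the outcome of each prepare is supplied by the caller, and the local commit is assumed to always succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states of one foreign server taking part in a transaction. */
typedef enum
{
	PART_ACTIVE,
	PART_PREPARED,
	PART_COMMITTED,
	PART_ABORTED
} PartState;

/* Hypothetical stand-in for a participant; not a structure from the patch. */
typedef struct
{
	const char *name;
	bool		prepare_ok;		/* simulated outcome of the prepare step */
	PartState	state;
} Participant;

/*
 * Run the three-step sequence over n participants.  Returns true if the
 * distributed transaction committed, false if it was rolled back.
 */
static bool
commit_distributed(Participant *parts, int n)
{
	int			i;
	bool		all_prepared = true;

	assert(parts != NULL && n > 0);

	/* Step 1: prepare the transaction on every foreign server. */
	for (i = 0; i < n; i++)
	{
		if (!parts[i].prepare_ok)
		{
			all_prepared = false;
			break;
		}
		parts[i].state = PART_PREPARED;
	}

	/* Any failure in step 1 switches to rollback on every server. */
	if (!all_prepared)
	{
		for (i = 0; i < n; i++)
			parts[i].state = PART_ABORTED;
		return false;
	}

	/* Step 2: commit locally (assumed to succeed in this toy model). */

	/* Step 3: resolve, here commit, every prepared foreign transaction. */
	for (i = 0; i < n; i++)
		parts[i].state = PART_COMMITTED;

	return true;
}
```

Calling commit_distributed() with an array in which every prepare_ok is true leaves all participants in PART_COMMITTED; if any prepare_ok is false, every participant ends in PART_ABORTED. In the patch itself step 3 is performed asynchronously by a foreign transaction resolver process, which this model does not attempt to reproduce.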

v19-0004-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From 49d46cd05af910b2613d85427e1a5b539034f7af Mon Sep 17 00:00:00 2001
From: Muhammad Usama <m.usama@highgo.ca>
Date: Thu, 26 Mar 2020 21:28:58 +0500
Subject: [PATCH v19 4/5] postgres_fdw supports atomic commit APIs.

Authors: Muhammad Usama, Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/Makefile                 |   7 +-
 contrib/postgres_fdw/connection.c             | 603 +++++++++++-------
 .../postgres_fdw/expected/postgres_fdw.out    | 265 +++++++-
 contrib/postgres_fdw/fdwxact.conf             |   3 +
 contrib/postgres_fdw/postgres_fdw.c           |  21 +-
 contrib/postgres_fdw/postgres_fdw.h           |   7 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 120 +++-
 doc/src/sgml/postgres-fdw.sgml                |  45 ++
 8 files changed, 822 insertions(+), 249 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..91fa6e39fc 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -29,3 +29,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index e45647f3ea..04410c27fd 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2020, PostgreSQL Global Development Group
  *
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
@@ -56,6 +57,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -69,17 +71,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -92,23 +90,26 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
+ * Get connection cache entry. Unlike GetConnectionState function, this function
+ * doesn't establish a new connection if there is none yet.
  */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -128,7 +129,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -136,12 +136,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -155,6 +149,21 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * This function gets the connection cache entry and establishes connection
+ * to the foreign server if there is no connection and starts a new transaction
+ * if 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -182,6 +191,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping	*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +200,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +211,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,11 +227,39 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
@@ -472,7 +520,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -699,193 +747,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -902,10 +763,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -916,6 +773,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Skip entries whose connection was not used in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1250,3 +1111,309 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   state->server->servername, state->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 state->server->servername, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on foreign server. If
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can commit the
+ * foreign transaction without preparation, otherwise commit the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has already
+		 * been prepared and closed, so we might not have a connection to it.
+		 * We get a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In simple commit case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   state->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Roll back a transaction on foreign server. As with the commit case, if
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can roll back the
+ * foreign transaction without preparation, otherwise it rolls back the
+ * prepared transaction. This function must tolerate being called recursively,
+ * as an error can happen during abort.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has already
+		 * been prepared and closed, so we might not have a connection to it.
+		 * We get a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before
+	 * establishing a connection or starting a remote transaction.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction or is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 62c2697920..cd871fe314 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -191,15 +210,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8961,16 +8982,226 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
+
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
 BEGIN;
-SELECT count(*) FROM ft1;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
  count 
 -------
-   822
+     0
 (1 row)
 
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
-ROLLBACK;
-WARNING:  there is no transaction in progress
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000000..3fdbf93cdb
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2175dff824..0873d1d4b7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include <limits.h>
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -504,7 +505,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -558,6 +558,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1434,7 +1439,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2372,7 +2377,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2746,7 +2751,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3566,7 +3571,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4441,7 +4446,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4527,7 +4532,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4755,7 +4760,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..43ffd4f73f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -129,7 +130,7 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -137,6 +138,9 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *state);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *state);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *state);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
@@ -203,6 +207,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 									bool is_subquery,
 									List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..ce5785c27a 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2628,9 +2651,98 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
 BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
+INSERT INTO ft7_2pc VALUES(1);
 ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 94992be427..3f52daa11e 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -477,6 +477,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted independently.
+    Some of these transactions may fail to commit on their remote server while
+    others commit successfully. This may be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is allowed
+       to use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
@@ -504,6 +541,14 @@
    managed by creating corresponding remote savepoints.
   </para>
 
+  <para>
+   <filename>postgres_fdw</filename> uses the two-phase commit protocol during
+   transaction commit or abort when atomic commit of a distributed
+   transaction (see <xref linkend="atomic-commit"/>) is required. The remote
+   server should therefore set <xref linkend="guc-max-prepared-transactions"/> to more
+   than one so that it can prepare the remote transaction.
+  </para>
+
   <para>
    The remote transaction uses <literal>SERIALIZABLE</literal>
    isolation level when the local transaction has <literal>SERIALIZABLE</literal>
-- 
2.21.1 (Apple Git-122.3)

#35Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Muhammad Usama (#34)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 27 Mar 2020 at 22:06, Muhammad Usama <m.usama@gmail.com> wrote:

Hi Sawada San,

I have been further reviewing and testing the transaction involving multiple server patches.
Overall the patches are working as expected bar a few important exceptions.
So as discussed over the call I have fixed the issues I found during the testing
and also rebased the patches with the current head of the master branch.
So can you please have a look at the attached updated patches.

Thank you for reviewing and updating the patch!

Below is the list of changes I have made on top of V18 patches.

1- In register_fdwxact(), As we are just storing the callback function pointers from
FdwRoutine in fdw_part structure, So I think we can avoid calling
GetFdwRoutineByServerId() in TopMemoryContext.
So I have moved the MemoryContextSwitch to TopMemoryContext after the
GetFdwRoutineByServerId() call.

Agreed.

2- If PrepareForeignTransaction functionality is not present in some FDW then
during the registration process we should only set the XACT_FLAGS_FDWNOPREPARE
transaction flag if the modified flag is also set for that server. As for the server that has
not done any data modification within the transaction we do not do two-phase commit anyway.

Agreed.

3- I have moved the foreign_twophase_commit in sample file after
max_foreign_transaction_resolvers because the default value of max_foreign_transaction_resolvers
is 0 and enabling the foreign_twophase_commit produces an error with default
configuration parameter positioning in postgresql.conf
Also, foreign_twophase_commit configuration was missing the comments
about allowed values in the sample config file.

Sounds good. Agreed.

4- Setting ForeignTwophaseCommitIsRequired in is_foreign_twophase_commit_required()
function does not seem to be the correct place. The reason being, even when
is_foreign_twophase_commit_required() returns true after setting ForeignTwophaseCommitIsRequired
to true, we could still end up not using the two-phase commit in the case when some server does
not support two-phase commit and foreign_twophase_commit is set to FOREIGN_TWOPHASE_COMMIT_PREFER
mode. So I have moved the ForeignTwophaseCommitIsRequired assignment to PreCommit_FdwXacts()
function after doing the prepare transaction.

Agreed.

6- In prefer mode, we commit the transaction in single-phase if the server does not support
the two-phase commit. But instead of doing the single-phase commit right away,
IMHO the better way is to wait until all the two-phase transactions are successfully prepared
on servers that support the two-phase. Since an error during a "PREPARE" stage would
rollback the transaction and in that case, we would end up with committed transactions on
the server that lacks the support of the two-phase commit.

When an error occurs before the local commit, a 2pc-unsupported
server could end up either rolled back or committed, depending on when
the error happens. On the other hand, all 2pc-supported servers are
always rolled back when an error occurs before the local commit.
Therefore, even if we change the order of COMMIT and PREPARE, it is
still possible that we end up committing some of the 2pc-unsupported
servers while rolling back the others, including the 2pc-supported servers.

I guess the motivation of your change is that since errors are likely
to happen during executing PREPARE on foreign servers, we can minimize
the possibility of rolling back 2pc-unsupported servers by deferring
the commit of 2pc-unsupported server as much as possible. Is that
right?

So I have modified the flow a little bit and instead of doing a one-phase commit right away
the servers that do not support a two-phase commit is added to another list and that list is
processed after once we have successfully prepared all the transactions on two-phase supported
foreign servers. Although this technique is also not bulletproof, still it is better than doing
the one-phase commits before doing the PREPAREs.

Hmm the current logic seems complex. Maybe we can just reverse the
order of COMMIT and PREPARE; do PREPARE on all 2pc-supported and
modified servers first and then do COMMIT on others?
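
To make the proposed ordering concrete, here is a minimal, self-contained C sketch. It is not code from the patch: Participant, its fields, and precommit_prepare_first() are hypothetical names, and the prepare_fails flag only simulates an error during PREPARE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one foreign-transaction participant. */
typedef struct Participant
{
	bool		modified;		/* did the transaction write to this server? */
	bool		supports_2pc;	/* does the FDW provide a prepare callback? */
	bool		prepare_fails;	/* simulation knob: PREPARE raises an error */
	bool		prepared;		/* outcome: PREPARE succeeded */
	bool		committed;		/* outcome: one-phase COMMIT was issued */
} Participant;

/*
 * Pass 1 prepares every modified, 2PC-capable participant.  Pass 2 runs
 * only if pass 1 succeeded everywhere and commits the remaining
 * participants in one phase.  Returns false if a PREPARE failed; in that
 * case no one-phase COMMIT has been issued yet.
 */
static bool
precommit_prepare_first(Participant *parts, size_t n)
{
	size_t		i;

	assert(parts != NULL || n == 0);

	for (i = 0; i < n; i++)
	{
		if (!parts[i].modified || !parts[i].supports_2pc)
			continue;
		if (parts[i].prepare_fails)
			return false;		/* caller rolls everything back */
		parts[i].prepared = true;
	}

	for (i = 0; i < n; i++)
	{
		if (!parts[i].prepared)
			parts[i].committed = true;
	}
	return true;
}
```

With this ordering a failed PREPARE can no longer leave an already committed 2pc-unsupported server behind. An error after the first one-phase COMMIT can still produce a partial commit, as discussed above.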

Also, I think we can improve on this one by throwing an error even in PREFER
mode if there is more than one server that had data modified within the transaction
and lacks the two-phase commit support.

IIUC the concept of PREFER mode is that the transaction uses 2pc only
for 2pc-supported servers. In other words, even if the transaction
modifies data on a 2pc-unsupported server, we can proceed with the
commit in PREFER mode, which we cannot do in REQUIRED mode. What is the
motivation for your idea above?

7- Added a pfree() and list_free_deep() in PreCommit_FdwXacts() to reclaim the
memory if fdw_part is removed from the list

I think at the end of the transaction we free entries of
FdwXactParticipants list and set FdwXactParticipants to NIL. Why do we
need to do that in PreCommit_FdwXacts()?

8- The function FdwXactWaitToBeResolved() was bailing out as soon as it finds
(FdwXactParticipants == NIL). The problem with that was in the case of
"COMMIT/ROLLBACK PREPARED" we always get FdwXactParticipants = NIL and
effectively the foreign prepared transactions(if any) associated with locally
prepared transactions were never getting resolved automatically.

postgres=# BEGIN;
BEGIN
INSERT INTO test_local VALUES ( 2, 'TWO');
INSERT 0 1
INSERT INTO test_foreign_s1 VALUES ( 2, 'TWO');
INSERT 0 1
INSERT INTO test_foreign_s2 VALUES ( 2, 'TWO');
INSERT 0 1
postgres=*# PREPARE TRANSACTION 'local_prepared';
PREPARE TRANSACTION

postgres=# select * from pg_foreign_xacts ;
 dbid  | xid | serverid | userid |  status  | in_doubt |         identifier
-------+-----+----------+--------+----------+----------+----------------------------
 12929 | 515 |    16389 |     10 | prepared | f        | fx_1339567411_515_16389_10
 12929 | 515 |    16391 |     10 | prepared | f        | fx_1963224020_515_16391_10
(2 rows)

-- Now commit the prepared transaction

postgres=# COMMIT PREPARED 'local_prepared';

COMMIT PREPARED

--Foreign prepared transactions associated with 'local_prepared' not resolved

postgres=#

postgres=# select * from pg_foreign_xacts ;
 dbid  | xid | serverid | userid |  status  | in_doubt |         identifier
-------+-----+----------+--------+----------+----------+----------------------------
 12929 | 515 |    16389 |     10 | prepared | f        | fx_1339567411_515_16389_10
 12929 | 515 |    16391 |     10 | prepared | f        | fx_1963224020_515_16391_10
(2 rows)

So to fix this in case of the two-phase transaction, the function checks the existence
of associated foreign prepared transactions before bailing out.

Good catch. But looking at your change, we should not accept the case
where FdwXactParticipants == NULL but TwoPhaseExists(wait_xid) ==
false.

if (FdwXactParticipants == NIL)
{
    /*
     * If we are here because of COMMIT/ROLLBACK PREPARED then the
     * FdwXactParticipants list would be empty. So we need to
     * see if there are any foreign prepared transactions exists
     * for this prepared transaction
     */
    if (TwoPhaseExists(wait_xid))
    {
        List *foreign_trans = NIL;

        foreign_trans = get_fdwxacts(MyDatabaseId,
                                     wait_xid, InvalidOid, InvalidOid,
                                     false, false, true);

        if (foreign_trans == NIL)
            return;
        list_free(foreign_trans);
    }
}

9- In function XlogReadFdwXactData() XLogBeginRead call was missing before XLogReadRecord()
that was causing the crash during recovery.

Agreed.

10- incorporated set_ps_display() signature change.

Thanks.

Regarding other changes you did in v19 patch, I have some comments:

1.
+       ereport(LOG,
+                       (errmsg("trying to %s the foreign transaction associated with transaction %u on server %u",
+                                       fdwxact->status == FDWXACT_STATUS_COMMITTING?"COMMIT":"ABORT",
+                                       fdwxact->local_xid, fdwxact->serverid)));
+

Why do we need to emit LOG message in pg_resolve_foreign_xact() SQL function?

2.
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
deleted file mode 120000
index ce8c21880c..0000000000
--- a/src/bin/pg_waldump/fdwxactdesc.c
+++ /dev/null
@@ -1 +0,0 @@
-../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 100644
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c

We need to remove src/bin/pg_waldump/fdwxactdesc.c from the patch.

3.
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1526,14 +1526,14 @@ postgres   27093  0.0  0.0  30096  2752 ? Ss   11:34   0:00 postgres: ser
          <entry><literal>SafeSnapshot</literal></entry>
          <entry>Waiting for a snapshot for a <literal>READ ONLY DEFERRABLE</literal> transaction.</entry>
         </row>
-        <row>
-         <entry><literal>SyncRep</literal></entry>
-         <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
-        </row>
         <row>
          <entry><literal>FdwXactResolution</literal></entry>
          <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
         </row>
+        <row>
+         <entry><literal>SyncRep</literal></entry>
+         <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
+        </row>
         <row>
          <entry morerows="4"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>

We need to move the entry of FdwXactResolution to right before
Hash/Batch/Allocating for alphabetical order.

I've incorporated the changes I agreed with into my local branch and
will incorporate the others after discussion. I'll also do more
testing and self-review, and then submit the latest version of the patch.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#36Muhammad Usama
m.usama@gmail.com
In reply to: Masahiko Sawada (#35)
1 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, Apr 8, 2020 at 11:16 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 27 Mar 2020 at 22:06, Muhammad Usama <m.usama@gmail.com> wrote:

Hi Sawada San,

I have been further reviewing and testing the transaction involving multiple server patches.
Overall the patches are working as expected bar a few important exceptions.
So as discussed over the call I have fixed the issues I found during the testing
and also rebased the patches with the current head of the master branch.
So can you please have a look at the attached updated patches.

Thank you for reviewing and updating the patch!

Below is the list of changes I have made on top of V18 patches.

1- In register_fdwxact(), As we are just storing the callback function pointers from
FdwRoutine in fdw_part structure, So I think we can avoid calling
GetFdwRoutineByServerId() in TopMemoryContext.
So I have moved the MemoryContextSwitch to TopMemoryContext after the
GetFdwRoutineByServerId() call.

Agreed.

2- If PrepareForeignTransaction functionality is not present in some FDW then
during the registration process we should only set the XACT_FLAGS_FDWNOPREPARE
transaction flag if the modified flag is also set for that server. As for the server that has
not done any data modification within the transaction we do not do two-phase commit anyway.

Agreed.

3- I have moved the foreign_twophase_commit in sample file after
max_foreign_transaction_resolvers because the default value of max_foreign_transaction_resolvers
is 0 and enabling the foreign_twophase_commit produces an error with default
configuration parameter positioning in postgresql.conf
Also, foreign_twophase_commit configuration was missing the comments
about allowed values in the sample config file.

Sounds good. Agreed.

4- Setting ForeignTwophaseCommitIsRequired in is_foreign_twophase_commit_required()
function does not seem to be the correct place. The reason being, even when
is_foreign_twophase_commit_required() returns true after setting ForeignTwophaseCommitIsRequired
to true, we could still end up not using the two-phase commit in the case when some server does
not support two-phase commit and foreign_twophase_commit is set to FOREIGN_TWOPHASE_COMMIT_PREFER
mode. So I have moved the ForeignTwophaseCommitIsRequired assignment to PreCommit_FdwXacts()
function after doing the prepare transaction.

Agreed.

6- In prefer mode, we commit the transaction in single-phase if the server does not support
the two-phase commit. But instead of doing the single-phase commit right away,
IMHO the better way is to wait until all the two-phase transactions are successfully prepared
on servers that support the two-phase. Since an error during a "PREPARE" stage would
rollback the transaction and in that case, we would end up with committed transactions on
the server that lacks the support of the two-phase commit.

When an error occurred before the local commit, a 2pc-unsupported
server could be rolled back or committed depending on the error
timing. On the other hand all 2pc-supported servers are always rolled
back when an error occurred before the local commit. Therefore even if
we change the order of COMMIT and PREPARE it is still possible that we
will end up committing the part of 2pc-unsupported servers while
rolling back others including 2pc-supported servers.

I guess the motivation of your change is that since errors are likely
to happen during executing PREPARE on foreign servers, we can minimize
the possibility of rolling back 2pc-unsupported servers by deferring
the commit of 2pc-unsupported server as much as possible. Is that
right?

Yes, that is correct. The idea of doing the COMMIT on the 2pc-unsupported
servers after all the PREPAREs have succeeded is to minimize the chance of
partial commits. As you mentioned, a partial commit is still possible even
with this approach, but it is less likely than with the current sequence.

So I have modified the flow a little bit and instead of doing a one-phase commit right away
the servers that do not support a two-phase commit is added to another list and that list is
processed after once we have successfully prepared all the transactions on two-phase supported
foreign servers. Although this technique is also not bulletproof, still it is better than doing
the one-phase commits before doing the PREPAREs.

Hmm the current logic seems complex. Maybe we can just reverse the
order of COMMIT and PREPARE; do PREPARE on all 2pc-supported and
modified servers first and then do COMMIT on others?

Agreed, seems reasonable.

Also, I think we can improve on this one by throwing an error even in PREFER
mode if there is more than one server that had data modified within the transaction
and lacks the two-phase commit support.

IIUC the concept of PREFER mode is that the transaction uses 2pc only
for 2pc-supported servers. IOW, even if the transaction modifies on a
2pc-unsupported server we can proceed with the commit if in PREFER
mode, which cannot if in REQUIRED mode. What is the motivation of your
above idea?

I was thinking that we could change the behavior of PREFER mode so that we
only allow the transaction to COMMIT if it needs to do a single-phase commit
on one server only. That way we can ensure that we never end up with a
partial commit.

One idea in this regard would be to switch the local transaction to commit
using 2pc if exactly one foreign server in the transaction does not support
2pc, ensuring that the number of servers committed in one phase is always
less than or equal to 1. If more than one foreign server requires a
one-phase commit, we just throw an error.

However, having said that, I am not 100% sure whether it is a good or an
acceptable idea, and I am okay with continuing with the current behavior of
PREFER mode if we document that this mode can cause a partial commit.
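
As a rough illustration of the stricter PREFER-mode rule described above, the decision could look like the following C sketch. Everything here is hypothetical and not taken from the patch: PreferOutcome, prefer_mode_outcome(), and the two parallel boolean arrays are only a compact way to describe which servers were modified and which can prepare.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of the stricter PREFER-mode check. */
typedef enum PreferOutcome
{
	PREFER_ALL_TWO_PHASE,		/* every modified server can prepare */
	PREFER_SINGLE_ONE_PHASE,	/* exactly one modified server needs 1PC */
	PREFER_REJECT				/* two or more need 1PC: raise an error */
} PreferOutcome;

/*
 * Count the modified servers that cannot prepare.  Zero or one of them
 * can still be committed without risking a partial commit among the
 * one-phase servers; two or more are rejected.
 */
static PreferOutcome
prefer_mode_outcome(const bool *modified, const bool *supports_2pc, size_t n)
{
	size_t		one_phase = 0;
	size_t		i;

	for (i = 0; i < n; i++)
	{
		if (modified[i] && !supports_2pc[i])
			one_phase++;
	}

	if (one_phase == 0)
		return PREFER_ALL_TWO_PHASE;
	if (one_phase == 1)
		return PREFER_SINGLE_ONE_PHASE;
	return PREFER_REJECT;
}
```

Servers that were only read from are ignored by the count, which matches the point in item 2 that unmodified servers do not take part in two-phase commit.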

7- Added a pfree() and list_free_deep() in PreCommit_FdwXacts() to reclaim the
memory if fdw_part is removed from the list

I think at the end of the transaction we free entries of
FdwXactParticipants list and set FdwXactParticipants to NIL. Why do we
need to do that in PreCommit_FdwXacts()?

Correct me if I am wrong, but the fdw_part structures are created in
TopMemoryContext, so if an fdw_part structure is removed from the list at
the pre-commit stage (because we did a 1PC COMMIT on it), it would leak memory.

8- The function FdwXactWaitToBeResolved() was bailing out as soon as it finds
(FdwXactParticipants == NIL). The problem with that was in the case of
"COMMIT/ROLLBACK PREPARED" we always get FdwXactParticipants = NIL and
effectively the foreign prepared transactions(if any) associated with locally
prepared transactions were never getting resolved automatically.

postgres=# BEGIN;
BEGIN
INSERT INTO test_local VALUES ( 2, 'TWO');
INSERT 0 1
INSERT INTO test_foreign_s1 VALUES ( 2, 'TWO');
INSERT 0 1
INSERT INTO test_foreign_s2 VALUES ( 2, 'TWO');
INSERT 0 1
postgres=*# PREPARE TRANSACTION 'local_prepared';
PREPARE TRANSACTION

postgres=# select * from pg_foreign_xacts ;
 dbid  | xid | serverid | userid |  status  | in_doubt |         identifier
-------+-----+----------+--------+----------+----------+----------------------------
 12929 | 515 |    16389 |     10 | prepared | f        | fx_1339567411_515_16389_10
 12929 | 515 |    16391 |     10 | prepared | f        | fx_1963224020_515_16391_10
(2 rows)

-- Now commit the prepared transaction

postgres=# COMMIT PREPARED 'local_prepared';

COMMIT PREPARED

--Foreign prepared transactions associated with 'local_prepared' not resolved

postgres=#

postgres=# select * from pg_foreign_xacts ;
 dbid  | xid | serverid | userid |  status  | in_doubt |         identifier
-------+-----+----------+--------+----------+----------+----------------------------
 12929 | 515 |    16389 |     10 | prepared | f        | fx_1339567411_515_16389_10
 12929 | 515 |    16391 |     10 | prepared | f        | fx_1963224020_515_16391_10
(2 rows)

So to fix this in case of the two-phase transaction, the function checks the existence
of associated foreign prepared transactions before bailing out.

Good catch. But looking at your change, we should not accept the case
where FdwXactParticipants == NULL but TwoPhaseExists(wait_xid) ==
false.

if (FdwXactParticipants == NIL)
{
    /*
     * If we are here because of COMMIT/ROLLBACK PREPARED then the
     * FdwXactParticipants list would be empty. So we need to
     * see if there are any foreign prepared transactions exists
     * for this prepared transaction
     */
    if (TwoPhaseExists(wait_xid))
    {
        List *foreign_trans = NIL;

        foreign_trans = get_fdwxacts(MyDatabaseId,
                                     wait_xid, InvalidOid, InvalidOid,
                                     false, false, true);

        if (foreign_trans == NIL)
            return;
        list_free(foreign_trans);
    }
}

Sorry, my bad, that is a mistake on my part. We should just return from the
function when FdwXactParticipants == NULL but TwoPhaseExists(wait_xid) == false.

if (TwoPhaseExists(wait_xid))
{
    List *foreign_trans = NIL;
    foreign_trans = get_fdwxacts(MyDatabaseId, wait_xid,
                                 InvalidOid, InvalidOid,
                                 false, false, true);

    if (foreign_trans == NIL)
        return;
    list_free(foreign_trans);
}
else
    return;
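
The corrected early-exit condition can be summarized as a small predicate. This is only an illustration: the function name and its three boolean parameters are made up here and stand in for FdwXactParticipants != NIL, TwoPhaseExists(wait_xid), and a non-empty get_fdwxacts() result. Any other checks the real function performs are not modeled.

```c
#include <stdbool.h>

/*
 * Returns true when the backend should go on to wait for foreign
 * transaction resolution, false when it should return immediately.
 */
static bool
must_wait_for_resolution(bool have_participants,
						 bool local_prepared_exists,
						 bool foreign_prepared_exist)
{
	/* Normal commit path: participants were registered in this backend. */
	if (have_participants)
		return true;

	/* No participants and no local prepared transaction: nothing to do. */
	if (!local_prepared_exists)
		return false;

	/* COMMIT/ROLLBACK PREPARED: wait only if foreign entries remain. */
	return foreign_prepared_exist;
}
```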

9- In function XlogReadFdwXactData() XLogBeginRead call was missing before XLogReadRecord()
that was causing the crash during recovery.

Agreed.

10- incorporated set_ps_display() signature change.

Thanks.

Regarding other changes you did in v19 patch, I have some comments:

1.
+       ereport(LOG,
+                       (errmsg("trying to %s the foreign transaction associated with transaction %u on server %u",
+                                       fdwxact->status == FDWXACT_STATUS_COMMITTING?"COMMIT":"ABORT",
+                                       fdwxact->local_xid, fdwxact->serverid)));
+

Why do we need to emit LOG message in pg_resolve_foreign_xact() SQL function?

That change was not intended to get into the patch file. I had added it
during testing to quickly see which way the transaction was going to be
resolved.

2.
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
deleted file mode 120000
index ce8c21880c..0000000000
--- a/src/bin/pg_waldump/fdwxactdesc.c
+++ /dev/null
@@ -1 +0,0 @@
-../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 100644
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c

We need to remove src/bin/pg_waldump/fdwxactdesc.c from the patch.

Again, sorry! That was an oversight on my part.

3.
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1526,14 +1526,14 @@ postgres   27093  0.0  0.0  30096  2752 ? Ss   11:34   0:00 postgres: ser
          <entry><literal>SafeSnapshot</literal></entry>
          <entry>Waiting for a snapshot for a <literal>READ ONLY DEFERRABLE</literal> transaction.</entry>
         </row>
-        <row>
-         <entry><literal>SyncRep</literal></entry>
-         <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
-        </row>
         <row>
          <entry><literal>FdwXactResolution</literal></entry>
          <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
         </row>
+        <row>
+         <entry><literal>SyncRep</literal></entry>
+         <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
+        </row>
         <row>
          <entry morerows="4"><literal>Timeout</literal></entry>
          <entry><literal>BaseBackupThrottle</literal></entry>

We need to move the entry of FdwXactResolution to right before
Hash/Batch/Allocating for alphabetical order.

Agreed!

I've incorporated your changes I agreed with to my local branch and
will incorporate other changes after discussion. I'll also do more
test and self-review and will submit the latest version patch.

Meanwhile, I found a couple more small issues. One is a missing break
statement in pgstat_get_wait_ipc(), and the other is that
fdwxact_relaunch_resolvers() could return an uninitialized value.
I am attaching a small patch for these changes that can be applied on top
of the existing patches.
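
For readers less familiar with C switch semantics, the effect of the missing break can be reproduced in isolation. The sketch below is not PostgreSQL code: DemoWaitEvent and both functions are hypothetical stand-ins for WaitEventIPC and pgstat_get_wait_ipc().

```c
/* Hypothetical two-value enum standing in for WaitEventIPC. */
typedef enum DemoWaitEvent
{
	DEMO_WAIT_FDWXACT,
	DEMO_WAIT_FDWXACT_RESOLUTION
} DemoWaitEvent;

/* Buggy shape: the first case falls through and its name is overwritten. */
static const char *
demo_name_missing_break(DemoWaitEvent w)
{
	const char *event_name = "unknown";

	switch (w)
	{
		case DEMO_WAIT_FDWXACT:
			event_name = "FdwXact";
			/* FALLTHROUGH */
		case DEMO_WAIT_FDWXACT_RESOLUTION:
			event_name = "FdwXactResolution";
			break;
	}
	return event_name;
}

/* Fixed shape: each case ends with break, so each event keeps its name. */
static const char *
demo_name_with_break(DemoWaitEvent w)
{
	const char *event_name = "unknown";

	switch (w)
	{
		case DEMO_WAIT_FDWXACT:
			event_name = "FdwXact";
			break;
		case DEMO_WAIT_FDWXACT_RESOLUTION:
			event_name = "FdwXactResolution";
			break;
	}
	return event_name;
}
```

In the buggy variant both events are reported under the name of the second one, which is the kind of mislabeling the added break avoids.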

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Best Regards,
Muhammad Usama
Highgo Software
URL : http://www.highgo.ca

Attachments:

fdwxact_fixes.diff (application/octet-stream)
commit e50a1deee6eefdfe4ac618db336ac850257f3c3f
Author: Muhammad Usama <m.usama@highgo.ca>
Date:   Fri Mar 27 13:14:10 2020 +0500

    minor fixes in transactions involving multiple servers patch

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 0990a4e3ed..50e745b603 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -1543,7 +1543,7 @@ FdwXactResolveTransaction(Oid dbid, TransactionId xid, PGPROC *waiter)
 
 	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
 
-	while ((fdwxact = get_fdwxact_to_resolve(MyDatabaseId, xid)) != NULL)
+	while ((fdwxact = get_fdwxact_to_resolve(dbid, xid)) != NULL)
 	{
 		FdwXactRslvState *state;
 		ForeignServer *server;
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
index e293d13562..a871727661 100644
--- a/src/backend/access/fdwxact/launcher.c
+++ b/src/backend/access/fdwxact/launcher.c
@@ -394,7 +394,7 @@ fdwxact_relaunch_resolvers(void)
 	HASHCTL		ctl;
 	HASH_SEQ_STATUS status;
 	Oid		   *entry;
-	bool		launched;
+	bool		launched = false;
 	int			i;
 
 	memset(&ctl, 0, sizeof(ctl));
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 9dce03a6e4..26d6a08b14 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3856,6 +3856,7 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 			break;
 		case WAIT_EVENT_FDWXACT:
 			event_name = "FdwXact";
+			break;
 		case WAIT_EVENT_FDWXACT_RESOLUTION:
 			event_name = "FdwXactResolution";
 			break;
#37Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Muhammad Usama (#36)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 28 Apr 2020 at 19:37, Muhammad Usama <m.usama@gmail.com> wrote:

On Wed, Apr 8, 2020 at 11:16 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 27 Mar 2020 at 22:06, Muhammad Usama <m.usama@gmail.com> wrote:

Hi Sawada San,

I have been further reviewing and testing the transaction involving multiple server patches.
Overall the patches are working as expected bar a few important exceptions.
So as discussed over the call I have fixed the issues I found during the testing
and also rebased the patches with the current head of the master branch.
So can you please have a look at the attached updated patches.

Thank you for reviewing and updating the patch!

Below is the list of changes I have made on top of V18 patches.

1- In register_fdwxact(), As we are just storing the callback function pointers from
FdwRoutine in fdw_part structure, So I think we can avoid calling
GetFdwRoutineByServerId() in TopMemoryContext.
So I have moved the MemoryContextSwitch to TopMemoryContext after the
GetFdwRoutineByServerId() call.

Agreed.

2- If PrepareForeignTransaction functionality is not present in some FDW then
during the registration process we should only set the XACT_FLAGS_FDWNOPREPARE
transaction flag if the modified flag is also set for that server. As for the server that has
not done any data modification within the transaction we do not do two-phase commit anyway.

Agreed.

3- I have moved the foreign_twophase_commit in sample file after
max_foreign_transaction_resolvers because the default value of max_foreign_transaction_resolvers
is 0 and enabling the foreign_twophase_commit produces an error with default
configuration parameter positioning in postgresql.conf
Also, foreign_twophase_commit configuration was missing the comments
about allowed values in the sample config file.

Sounds good. Agreed.

4- Setting ForeignTwophaseCommitIsRequired in is_foreign_twophase_commit_required()
function does not seem to be the correct place. The reason being, even when
is_foreign_twophase_commit_required() returns true after setting ForeignTwophaseCommitIsRequired
to true, we could still end up not using the two-phase commit in the case when some server does
not support two-phase commit and foreign_twophase_commit is set to FOREIGN_TWOPHASE_COMMIT_PREFER
mode. So I have moved the ForeignTwophaseCommitIsRequired assignment to PreCommit_FdwXacts()
function after doing the prepare transaction.

Agreed.

6- In prefer mode, we commit the transaction in single-phase if the server does not support
the two-phase commit. But instead of doing the single-phase commit right away,
IMHO the better way is to wait until all the two-phase transactions are successfully prepared
on servers that support the two-phase. Since an error during a "PREPARE" stage would
rollback the transaction and in that case, we would end up with committed transactions on
the server that lacks the support of the two-phase commit.

When an error occurred before the local commit, a 2pc-unsupported
server could be rolled back or committed depending on the error
timing. On the other hand all 2pc-supported servers are always rolled
back when an error occurred before the local commit. Therefore even if
we change the order of COMMIT and PREPARE it is still possible that we
will end up committing the part of 2pc-unsupported servers while
rolling back others including 2pc-supported servers.

I guess the motivation of your change is that since errors are likely
to happen during executing PREPARE on foreign servers, we can minimize
the possibility of rolling back 2pc-unsupported servers by deferring
the commit of 2pc-unsupported server as much as possible. Is that
right?

Yes, that is correct. The idea of doing the COMMIT on NON-2pc-supported servers
after all the PREPAREs are successful is to minimize the chances of partial commits.
And as you mentioned there will still be chances of getting a partial commit even with
this approach but the probability of that would be less than what it is with the
current sequence.

So I have modified the flow a little bit: instead of doing a one-phase commit right away,
the servers that do not support two-phase commit are added to another list, and that list is
processed once we have successfully prepared all the transactions on the two-phase-capable
foreign servers. Although this technique is also not bulletproof, it is still better than doing
the one-phase commits before doing the PREPAREs.

Hmm the current logic seems complex. Maybe we can just reverse the
order of COMMIT and PREPARE; do PREPARE on all 2pc-supported and
modified servers first and then do COMMIT on others?

Agreed, seems reasonable.
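
To make the reversed ordering concrete, here is a minimal sketch in plain C. It is not code from the patch: DemoParticipant, its fields and demo_precommit_prepare_first() are hypothetical names used only for illustration, and the "prepare_fails" flag simply simulates an error raised while preparing a foreign transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one modified foreign server; not the patch's struct. */
typedef struct DemoParticipant
{
    bool supports_prepare;  /* server offers a prepare API */
    bool prepare_fails;     /* injected failure, for illustration only */
    bool prepared;          /* set once PREPARE succeeded */
    bool committed;         /* set once a one-phase COMMIT was issued */
} DemoParticipant;

/*
 * Prepare every 2pc-capable participant first; only if all PREPAREs succeed,
 * commit the remaining participants in one phase.  Returns false on the first
 * failed PREPARE, in which case no one-phase COMMIT has been issued yet.
 */
bool
demo_precommit_prepare_first(DemoParticipant *parts, int nparts)
{
    int i;

    assert(nparts >= 0);
    assert(parts != NULL || nparts == 0);

    /* Pass 1: PREPARE on all servers that support two-phase commit. */
    for (i = 0; i < nparts; i++)
    {
        if (!parts[i].supports_prepare)
            continue;
        if (parts[i].prepare_fails)
            return false;       /* caller would roll everything back */
        parts[i].prepared = true;
    }

    /* Pass 2: one-phase COMMIT on the servers that cannot prepare. */
    for (i = 0; i < nparts; i++)
    {
        if (!parts[i].supports_prepare)
            parts[i].committed = true;
    }
    return true;
}
```

With this ordering a failed PREPARE leaves every 2pc-unsupported server uncommitted. As noted above, it only narrows the window: an error after pass 2 has started can still produce a partial commit.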

Also, I think we can improve on this one by throwing an error even in PREFER
mode if there is more than one server that had data modified within the transaction
and lacks the two-phase commit support.

IIUC the concept of PREFER mode is that the transaction uses 2pc only
for 2pc-supported servers. IOW, even if the transaction modifies data on a
2pc-unsupported server, we can proceed with the commit in PREFER
mode, which we cannot do in REQUIRED mode. What is the motivation of your
above idea?

I was thinking that we could change the behavior of PREFER mode such that we only allow
to COMMIT the transaction if the transaction needs to do a single-phase commit on one
server only. That way we can ensure that we would never end up with partial commit.

I think it's good to avoid a partial commit by using your idea, but if
we want to avoid a partial commit we can use the 'required' mode,
which requires all participant servers to support 2pc. We throw an
error if even one 2pc-unsupported server is modified within the
transaction. Of course, if the only participant is a single
2pc-unsupported server, it can use 1pc even in the 'required' mode.

One idea in this regard would be to switch the local transaction to commit using 2pc
if there is a total of only one foreign server in the transaction that does not support 2pc,
ensuring that the number of 1pc-commit servers is always less than or equal to 1. If more
than one foreign server requires 1pc, then we just throw an error.

I might be missing your point but I suppose this idea is to do
something like the following?

1. prepare the local transaction
2. commit the foreign transaction on 2pc-unsupported server
3. commit the prepared local transaction

However, having said that, I am not 100% sure if it's a good or an acceptable idea, and
I am okay with continuing with the current behavior of PREFER mode if we document
that this mode can cause a partial commit.

There will be three types of servers: (a) a server that doesn't support any
transaction API, (b) a server that supports only the commit and rollback APIs,
and (c) a server that supports all APIs (commit, rollback and prepare).
Currently the postgres transaction manager manages only server-(b) and
server-(c), adding them to FdwXactParticipants. I'm considering changing
the code so that it also adds server-(a) to FdwXactParticipants, in
order to track the number of server-(a) involved in the transaction.
But it would neither insert an FdwXact entry for it nor manage transactions on
these servers.

The reason is this; if we want to have the 'required' mode strictly
require all participant servers to support 2pc, we should use 2pc when
(# of server-(a) + # of server-(b) + # of server-(c)) >= 2. But since
currently we just track the modification on a server-(a) by a flag we
cannot handle the case where two server-(a) are modified in the
transaction. On the other hand, if we don't consider server-(a) the
transaction could end up with a partial commit when a server-(a)
participates in the transaction. Therefore I'm thinking of the above
change so that the transaction manager can ensure that a partial
commit doesn't happen in the 'required' mode. What do you think?
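
The counting rule can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch's logic: DemoServerKind, DemoDecision and demo_required_mode_decision() are hypothetical names, only modified foreign servers are counted, and local writes are ignored for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification following the (a)/(b)/(c) types described above. */
typedef enum DemoServerKind
{
    DEMO_SRV_NO_XACT_API,       /* (a) no transaction API at all */
    DEMO_SRV_COMMIT_ROLLBACK,   /* (b) commit and rollback only */
    DEMO_SRV_FULL_2PC           /* (c) commit, rollback and prepare */
} DemoServerKind;

typedef enum DemoDecision
{
    DEMO_ONE_PHASE_OK,          /* fewer than two participants: 1pc suffices */
    DEMO_TWO_PHASE,             /* all participants can prepare: use 2pc */
    DEMO_ERROR                  /* 2pc needed but some participant cannot prepare */
} DemoDecision;

/*
 * Decide what 'required' mode would do for the given modified servers.
 * Every server is counted individually, so two type-(a) servers are seen
 * as two participants rather than as a single "modified" flag.
 */
DemoDecision
demo_required_mode_decision(const DemoServerKind *modified, int nmodified)
{
    int i;
    int n_without_prepare = 0;

    assert(nmodified >= 0);
    assert(modified != NULL || nmodified == 0);

    /* (# of (a) + # of (b) + # of (c)) < 2: no atomic commit problem. */
    if (nmodified < 2)
        return DEMO_ONE_PHASE_OK;

    for (i = 0; i < nmodified; i++)
    {
        if (modified[i] != DEMO_SRV_FULL_2PC)
            n_without_prepare++;
    }

    return (n_without_prepare > 0) ? DEMO_ERROR : DEMO_TWO_PHASE;
}
```

A flag-based tracker would report two type-(a) servers as "one modification" and let the commit proceed; counting each participant is what lets the second server trigger the error.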

7- Added a pfree() and list_free_deep() in PreCommit_FdwXacts() to reclaim the
memory if fdw_part is removed from the list

I think at the end of the transaction we free entries of
FdwXactParticipants list and set FdwXactParticipants to NIL. Why do we
need to do that in PreCommit_FdwXacts()?

Correct me if I am wrong, The fdw_part structures are created in TopMemoryContext
and if that fdw_part structure is removed from the list at pre_commit stage
(because we did 1-PC COMMIT on it) then it would leak memory.

The fdw_part structures are created in TopTransactionContext so these
are freed at the end of the transaction.

8- The function FdwXactWaitToBeResolved() was bailing out as soon as it finds
(FdwXactParticipants == NIL). The problem with that was in the case of
"COMMIT/ROLLBACK PREPARED" we always get FdwXactParticipants = NIL and
effectively the foreign prepared transactions(if any) associated with locally
prepared transactions were never getting resolved automatically.

postgres=# BEGIN;
BEGIN
INSERT INTO test_local VALUES ( 2, 'TWO');
INSERT 0 1
INSERT INTO test_foreign_s1 VALUES ( 2, 'TWO');
INSERT 0 1
INSERT INTO test_foreign_s2 VALUES ( 2, 'TWO');
INSERT 0 1
postgres=*# PREPARE TRANSACTION 'local_prepared';
PREPARE TRANSACTION

postgres=# select * from pg_foreign_xacts ;
 dbid  | xid | serverid | userid |  status  | in_doubt |         identifier
-------+-----+----------+--------+----------+----------+----------------------------
 12929 | 515 |    16389 |     10 | prepared | f        | fx_1339567411_515_16389_10
 12929 | 515 |    16391 |     10 | prepared | f        | fx_1963224020_515_16391_10
(2 rows)

-- Now commit the prepared transaction

postgres=# COMMIT PREPARED 'local_prepared';

COMMIT PREPARED

--Foreign prepared transactions associated with 'local_prepared' not resolved

postgres=#

postgres=# select * from pg_foreign_xacts ;
 dbid  | xid | serverid | userid |  status  | in_doubt |         identifier
-------+-----+----------+--------+----------+----------+----------------------------
 12929 | 515 |    16389 |     10 | prepared | f        | fx_1339567411_515_16389_10
 12929 | 515 |    16391 |     10 | prepared | f        | fx_1963224020_515_16391_10
(2 rows)

So to fix this in case of the two-phase transaction, the function checks the existence
of associated foreign prepared transactions before bailing out.

Good catch. But looking at your change, we should not accept the case
where FdwXactParticipants == NULL but TwoPhaseExists(wait_xid) ==
false.

if (FdwXactParticipants == NIL)
{
    /*
     * If we are here because of COMMIT/ROLLBACK PREPARED then the
     * FdwXactParticipants list would be empty. So we need to
     * see if there are any foreign prepared transactions exists
     * for this prepared transaction
     */
    if (TwoPhaseExists(wait_xid))
    {
        List *foreign_trans = NIL;

        foreign_trans = get_fdwxacts(MyDatabaseId,
                                     wait_xid, InvalidOid, InvalidOid,
                                     false, false, true);

        if (foreign_trans == NIL)
            return;
        list_free(foreign_trans);
    }
}

Sorry, my bad, it's a mistake on my part. We should just return from the function when
FdwXactParticipants == NULL but TwoPhaseExists(wait_xid) == false.

if (TwoPhaseExists(wait_xid))
{
    List *foreign_trans = NIL;
    foreign_trans = get_fdwxacts(MyDatabaseId, wait_xid, InvalidOid, InvalidOid,
                                 false, false, true);

    if (foreign_trans == NIL)
        return;
    list_free(foreign_trans);
}
else
    return;

9- In function XlogReadFdwXactData() XLogBeginRead call was missing before XLogReadRecord()
that was causing the crash during recovery.

Agreed.

10- incorporated set_ps_display() signature change.

Thanks.

Regarding other changes you did in v19 patch, I have some comments:

1.
+       ereport(LOG,
+                       (errmsg("trying to %s the foreign transaction associated with transaction %u on server %u",
+                                       fdwxact->status == FDWXACT_STATUS_COMMITTING?"COMMIT":"ABORT",
+                                       fdwxact->local_xid, fdwxact->serverid)));
+

Why do we need to emit LOG message in pg_resolve_foreign_xact() SQL function?

That change was not intended to get into the patch file. I had done it during testing to
quickly get info on which way the transaction is going to be resolved.

2.
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
deleted file mode 120000
index ce8c21880c..0000000000
--- a/src/bin/pg_waldump/fdwxactdesc.c
+++ /dev/null
@@ -1 +0,0 @@
-../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 100644
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c

We need to remove src/bin/pg_waldump/fdwxactdesc.c from the patch.

Again sorry! that was an oversight on my part.

3.
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1526,14 +1526,14 @@ postgres   27093  0.0  0.0  30096  2752 ?
Ss   11:34   0:00 postgres: ser
<entry><literal>SafeSnapshot</literal></entry>
<entry>Waiting for a snapshot for a <literal>READ ONLY
DEFERRABLE</literal> transaction.</entry>
</row>
-        <row>
-         <entry><literal>SyncRep</literal></entry>
-         <entry>Waiting for confirmation from remote server during
synchronous replication.</entry>
-        </row>
<row>
<entry><literal>FdwXactResolution</literal></entry>
<entry>Waiting for all foreign transaction participants to
be resolved during atomic commit among foreign servers.</entry>
</row>
+        <row>
+         <entry><literal>SyncRep</literal></entry>
+         <entry>Waiting for confirmation from remote server during
synchronous replication.</entry>
+        </row>
<row>
<entry morerows="4"><literal>Timeout</literal></entry>
<entry><literal>BaseBackupThrottle</literal></entry>

We need to move the entry of FdwXactResolution to right before
Hash/Batch/Allocating for alphabetical order.

Agreed!

I've incorporated the changes I agreed with into my local branch and
will incorporate the other changes after discussion. I'll also do more
testing and self-review and will submit the latest version of the patch.

Meanwhile, I found a couple more small issues. One is the break statement missing
in pgstat_get_wait_ipc(), and the second is that fdwxact_relaunch_resolvers()
could return an uninitialized value.
I am attaching a small patch for these changes that can be applied on top of the existing
patches.
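
For readers who want to see why the missing break matters, here is a small stand-alone C illustration. It does not use the real WaitEventIPC enum or pgstat code; DemoWaitEvent and both functions are hypothetical and only mirror the shape of the switch in pgstat_get_wait_ipc().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical two-value enum standing in for the relevant wait events. */
typedef enum DemoWaitEvent
{
    DEMO_EVENT_FDWXACT,
    DEMO_EVENT_FDWXACT_RESOLUTION
} DemoWaitEvent;

/* Buggy shape: the first case has no break, so it runs into the next one. */
const char *
demo_name_without_break(DemoWaitEvent w)
{
    const char *event_name = "unknown";

    switch (w)
    {
        case DEMO_EVENT_FDWXACT:
            event_name = "FdwXact";
            /* falls through */
        case DEMO_EVENT_FDWXACT_RESOLUTION:
            event_name = "FdwXactResolution";
            break;
    }
    return event_name;
}

/* Fixed shape: each case ends with its own break. */
const char *
demo_name_with_break(DemoWaitEvent w)
{
    const char *event_name = "unknown";

    assert(w == DEMO_EVENT_FDWXACT || w == DEMO_EVENT_FDWXACT_RESOLUTION);

    switch (w)
    {
        case DEMO_EVENT_FDWXACT:
            event_name = "FdwXact";
            break;
        case DEMO_EVENT_FDWXACT_RESOLUTION:
            event_name = "FdwXactResolution";
            break;
    }
    return event_name;
}
```

In the buggy version the first event is reported under the second event's name, which is the kind of mislabeling the added break prevents.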

Thank you for the patch!

I'm updating the patches because the current behavior in the error case is
not good. For example, when an error occurs in the prepare phase,
prepared transactions are left as in-doubt transactions, and these
transactions are not handled by the resolver process. That means that
a user might need to resolve these transactions manually on every
abort, which is not good. In the abort case, I think that prepared
transactions can be resolved by the backend itself, rather than
leaving them for the resolver. I'll submit the updated patch.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#38Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Masahiko Sawada (#37)
5 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, 30 Apr 2020 at 20:43, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

I'm updating the patches because the current behavior in the error case is
not good. For example, when an error occurs in the prepare phase,
prepared transactions are left as in-doubt transactions, and these
transactions are not handled by the resolver process. That means that
a user might need to resolve these transactions manually on every
abort, which is not good. In the abort case, I think that prepared
transactions can be resolved by the backend itself, rather than
leaving them for the resolver. I'll submit the updated patch.

I've attached the latest version patch set which includes some changes
from the previous version:

* I've added regression tests that test all types of FDW
implementations. There are three types of FDW: an FDW that doesn't support any
transaction APIs, an FDW that supports only the commit and rollback APIs, and an FDW
that supports all (prepare, commit and rollback) APIs.
src/test/modules/test_fdwxact contains those FDW implementations for
tests, and tests some cases where a transaction reads/writes data on
various types of foreign servers.
* Also test_fdwxact has TAP tests that check failure cases. The test
FDW implementation has the ability to inject an error or panic into the
prepare or commit phase. Using it, the TAP tests check whether distributed
transactions can be committed or rolled back even in failure cases.
* When foreign_twophase_commit = 'required', the transaction commit
fails if the transaction modified data on even one server not
supporting the prepare API. Previously, we used to ignore servers that
don't support any transaction API, but now we check them to strictly
require all involved foreign servers to support all transaction APIs.
* Transaction resolver process resolves in-doubt transactions automatically.
* Incorporated comments from Muhammad Usama.
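
As a rough illustration of the three FDW types mentioned in the list above, the classification could be expressed like this. It is a sketch only: DemoFdwRoutine and its three members are hypothetical stand-ins, not the real FdwRoutine layout, and demo_noop() exists purely so that a callback slot can be non-NULL.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_xact_callback) (void);

/* Hypothetical subset of an FDW's callbacks; not the real FdwRoutine. */
typedef struct DemoFdwRoutine
{
    demo_xact_callback commit;      /* NULL if the FDW has no commit API */
    demo_xact_callback rollback;    /* NULL if the FDW has no rollback API */
    demo_xact_callback prepare;     /* NULL if the FDW has no prepare API */
} DemoFdwRoutine;

typedef enum DemoFdwKind
{
    DEMO_FDW_NO_API,            /* no transaction API */
    DEMO_FDW_COMMIT_ROLLBACK,   /* commit and rollback only */
    DEMO_FDW_FULL               /* prepare, commit and rollback */
} DemoFdwKind;

/* Placeholder callback used only to fill a slot in examples. */
void
demo_noop(void)
{
}

/* Classify an FDW by which transaction callbacks it provides. */
DemoFdwKind
demo_classify_fdw(const DemoFdwRoutine *routine)
{
    assert(routine != NULL);

    if (routine->commit == NULL || routine->rollback == NULL)
        return DEMO_FDW_NO_API;
    if (routine->prepare == NULL)
        return DEMO_FDW_COMMIT_ROLLBACK;
    return DEMO_FDW_FULL;
}
```

Under this sketch, only an FDW classified as DEMO_FDW_FULL could take part in a two-phase commit; the other two kinds are the ones that 'required' mode now checks for.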

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v20-0005-Add-regression-tests-for-atomic-commit.patch (application/octet-stream)
From c64479392f1d47d128d6c034697d251f6f37b527 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v20 5/5] Add regression tests for atomic commit.

---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 ++
 .../test_fdwxact/expected/test_fdwxact.out    | 188 +++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 178 +++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 119 +++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 471 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 +++++++
 src/test/regress/pg_regress.c                 |  13 +-
 13 files changed, 1229 insertions(+), 5 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 29de73c060..8a48e6ba19 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -13,6 +13,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..a5c8b89655
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,188 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup two servers that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_2 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_2 (i int) SERVER srv_2;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- Test 'disabled' case.
+-- Each transaction modifies one or two servers, but since two-phase
+-- commit is not required, none of these cases should raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' case.
+-- In this case, when two-phase commit is required, all servers
+-- that are modified in the transaction need to support the two-phase
+-- commit protocol. Otherwise the transaction is rolled back.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_1 and ft_2 don't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Test 'prefer' case.
+-- The cases that failed in the 'required' case should pass in 'prefer'.
+-- We simply commit/rollback a transaction in one phase on a server
+-- that doesn't support two-phase commit, instead of raising an error.
+SET foreign_twophase_commit TO 'prefer';
+-- We modify at least one server that doesn't support two-phase commit.
+-- These servers are committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..554312542f
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,178 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup two servers that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_2 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_2 (i int) SERVER srv_2;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+-- Test 'disabled' case.
+-- Each transaction modifies one or two servers, but since two-phase
+-- commit is not required, none of these cases should raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+-- Test 'required' case.
+-- In this case, when two-phase commit is required, all servers
+-- that are modified in the transaction need to support the two-phase
+-- commit protocol. Otherwise the transaction is rolled back.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_1 and ft_2 don't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+-- Test 'prefer' case.
+-- The cases that failed in the 'required' case should pass in 'prefer'.
+-- We simply commit/rollback a transaction in one phase on a server
+-- that doesn't support two-phase commit, instead of raising an error.
+SET foreign_twophase_commit TO 'prefer';
+
+-- We modify at least one server that doesn't support two-phase commit.
+-- These servers are committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..c712fd1cf3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,119 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql) = @_;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 COMMIT;");
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Inject an error into the prepare phase on srv_2pc_1. The transaction fails while
+# preparing the foreign transaction on srv_2pc_1. We then try both 'rollback' and
+# 'rollback prepared' on that foreign transaction, and roll back the other foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
+
+# Inject a panic into the prepare phase on srv_2pc_2. The server crashes after preparing both
+# foreign transactions. After the restart, those transactions are recovered as in-doubt
+# transactions. We check that the resolver process rolls back those transactions after recovery.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('panic', 'prepare', 'srv_2pc_2');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+$node->restart();
+$node->poll_query_until('postgres',
+						"SELECT count(*) = 0 FROM pg_foreign_xacts")
+  or die "Timeout while waiting for resolver process to resolve in-doubt transactions";
+$log = TestLib::slurp_file($node->logfile);
+like($log, qr/rollback prepared tx_[0-9]+ on srv_2pc_1/, "resolver rolled back in-doubt transaction");
+like($log, qr/rollback prepared tx_[0-9]+ on srv_2pc_2/, "resolver rolled back in-doubt transaction");
+truncate $node->logfile, 0;
+
+# Inject a panic into the commit phase on srv_2pc_1. The server crashes due to the panic
+# raised by the resolver process while committing the prepared foreign transaction on srv_2pc_1.
+# After the restart, those transactions are recovered as in-doubt transactions. We check that
+# the resolver process commits those transactions after recovery.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('panic', 'commit', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+$node->restart();
+$node->poll_query_until('postgres',
+						"SELECT count(*) = 0 FROM pg_foreign_xacts")
+  or die "Timeout while waiting for resolver process to resolve in-doubt transactions";
+$log = TestLib::slurp_file($node->logfile);
+like($log, qr/commit prepared tx_[0-9]+ on srv_2pc_1/, "resolver committed in-doubt transaction");
+like($log, qr/commit prepared tx_[0-9]+ on srv_2pc_2/, "resolver committed in-doubt transaction");
+truncate $node->logfile, 0;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..d8bfe48d96
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,471 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only
+ * commit and rollback API and the third supports all transaction API including
+ * prepare.
+ *
+ * Also, this module can inject an error into the prepare or commit callback
+ * via the test_inject_error() SQL function. The injected-error settings are
+ * stored in shared memory so that both backend processes and resolver
+ * processes can see them.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/reloptions.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactRslvState *state);
+static void testCommitForeignTransaction(FdwXactRslvState *state);
+static void testRollbackForeignTransaction(FdwXactRslvState *state);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel = ERROR;	/* fallback if the injected level is unrecognized */
+
+	if (check_event(state->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 state->fdwxact_id,
+							 state->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel = ERROR;	/* fallback if the injected level is unrecognized */
+
+	if (check_event(state->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+/*
+ * Check if an event is set at the phase on the server. If there is, set
+ * elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (strcasecmp(fxss->server, servername) != 0 ||
+		strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Currently support only error and panic */
+	if (strcasecmp(fxss->elevel, "error") == 0)
+		*elevel = ERROR;
+	if (strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strncpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strncpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strncpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately before checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 38b2b1e8e1..f30fe6b492 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2335,9 +2335,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2352,7 +2355,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.23.0
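
A note on the flag test used by the commit and rollback callbacks in the
test module above: they choose between the one-phase and the prepared log
message depending on whether the one-phase bit is set in state->flags. The
sketch below is a minimal, standalone illustration of such a bit test. The
struct demo_state, the helper is_one_phase() and the numeric flag value are
assumptions made for this example and are not taken from the patch; only the
macro name FDWXACT_FLAG_ONEPHASE appears in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value for illustration; the patch defines the real one elsewhere. */
#define FDWXACT_FLAG_ONEPHASE 0x01

/* Hypothetical stand-in for the flags field of FdwXactRslvState. */
struct demo_state
{
	int			flags;
};

/*
 * Return true only when the one-phase bit is set.  The bitwise AND isolates
 * that bit; a logical AND would be true for any non-zero flags value.
 */
static bool
is_one_phase(const struct demo_state *state)
{
	assert(state != NULL);
	return (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
}
```

With flags set to 0x02 (some other bit only), is_one_phase() returns false,
whereas a logical AND against the macro would evaluate to true.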

v20-0003-Documentation-update.patch (application/octet-stream)
From 028f5416a4382821bb36d00cc374d54bbd9ab02b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:27:18 +0500
Subject: [PATCH v20 3/5] Documentation update.

---
 doc/src/sgml/catalogs.sgml                | 145 +++++++++++++
 doc/src/sgml/config.sgml                  | 147 +++++++++++++-
 doc/src/sgml/distributed-transaction.sgml | 154 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 236 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  89 ++++++++
 doc/src/sgml/monitoring.sgml              |  60 ++++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 838 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index fcadba331d..8daeb7c006 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8237,6 +8237,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>open cursors</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-file-settings"><structname>pg_file_settings</structname></link></entry>
       <entry>summary of configuration file contents</entry>
@@ -9695,6 +9700,146 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction is
+       associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>initial</literal> : Initial status.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>resolved</literal> : This foreign transaction has been resolved.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in doubt and
+       needs to be resolved by calling the <function>pg_resolve_fdwxact</function>
+       function.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9f2a4a2470..e835c19dac 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4487,7 +4487,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
 
      </variablelist>
     </sect2>
-
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -9081,6 +9080,152 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether distributed transaction commit ensures that all
+         involved changes on foreign servers are committed. Valid
+         values are <literal>required</literal>, <literal>prefer</literal> and
+         <literal>disabled</literal>. The default setting is
+         <literal>disabled</literal>. When set to <literal>disabled</literal>,
+         the two-phase commit protocol is not used to commit or roll back distributed
+         transactions. When set to <literal>required</literal>, distributed
+         transactions strictly require that all written servers can use the
+         two-phase commit protocol.  That is, the distributed transaction cannot
+         commit if even one server does not support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-managements"/>).
+         When set to <literal>prefer</literal>, the distributed transaction uses
+         the two-phase commit protocol only on servers where it is available and
+         a plain commit on the others. In the <literal>prefer</literal> and <literal>required</literal> cases,
+         distributed transaction commit waits for all involved foreign
+         transactions to be committed before the command returns a "success"
+         indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When set to <literal>disabled</literal> or <literal>prefer</literal>,
+          there is a risk of inconsistency among the servers involved in
+          the distributed transaction if a foreign server crashes while
+          the distributed transaction is being committed.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If <literal>N</literal> local transactions each
+         span <literal>K</literal> foreign servers, this value needs to be set to
+         <literal>N * K</literal>, not just <literal>N</literal>.
+         This parameter can only be set at server start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminate foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning the resolver stays connected to one
+         database until it is stopped manually. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..85d8e8e9e4
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,154 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the others were rolled back. This could leave
+   the data in an inconsistent state in terms of a federated database.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers end in either commit or rollback using the
+   transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers atomically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).
+    The <productname>PostgreSQL</productname> server that receives the SQL is called
+    the <firstterm>coordinator node</firstterm> and is responsible for coordinating
+    all the participating transactions. Using the two-phase commit protocol, the commit
+    sequence of a distributed transaction consists of the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    At the first step, the <productname>PostgreSQL</productname> distributed
+    transaction manager prepares all transactions on the foreign servers if
+    two-phase commit is required. Two-phase commit is required when the
+    transaction modifies data on two or more servers including the local server
+    itself and <xref linkend="guc-foreign-twophase-commit"/> is
+    <literal>required</literal> or <literal>prefer</literal>. If all preparations
+    on foreign servers succeed, processing goes to the next step. If any failure
+    happens in this step, <productname>PostgreSQL</productname> switches to rollback
+    and rolls back all transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    At the local commit step, <productname>PostgreSQL</productname> commits the
+    transaction locally. If any failure happens in this step,
+    <productname>PostgreSQL</productname> switches to rollback and rolls back all
+    transactions on both the local and foreign servers.
+   </para>
+
+   <para>
+    At the final step, prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>Manual Resolution of In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back using the two-phase commit protocol. However, distributed transactions
+    become <firstterm>in-doubt</firstterm> in three cases: when a foreign
+    server crashes or the connection to it is lost while preparing the foreign
+    transaction, when the coordinator node crashes while either preparing or
+    resolving the distributed transaction, and when the user cancels the query. You can
+    check in-doubt transactions in the <xref linkend="pg-stat-foreign-xact-view"/>
+    view. These foreign transactions are resolved by a foreign transaction resolver
+    process or by executing the <function>pg_resolve_foreign_xact</function> function
+    manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolution">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving both foreign transactions that are prepared by
+    online transactions and in-doubt transactions. They commit or roll back
+    prepared transactions on foreign servers if the coordinator received agreement
+    messages from all foreign servers during the first step.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on the database it is connected to. On failure during resolution, it retries
+    the resolution at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped. To drop the database, first call the
+     <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be adjusted to
+    accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6587678af2..dd0358ef22 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1415,6 +1415,127 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back and
+     prepare the foreign transaction. If an FDW wishes that its foreign
+     transaction be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either in
+    the pre-commit phase of the local transaction, if the transaction
+    can be committed in one phase, or in the post-commit phase, if
+    two-phase commit is required. If <literal>frstate-&gt;flag</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal> set, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when it is invoked through the <function>pg_resolve_foreign_xact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent infinite
+    recursion. If <literal>frstate-&gt;flag</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal> set, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when it is invoked through the <function>pg_resolve_foreign_xact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    storing its length in <varname>*prep_id_len</varname>.
+    This optional function is called during executor startup, once per
+    foreign server. Note that the transaction identifier must be a string literal,
+    less than <symbol>NAMEDATALEN</symbol> bytes long, and must not be the same
+    as any other concurrent prepared transaction id. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier of the form
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      The functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Consequently,
+      they cannot use functions that access the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1894,4 +2015,119 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage the transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used to manage those
+    transactions and must fit into the workings of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to obtain
+    details of the foreign server being processed, such as the server name and the OIDs of
+    the server, user and user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback of a Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <literal>CommitForeignTransaction</literal> function
+     in the pre-commit phase; during transaction abort, it calls the
+     <literal>RollbackForeignTransaction</literal> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback of a Distributed Transaction</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit or roll back atomically across all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be using different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go on to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, the <literal>CommitForeignTransaction</literal> and
+     <literal>RollbackForeignTransaction</literal> functions should commit or
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server was not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If resolution
+     fails, the resolver process exits with an error message. The foreign
+     transaction launcher will launch the resolver process again after the
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 68179f71cd..1ab8e80fdc 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c06afd3ea..e281bd33d8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26126,6 +26126,95 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction that is currently being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful to call before
+    dropping the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 579ccd34d4..9073c01eb9 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -384,6 +384,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_foreign_xact</structname><indexterm><primary>pg_stat_foreign_xact</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1271,6 +1279,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>CheckpointerMain</literal></entry>
          <entry>Waiting in main loop of checkpointer process.</entry>
         </row>
+        <row>
+         <entry><literal>FdwXactLauncherMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+        </row>
+        <row>
+         <entry><literal>FdwXactResolverMain</literal></entry>
+         <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+        </row>
+        <row>
+         <entry><literal>LogicalLauncherMain</literal></entry>
+         <entry>Waiting in main loop of logical launcher process.</entry>
+        </row>
         <row>
          <entry><literal>LogicalApplyMain</literal></entry>
          <entry>Waiting in main loop of logical apply process.</entry>
@@ -1506,6 +1526,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>SafeSnapshot</literal></entry>
          <entry>Waiting for a snapshot for a <literal>READ ONLY DEFERRABLE</literal> transaction.</entry>
         </row>
+        <row>
+         <entry><literal>FdwXactResolution</literal></entry>
+         <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry>
+        </row>
         <row>
          <entry><literal>SyncRep</literal></entry>
          <entry>Waiting for confirmation from remote server during synchronous replication.</entry>
@@ -2430,6 +2454,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-foreign-xact-view" xreflabel="pg_stat_foreign_xact">
+   <title><structname>pg_stat_foreign_xact</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_foreign_xact</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index c41ce9499b..5ef1f4a329 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -170,6 +170,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3234adb639..83f30c5045 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.23.0
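
As a note on the callback contract documented in fdwhandler.sgml above: the commit and rollback callbacks either end the remote transaction directly or resolve a previously prepared one, depending on the one-phase flag. What follows is only a minimal sketch of that branch, not code from the patch set. MockRslvState, MOCK_FLAG_ONEPHASE and build_end_command are hypothetical stand-ins (the real FdwXactRslvState and flag are defined by the patches), and the SQL text assumes a PostgreSQL-like foreign server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the one-phase flag bit; not the patch's value. */
#define MOCK_FLAG_ONEPHASE 0x01

/* Hypothetical, reduced stand-in for FdwXactRslvState. */
typedef struct MockRslvState
{
	int			flags;			/* bitmask, may contain MOCK_FLAG_ONEPHASE */
	const char *fdwxact_id;		/* prepared transaction id; unused if one-phase */
} MockRslvState;

/*
 * Write into buf the command a commit or rollback callback could send.
 * Returns false if the command does not fit into buf.
 *
 * Assumption: fdwxact_id contains no quote characters.  Real code would
 * have to escape it (for example with libpq's PQescapeLiteral).
 */
static bool
build_end_command(const MockRslvState *st, bool is_commit,
				  char *buf, size_t buflen)
{
	const char *verb = is_commit ? "COMMIT" : "ROLLBACK";
	int			n;

	assert(st != NULL && buf != NULL);

	if (st->flags & MOCK_FLAG_ONEPHASE)
	{
		/* One-phase: end the open remote transaction directly. */
		n = snprintf(buf, buflen, "%s TRANSACTION", verb);
	}
	else
	{
		/* Two-phase: resolve the transaction that was prepared earlier. */
		assert(st->fdwxact_id != NULL);
		n = snprintf(buf, buflen, "%s PREPARED '%s'", verb, st->fdwxact_id);
	}

	/* snprintf reports the length it wanted; reject truncated output. */
	return n >= 0 && (size_t) n < buflen;
}
```

A real callback would also have to treat an undefined-object error from the second form as success, since the documentation above says the prepared transaction may no longer exist on the foreign server after a crash.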

v20-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From 76b4e2bc7977c1338d7fe72f4b27cd03ea42a0c4 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 14:12:17 +0500
Subject: [PATCH v20 1/5] Keep track of writing on non-temporary relation

---
 src/backend/executor/nodeModifyTable.c | 16 ++++++++++++++++
 src/include/access/xact.h              |  6 ++++++
 2 files changed, 22 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 20a4c474cc..1ec07bad07 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -581,6 +581,10 @@ ExecInsert(ModifyTableState *mtstate,
 										   NULL,
 										   specToken);
 
+			/* Note that we have written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
 												   &specConflict,
@@ -619,6 +623,10 @@ ExecInsert(ModifyTableState *mtstate,
 							   estate->es_output_cid,
 							   0, NULL);
 
+			/* Note that we have written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
@@ -970,6 +978,10 @@ ldelete:;
 	if (tupleDeleted)
 		*tupleDeleted = true;
 
+	/* Note that we have written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/*
 	 * If this delete is the result of a partition key update that moved the
 	 * tuple to a new partition, put this row into the transition OLD TABLE,
@@ -1482,6 +1494,10 @@ lreplace:;
 	if (canSetTag)
 		(estate->es_processed)++;
 
+	/* Note that we have written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/* AFTER ROW UPDATE Triggers */
 	ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple, slot,
 						 recheckIndexes,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7ee04babc2..a04fc70326 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -102,6 +102,12 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data to a non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
-- 
2.23.0
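
For context, the documentation patch in this series states that two-phase commit is required when the transaction modified data on more than one server, counting the local server itself, and the user requested foreign two-phase commit. The sketch below shows one way the new flag could feed that decision. It is an illustration under assumptions, not the patch's logic: mock_need_twophase and its parameters are hypothetical, the boolean stands in for the setting referenced as guc-foreign-twophase-commit, and the flag value is simply copied from the xact.h hunk above so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit as XACT_FLAGS_WROTENONTEMPREL in the hunk above, redefined locally. */
#define MOCK_XACT_FLAGS_WROTENONTEMPREL (1U << 2)

/*
 * Hypothetical helper: decide whether two-phase commit is needed.
 *
 * xact_flags                  copy of the backend's transaction flags
 * n_modified_foreign_servers  foreign servers modified in this transaction
 * twophase_requested          whether the user asked for foreign two-phase commit
 */
static bool
mock_need_twophase(unsigned int xact_flags,
				   int n_modified_foreign_servers,
				   bool twophase_requested)
{
	int			n_writers = n_modified_foreign_servers;

	assert(n_modified_foreign_servers >= 0);

	/* The local server is a participant too if it wrote a non-temporary relation. */
	if (xact_flags & MOCK_XACT_FLAGS_WROTENONTEMPREL)
		n_writers++;

	/* A single writer can always commit in one phase. */
	return twophase_requested && n_writers > 1;
}
```

With this rule, one modified foreign server plus a local write needs two-phase commit, while one modified foreign server alone does not.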

v20-0004-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From 910d440283c4582ff2706c37661a08ab036057b9 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:28:58 +0500
Subject: [PATCH v20 4/5] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/Makefile                 |   7 +-
 contrib/postgres_fdw/connection.c             | 603 +++++++++++-------
 .../postgres_fdw/expected/postgres_fdw.out    | 280 +++++++-
 contrib/postgres_fdw/fdwxact.conf             |   3 +
 contrib/postgres_fdw/postgres_fdw.c           |  21 +-
 contrib/postgres_fdw/postgres_fdw.h           |   7 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 124 +++-
 doc/src/sgml/postgres-fdw.sgml                |  45 ++
 8 files changed, 831 insertions(+), 259 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..91fa6e39fc 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -29,3 +29,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index e45647f3ea..bb859e2927 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2020, PostgreSQL Global Development Group
  *
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
@@ -56,6 +57,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;	/* GetConnection called in this xact? */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -69,17 +71,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -92,23 +90,26 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
+ * Get the connection cache entry. Unlike GetConnectionState, this function
+ * doesn't establish a new connection if one does not exist yet.
  */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -128,7 +129,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -136,12 +136,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -155,6 +149,21 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * This function gets the connection cache entry and establishes connection
+ * to the foreign server if there is no connection and starts a new transaction
+ * if 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -182,6 +191,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping	*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +200,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +211,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,11 +227,39 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
@@ -472,7 +520,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -699,193 +747,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -902,10 +763,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -916,6 +773,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Skip entries whose connection was not used in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1250,3 +1111,309 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   state->server->servername, state->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 state->server->servername, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on foreign server. If
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can commit the
+ * foreign transaction without preparation, otherwise commit the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has already been
+		 * prepared and closed, so we might not have a connection to it. We get
+		 * a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In simple commit case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   state->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Roll back a transaction on the foreign server. As in the commit case, if
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function rolls back the
+ * foreign transaction without preparation; otherwise it rolls back the prepared
+ * transaction. This function must tolerate being called recursively, because
+ * an error can occur while aborting.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has already been
+		 * prepared and closed, so we might not have a connection to it. We get
+		 * a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before
+	 * establishing a connection or starting a remote transaction.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the remote transaction has not started or the connection is already
+	 * unsalvageable, only do the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager notes, the given foreign
+		 * transaction might not exist on the foreign server, so we accept
+		 * an UNDEFINED_OBJECT error here.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..8c31e26406 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -191,15 +210,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8923,10 +8944,10 @@ RESET ROLE;
 ALTER USER MAPPING FOR regress_nosuper SERVER loopback_nopw OPTIONS (ADD password_required 'false');
 SET ROLE regress_nosuper;
 -- Should finally work now
-SELECT * FROM ft1_nopw LIMIT 1;
-  c1  | c2 | c3 | c4 | c5 | c6 |     c7     | c8 
-------+----+----+----+----+----+------------+----
- 1111 |  2 |    |    |    |    | ft1        | 
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
+ c1 | c2 |        c3         |              c4              |            c5            | c6 |     c7     | c8  
+----+----+-------------------+------------------------------+--------------------------+----+------------+-----
+  1 |  2 | 00001_trig_update | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1  | 1          | foo
 (1 row)
 
 -- unpriv user also cannot set sslcert / sslkey on the user mapping
@@ -8943,16 +8964,16 @@ HINT:  User mappings with the sslcert or sslkey options set may only be created
 DROP USER MAPPING FOR CURRENT_USER SERVER loopback_nopw;
 -- This will fail again as it'll resolve the user mapping for public, which
 -- lacks password_required=false
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 ERROR:  password is required
 DETAIL:  Non-superusers must provide a password in the user mapping.
 RESET ROLE;
 -- The user mapping for public is passwordless and lacks the password_required=false
 -- mapping option, but will work because the current user is a superuser.
 SELECT * FROM ft1_nopw LIMIT 1;
-  c1  | c2 | c3 | c4 | c5 | c6 |     c7     | c8 
-------+----+----+----+----+----+------------+----
- 1111 |  2 |    |    |    |    | ft1        | 
+ c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+  6 |  6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6  | 6          | foo
 (1 row)
 
 -- cleanup
@@ -8961,16 +8982,225 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
 BEGIN;
-SELECT count(*) FROM ft1;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" of relation "T 5" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
  count 
 -------
-   822
+     0
 (1 row)
 
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
-ROLLBACK;
-WARNING:  there is no transaction in progress
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000000..3fdbf93cdb
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..105451d199 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include <limits.h>
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -504,7 +505,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -558,6 +558,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1434,7 +1439,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2372,7 +2377,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2746,7 +2751,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3566,7 +3571,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4441,7 +4446,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4527,7 +4532,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4755,7 +4760,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..43ffd4f73f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -129,7 +130,7 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -137,6 +138,9 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *state);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *state);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *state);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
@@ -203,6 +207,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 									bool is_subquery,
 									List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..1ef66123df 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2598,7 +2621,7 @@ ALTER USER MAPPING FOR regress_nosuper SERVER loopback_nopw OPTIONS (ADD passwor
 SET ROLE regress_nosuper;
 
 -- Should finally work now
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 
 -- unpriv user also cannot set sslcert / sslkey on the user mapping
 -- first set password_required so we see the right error messages
@@ -2612,7 +2635,7 @@ DROP USER MAPPING FOR CURRENT_USER SERVER loopback_nopw;
 
 -- This will fail again as it'll resolve the user mapping for public, which
 -- lacks password_required=false
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 
 RESET ROLE;
 
@@ -2628,9 +2651,98 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
 BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
 ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 94992be427..3f52daa11e 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -477,6 +477,43 @@
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted independently.
+    Some of these transactions may fail to commit on their remote server while
+    others commit successfully. This behavior can be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> may
+       use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to a
+       nonzero value on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to a nonzero value on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
@@ -504,6 +541,14 @@
    managed by creating corresponding remote savepoints.
   </para>
 
+  <para>
+   <filename>postgres_fdw</filename> uses the two-phase commit protocol when
+   committing or aborting a transaction if atomic commit of the distributed
+   transaction (see <xref linkend="atomic-commit"/>) is required. The remote
+   server must therefore set <xref linkend="guc-max-prepared-transactions"/> to a
+   nonzero value so that it can prepare the remote transaction.
+  </para>
+
   <para>
    The remote transaction uses <literal>SERIALIZABLE</literal>
    isolation level when the local transaction has <literal>SERIALIZABLE</literal>
-- 
2.23.0

Attachment: v20-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From 811ef5995e30b4047f951886e3a8aa64ecf9469d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:22:17 +0500
Subject: [PATCH v20 2/5] Support atomic commit among multiple foreign servers.

---
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  110 +
 src/backend/access/fdwxact/fdwxact.c          | 2736 +++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  560 ++++
 src/backend/access/fdwxact/resolver.c         |  436 +++
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   56 +
 src/backend/access/transam/xact.c             |   26 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/copy.c                   |    6 +
 src/backend/commands/foreigncmds.c            |   30 +
 src/backend/executor/execPartition.c          |    8 +
 src/backend/executor/nodeForeignscan.c        |   24 +
 src/backend/executor/nodeModifyTable.c        |    6 +
 src/backend/foreign/foreign.c                 |   55 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   20 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   82 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/fdwxactdesc.c              |    1 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  162 +
 src/include/access/fdwxact_launcher.h         |   28 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/resolver_internal.h        |   63 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   22 +
 src/include/foreign/fdwapi.h                  |   12 +
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    9 +-
 src/include/storage/proc.h                    |   11 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    3 +
 src/test/regress/expected/rules.out           |    7 +
 55 files changed, 4796 insertions(+), 18 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 120000 src/bin/pg_waldump/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..49480dd039 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..0207a66fb4
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000000..361a46f8e3
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,110 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back either all of the
+foreign servers or none of them. This ensures that the data is always left
+in a consistent state from the viewpoint of the federated database.
+
+
+Commit Sequence of Global Transactions
+---------------------------------------
+
+We employ the two-phase commit protocol to commit atomically across all
+foreign servers. The commit sequence of a distributed transaction consists
+of the following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+in the list FdwXactParticipants, which is maintained by PostgreSQL's
+global transaction manager (GTM), as distributed transaction
+participants. The registered foreign transactions are tracked until the end of
+the transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We write a WAL record indicating that the foreign server is involved
+with the current transaction before issuing PREPARE for each foreign transaction.
+Thus, if we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and will try to resolve it when connectivity is restored or after crash
+recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers, including the local node. Otherwise, we commit them at this step by
+calling the CommitForeignTransaction() API, and nothing further is needed.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If any of them fails, we switch to
+rollback; at that point some participants might be prepared whereas
+others are not. The former must be resolved manually using
+pg_resolve_foreign_xact(), and the latter are ended in one phase
+by calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction, but
+this resolution step (commit or rollback) is done by the foreign transaction
+resolver process. The backend process inserts itself into the wait queue and
+then wakes up the resolver process (or requests the launch of a new one if necessary).
+The resolver process dequeues the waiter and fetches the distributed transaction
+information that the backend is waiting for. Once all foreign transactions are
+committed or rolled back, the resolver process wakes up the waiter.
+
+
+Foreign Data Wrapper Callbacks for Transaction Management
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+the transaction management callback functions according to that status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for issuing PREPARE, COMMIT and
+ROLLBACK of the transaction on the foreign server, respectively.
+FdwXactRslvState->flags may contain FDWXACT_FLAG_ONEPHASE, meaning the FDW can
+commit or roll back the foreign transaction in one phase. On failure while
+processing a foreign transaction, the FDW needs to raise an error. However, the
+FDW must tolerate an ERRCODE_UNDEFINED_OBJECT error when committing or rolling
+back a foreign transaction, because of a race condition: the coordinator
+could crash between completing the resolution and writing the WAL record
+that removes the FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_PREPARING
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and then to
+FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING before the foreign
+transaction is committed or aborted, respectively, by the FDW callback functions.
+The FdwXact entry is removed, with WAL logging, once the foreign transaction is
+resolved.
+
+FdwXact entries restored during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. Their initial
+status is FDWXACT_STATUS_PREPARED(*1). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                     PREPARING                      |----+
+ +----------------------------------------------------+    |
+                          |                                |
+	                  v                                |
+ +----------------------------------------------------+    |
+ |                    PREPARED(*1)                    |    | (*2)
+ +----------------------------------------------------+    |
+           |                               |               |
+           v                               v               |
+ +--------------------+          +--------------------+    |
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |<---+
+ +--------------------+          +--------------------+
+
+(*1) Recovered FdwXact entries start with PREPARED
+(*2) Path taken when an error occurs during preparing
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..fdc6b1f415
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2736 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To commit atomically across all foreign servers, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * The two-phase commit protocol is used when the transaction modified two or
+ * more servers, including the local node.  If the two-phase commit protocol
+ * is not required, all foreign transactions are committed in the pre-commit
+ * phase.
+ *
+ * During executor node initialization, the executor can register a foreign
+ * server by calling either RegisterFdwXactByRelId() or
+ * RegisterFdwXactByServerId() so that it joins the group for global commit.
+ * A foreign server is registered if its FDW provides both the
+ * CommitForeignTransaction and RollbackForeignTransaction APIs.  Registered
+ * participants are identified by the OIDs of the foreign server and user.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * all foreign servers.  After committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or roll back those
+ * transactions.  If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so that
+ * we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request.  We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed.  Before waiting for its foreign transactions to be resolved, a
+ * backend enqueues itself with the timestamp at which it expects to be
+ * processed.  On failure, it enqueues itself again with a new timestamp
+ * (last timestamp + foreign_xact_resolution_interval).
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in the in-doubt state (a.k.a.
+ * in-doubt transactions).  Foreign transactions in the in-doubt state are not
+ * resolved automatically, so they must be processed manually using the
+ * pg_resolve_foreign_xact() function.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of a foreign
+ * transaction follows a locking model based on four linked concepts:
+ *
+ * * All FdwXact fields except for indoubt, inprocessing and status are protected
+ *   by FdwXactLock.  These three fields are protected by the entry's mutex.
+ * * Setting held_by of an FdwXact entry means owning that entry, which
+ *   prevents it from being updated or removed by concurrent processes.
+ * * An FdwXact whose inprocessing is true is likewise not processed or removed
+ *   by concurrent processes.
+ * * A process that is going to process a foreign transaction needs to hold its
+ *   FdwXact entry in advance.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk.  FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon.  We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
+/* Check whether the FdwXactParticipant supports transaction callbacks and two-phase commit */
+#define ServerSupportTransactionCallack(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define SeverSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.  This struct
+ * is created at the beginning of execution for each foreign server and
+ * is used until the end of the transaction, where we cannot look at syscaches.
+ * Therefore, it is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
+	/* true if modified the data on the server */
+	bool		modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A participant
+ * may not support the transaction callbacks: commit, rollback and
+ * prepare.  If a participant doesn't support any transaction
+ * callbacks, i.e. ServerSupportTransactionCallack() returns false,
+ * we don't end its transaction.
+ *
+ * FdwXactParticipants_tmp is used to update FdwXactParticipants atomically
+ * when executing a COMMIT/ROLLBACK PREPARED command.  In the COMMIT PREPARED
+ * case, we don't want to roll back foreign transactions even if an error
+ * occurs, because the local prepared transaction never turns into a rollback
+ * in that case.  However, building FdwXactParticipants might raise an error
+ * because it calls palloc().  So we build FdwXactParticipants in
+ * two phases.  In the first phase, PrepareFdwXactParticipants(), we collect
+ * all foreign transactions associated with the local prepared transaction
+ * and keep them in FdwXactParticipants_tmp.  Even if an error occurs during
+ * that, we don't roll them back.  In the second phase, SetFdwXactParticipants(),
+ * we replace FdwXactParticipants with FdwXactParticipants_tmp and hold them.
+ *
+ * FdwXactLocalXid is the local transaction id associated with FdwXactParticipants.
+ */
+static List *FdwXactParticipants = NIL;
+static List *FdwXactParticipants_tmp = NIL;
+static TransactionId FdwXactLocalXid = InvalidTransactionId;
+
+/*
+ * True if the current transaction needs to be committed together with
+ * foreign servers.
+ */
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * oid, xid, foreign server oid and user oid, each 8 hex digits, joined by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction, and the xid of an unresolved distributed
+ * transaction is never reused, this name is sufficient to ensure
+ * uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* GUC parameters */
+int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Keep track of registering process exit call back. */
+static bool fdwXactExitRegistered = false;
+
+static void register_fdwxact(Oid serverid, Oid userid, bool modified);
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool onephase,
+											 bool for_commit);
+static bool checkForeignTwophaseCommitRequired(void);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static void FdwXactPrepareForeignTransactions(void);
+static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static TransactionId FdwXactDetermineTransactionFate(TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						bool hold);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Remember accessed foreign transaction. Both RegisterFdwXactByRelId and
+ * RegisterFdwXactByServerId are called by executor during initialization.
+ */
+void
+RegisterFdwXactByRelId(Oid relid, bool modified)
+{
+	Relation	rel;
+	Oid			serverid;
+	Oid			userid;
+
+	rel = relation_open(relid, NoLock);
+	serverid = GetForeignServerIdByRelId(relid);
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	relation_close(rel, NoLock);
+
+	register_fdwxact(serverid, userid, modified);
+}
+
+void
+RegisterFdwXactByServerId(Oid serverid, bool modified)
+{
+	register_fdwxact(serverid, GetUserId(), modified);
+}
+
+/*
+ * Register the foreign transaction identified by the given arguments as
+ * a participant of the transaction. The foreign transaction is identified
+ * by the given server id and user id.
+ */
+static void
+register_fdwxact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Participant's information is also needed at the end of a transaction,
+	 * where the system caches are not available. Save it in
+	 * TopTransactionContext so that it lives until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	pfree(routine);
+
+	/* Restore the previous memory context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	return fdw_part;
+}
+
+/*
+ * Prepare all foreign transactions if foreign two-phase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit: with
+ * 'required' we strictly require the FDWs of all foreign servers to support
+ * the two-phase commit protocol and ask them to prepare their foreign
+ * transactions; with 'prefer' we ask only the foreign servers that are
+ * capable of two-phase commit to prepare and ask the other servers to commit;
+ * and with 'disabled' we ask all foreign servers to commit their foreign
+ * transactions in one phase. If we fail to commit any of them, we change to
+ * aborting.
+ *
+ * Note that non-modified foreign servers can always be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXacts(void)
+{
+	ListCell   *lc;
+	bool		need_twophase_commit;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Check if we need to use foreign twophase commit. It's always false if
+	 * foreign twophase commit is disabled.
+	 */
+	need_twophase_commit = checkForeignTwophaseCommitRequired();
+
+	/*
+	 * Prepare foreign transactions on foreign servers that support two-phase
+	 * commit.
+	 */
+	if (need_twophase_commit)
+	{
+		FdwXactPrepareForeignTransactions();
+		ForeignTwophaseCommitIsRequired = true;
+	}
+
+	/*
+	 * Commit other foreign transactions and delete the participant entry from
+	 * the list.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		/*
+		 * Skip already prepared foreign transactions. Note that we keep those
+		 * FdwXactParticipants until the end of the transaction.
+		 */
+		if (fdw_part->fdwxact)
+			continue;
+
+		/* Delete participants that don't support transaction callbacks */
+		if (!ServerSupportTransactionCallack(fdw_part))
+		{
+			FdwXactParticipants = foreach_delete_current(FdwXactParticipants, lc);
+			continue;
+		}
+
+		/* Commit the foreign transaction in one-phase */
+		FdwXactParticipantEndTransaction(fdw_part, true, true);
+
+		/* Transaction successfully committed; delete it from the list */
+		FdwXactParticipants = foreach_delete_current(FdwXactParticipants, lc);
+	}
+}
+
+/*
+ * Return true if the current transaction modifies data on two or more servers,
+ * counting the servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(void)
+{
+	ListCell   *lc;
+	bool		need_twophase_commit;
+	bool		have_notwophase;
+	int			nserverswritten = 0;
+	int			nserverstwophase = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (SeverSupportTwophaseCommit(fdw_part))
+			nserverstwophase++;
+
+		nserverswritten++;
+	}
+
+	/* check if there is a server that doesn't support two-phase commit */
+	have_notwophase = (nserverswritten != nserverstwophase);
+
+	/* Did we modify the local non-temporary data? */
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		nserverswritten++;
+
+	if (nserverswritten <= 1)
+		return false;
+
+	if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED)
+	{
+		/*
+		 * In the 'required' case, we require all modified servers to
+		 * support two-phase commit.
+		 */
+		need_twophase_commit = (nserverswritten >= 2);
+	}
+	else
+	{
+		Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER);
+
+		/*
+		 * In the 'prefer' case, we prepare transactions only on servers
+		 * that are capable of two-phase commit.
+		 */
+		need_twophase_commit = (nserverstwophase >= 2);
+	}
+
+	/*
+	 * If foreign two-phase commit is required, then all foreign servers must
+	 * be capable of doing two-phase commit.
+	 */
+	if (need_twophase_commit)
+	{
+		/* Parameter check */
+		if (max_prepared_foreign_xacts == 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+					 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+		if (max_foreign_xact_resolvers == 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+					 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+		if (have_notwophase &&
+			foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot process a distributed transaction that has operated on a foreign server"),
+					 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+	}
+
+	return need_twophase_commit;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool onephase,
+								 bool for_commit)
+{
+	FdwXactRslvState state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state.xid = FdwXactLocalXid;
+	state.server = fdw_part->server;
+	state.usermapping = fdw_part->usermapping;
+	state.fdwxact_id = onephase ? NULL : fdw_part->fdwxact_id;
+	state.flags = onephase ? FDWXACT_FLAG_ONEPHASE : 0;
+
+	if (for_commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * an FdwXact entry we call the GetPrepareId callback to get a transaction
+ * identifier from the FDW.
+ *
+ * We can still change to rollback here on failure. If any error occurs, we
+ * roll back non-prepared foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	ListCell   *lc;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Save the local transaction id */
+	FdwXactLocalXid = GetTopTransactionId();
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState state;
+		FdwXact		fdwxact;
+
+		if (!SeverSupportTwophaseCommit(fdw_part) || !fdw_part->modified)
+			continue;
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, FdwXactLocalXid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to disk and to WAL (thereby relaying it to standbys).
+		 * Thus in case we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed in between these
+		 * two steps, we would lose track of the prepared transaction on the
+		 * foreign server and would not be able to resolve it after crash
+		 * recovery. Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(FdwXactLocalXid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the moment this
+		 * backend receives the acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 */
+		state.xid = FdwXactLocalXid;
+		state.server = fdw_part->server;
+		state.usermapping = fdw_part->usermapping;
+		state.fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+		fdw_part->prepare_foreign_xact_fn(&state);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * This function is used to create a new foreign transaction entry before an
+ * FDW prepares and commits or rolls back. The function adds the entry to WAL,
+ * and it is persisted to disk under the pg_fdwxact directory at checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->held_by = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions (currently %d).",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->inprocessing = false;
+	fdwxact->indoubt = false;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->indoubt = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides the GetPrepareId callback we return the
+ * identifier returned from it. Otherwise we generate a unique identifier in
+ * the form "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * The returned string is used to identify the foreign transaction. The
+ * identifier must not be the same as that of any other concurrent prepared
+ * transaction.
+ *
+ * To make the foreign transactionid unique, we should ideally use something
+ * like UUID, which gives unique ids with high probability, but that may be
+ * expensive here and UUID extension which provides the function to generate
+ * UUID is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	   *id;
+	int			id_len = 0;
+
+	/*
+	 * If the FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * Note that it's possible that the transaction aborts after we have prepared
+ * some of the participants. In this case we change to rollback and roll back
+ * all foreign transactions.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	if (!checkForeignTwophaseCommitRequired())
+		return;
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+
+	/*
+	 * Prepared foreign transactions need to be resolved at COMMIT PREPARED
+	 * or ROLLBACK PREPARED.  Therefore we forget all participants here so
+	 * that we don't mark them as in-doubt at the end of the transaction on
+	 * failure.
+	 */
+	list_free(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Collect all foreign transactions associated with the given xid.  Return true
+ * if COMMIT PREPARED or ROLLBACK PREPARED needs to wait for all foreign transactions
+ * to be resolved.  The collected foreign transactions are kept in FdwXactParticipants_tmp,
+ * so the caller must call SetFdwXactParticipants() later if this function returns true.
+ */
+bool
+PrepareFdwXactParticipants(TransactionId xid)
+{
+	MemoryContext old_ctx;
+
+	Assert(FdwXactParticipants_tmp == NIL);
+
+	if (!TwoPhaseExists(xid))
+		return false;
+
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXactParticipant *fdw_part;
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwRoutine *routine;
+
+		if (!fdwxact->valid || fdwxact->local_xid != xid)
+			continue;
+
+		routine = GetFdwRoutineByServerId(fdwxact->serverid);
+		fdw_part = create_fdwxact_participant(fdwxact->serverid, fdwxact->userid,
+											  routine);
+		fdw_part->modified = true;
+		fdw_part->fdwxact = fdwxact;
+
+		/* Add to the participants list */
+		FdwXactParticipants_tmp = lappend(FdwXactParticipants_tmp, fdw_part);
+	}
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_ctx);
+
+	/*
+	 * We cannot proceed to commit this prepared transaction when
+	 * foreign_twophase_commit is disabled.
+	 */
+	if (FdwXactParticipants_tmp != NIL &&
+		!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	return (FdwXactParticipants_tmp != NIL);
+}
+
+/*
+ * Make the collected foreign transactions the participants of this transaction and
+ * hold all of them.  This function must be called after PrepareFdwXactParticipants().
+ */
+void
+SetFdwXactParticipants(TransactionId xid, bool commit)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants_tmp != NIL);
+	Assert(FdwXactParticipants == NIL);
+
+	FdwXactLocalXid = xid;
+	FdwXactParticipants = FdwXactParticipants_tmp;
+	FdwXactParticipants_tmp = NIL;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(SeverSupportTwophaseCommit(fdw_part));
+
+		/* Hold the fdwxact entry and set the status */
+		SpinLockAcquire(&fdw_part->fdwxact->mutex);
+		Assert(fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED);
+		fdw_part->fdwxact->held_by = MyBackendId;
+		fdw_part->fdwxact->status = commit
+			? FDWXACT_STATUS_COMMITTING
+			: FDWXACT_STATUS_ABORTING;
+		SpinLockRelease(&fdw_part->fdwxact->mutex);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Wait for all foreign transactions of the given transaction to be resolved.
+ *
+ * Initially a backend starts in state FDWXACT_NOT_WAITING and then changes
+ * that state to FDWXACT_WAITING before adding itself to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * This backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitForResolution(TransactionId wait_xid)
+{
+	char	   *new_status = NULL;
+	const char *old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/*
+	 * Quick exit if either atomic commit is not requested or we don't have
+	 * any participants.
+	 */
+	if (!IsForeignTwophaseCommitRequested() || FdwXactParticipants == NIL)
+		return;
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if one is not running yet, or wake it up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction resolution.
+	 */
+	if (update_process_title)
+	{
+		int			len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status);
+		new_status[len] = '\0'; /* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDWXACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The
+		 * latter would lead the client to believe that the distributed
+		 * transaction aborted, which is not true: it's already committed
+		 * locally. The former is no good either: the client has requested
+		 * committing a distributed transaction, and is entitled to assume
+		 * that an acknowledged commit is also committed on all foreign
+		 * servers, which might not be true. So in this case we issue a
+		 * WARNING (which some clients may be able to interpret) and shut off
+		 * further output. We do NOT reset ProcDiePending, so that the process
+		 * will die after the commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign xact resolver can pick them up and try to resolve
+		 * them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status);
+		pfree(new_status);
+	}
+
+	list_free(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Return one backend that is connected to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+				 TransactionId *waitXid_p)
+{
+	PGPROC	   *proc;
+	bool		found = false;
+
+	Assert(LWLockHeldByMe(FdwXactResolutionLock));
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	/* Initialize variables */
+	*nextResolutionTs_p = -1;
+	*waitXid_p = InvalidTransactionId;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+		{
+			if (proc->fdwXactNextResolutionTs <= now)
+			{
+				/* Found a waiting process */
+				found = true;
+				*waitXid_p = proc->fdwXactWaitXid;
+			}
+			else
+				/* Found a waiting process supposed to be processed later */
+				*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+
+			break;
+		}
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return found ? proc : NULL;
+}
+
+/*
+ * Return true if there is at least one backend connected to the given
+ * database in the wait queue. The caller must hold FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC	   *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * In the abort case, this function ends foreign transaction participants and
+ * possibly rolls back their prepared foreign transactions.
+ */
+extern void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lc;
+
+	if (!is_commit)
+	{
+		foreach(lc, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = lfirst(lc);
+			FdwXact		fdwxact = fdw_part->fdwxact;
+			int			status;
+
+			if (!fdwxact)
+			{
+				/* Rollback foreign transaction in one-phase if supported */
+				if (ServerSupportTransactionCallack(fdw_part))
+					FdwXactParticipantEndTransaction(fdw_part, true, false);
+				continue;
+			}
+
+			/*
+			 * Abort the foreign transaction.  For participants whose status
+			 * is FDWXACT_STATUS_PREPARING, we close the transaction in
+			 * one-phase. In addition, since we are not sure that the
+			 * preparation has been completed on the foreign server, we also
+			 * attempt to roll back the prepared foreign transaction.  Note
+			 * that it is the FDW's responsibility to tolerate an
+			 * OBJECT_NOT_FOUND error in the abort case.
+			 */
+			SpinLockAcquire(&fdwxact->mutex);
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&fdwxact->mutex);
+
+			switch (status)
+			{
+				case FDWXACT_STATUS_PREPARING:
+					/* One-phase rollback foreign transaction */
+					FdwXactParticipantEndTransaction(fdw_part, true, false);
+					/* fall through */
+				case FDWXACT_STATUS_PREPARED:
+				case FDWXACT_STATUS_ABORTING:
+					/* Roll back the prepared foreign transaction */
+					FdwXactParticipantEndTransaction(fdw_part, false, false);
+					break;
+				case FDWXACT_STATUS_COMMITTING:
+					Assert(false);
+					break;
+			}
+
+			/* Resolution was a success, remove the entry */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			if (fdwxact->ondisk)
+				RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								  fdwxact->serverid, fdwxact->userid,
+								  true);
+			remove_fdwxact(fdwxact);
+			LWLockRelease(FdwXactLock);
+		}
+
+		/* All foreign transactions should be aborted */
+		list_free(FdwXactParticipants);
+		FdwXactParticipants = NIL;
+	}
+
+	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Mark my foreign transaction participants as in-doubt and clear
+ * the FdwXactParticipants list.
+ *
+ * If we leave any foreign transactions behind, update the oldest xmin of
+ * unresolved transactions so that the local transaction ids of in-doubt
+ * transactions are not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell   *cell;
+	int			nlefts = 0;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if didn't register FdwXact entry yet */
+		Assert(TransactionIdIsValid(FdwXactLocalXid));
+		Assert(fdwxact);
+
+		/*
+		 * Unlock and mark a foreign transaction as in-doubt.  Note that there
+		 * is a race condition: the resolver process could remove an FdwXact
+		 * entry in FdwXactParticipants, and another backend could reuse it,
+		 * before we forget it.  So we need to check whether the entries are
+		 * still associated with this transaction.  Also, we do these checks
+		 * by transaction id because these foreign transactions may already
+		 * be held by the resolver.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->valid && fdwxact->held_by == MyBackendId)
+		{
+			fdwxact->held_by = InvalidBackendId;
+			fdwxact->indoubt = true;	/* let the resolver process it */
+			nlefts++;
+		}
+		LWLockRelease(FdwXactLock);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction of
+	 * unresolved distributed transactions and hand them over to the foreign
+	 * transaction resolver.
+	 */
+	if (nlefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions in in-doubt status", nlefts);
+		FdwXactComputeRequiredXmin();
+		FdwXactLaunchOrWakeupResolver();
+	}
+
+	list_free(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+	FdwXactParticipants_tmp = NIL;
+	FdwXactLocalXid = InvalidTransactionId;
+}
+
+/*
+ * Resolve foreign transactions at the given indexes. If 'waiter' is not NULL,
+ * we release the waiter after resolving all of the given foreign transactions.
+ * On failure we re-enqueue the waiting backend after incrementing the next
+ * resolution time.
+ *
+ * The caller must hold the given foreign transactions in advance to prevent
+ * concurrent update.
+ */
+void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		PG_TRY();
+		{
+			FdwXactResolveOneFdwXact(fdwxact);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. Re-insert the waiter into the retry queue
+			 * with a later resolution time if the waiter is still waiting.
+			 */
+			if (waiter)
+			{
+				LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+				if (waiter->fdwXactState == FDWXACT_WAITING)
+				{
+					SHMQueueDelete(&(waiter->fdwXactLinks));
+					pg_write_barrier();
+					waiter->fdwXactNextResolutionTs =
+						TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+													foreign_xact_resolution_retry_interval);
+					FdwXactQueueInsert(waiter);
+				}
+				LWLockRelease(FdwXactResolutionLock);
+			}
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+
+	if (!waiter)
+		return;
+
+	/*
+	 * Remove waiter from shmem queue, if not detached yet. The waiter could
+	 * already be detached if the user cancelled the wait before resolution.
+	 */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/*
+		 * Wake up the waiter only when we have set state and removed from
+		 * queue
+		 */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had already been detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid,
+					  false);
+	LWLockRelease(FdwXactLock);
+
+	return (idx != -1);
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ *
+ * XXX: we can exclude FdwXact entries whose status is already committing
+ * or aborting.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Determine whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactDetermineTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on
+	 * the foreign server. So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted. This should not happen except for one
+	 * case where the local transaction is prepared and this foreign
+	 * transaction is being resolved manually using
+	 * pg_resolve_foreign_xact(). Raise an error anyway since we cannot
+	 * determine the fate of this foreign transaction according to the local
+	 * transaction whose fate is also not determined.
+	 */
+	else
+		elog(ERROR,
+			 "cannot resolve the foreign transaction associated with in-process transaction");
+
+	pg_unreachable();
+}
+
+/*
+ * Commit or roll back one prepared foreign transaction.  After successful
+ * resolution, the FdwXact entry is removed from the shared memory and the
+ * corresponding on-disk file is removed as well.
+ */
+static void
+FdwXactResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactRslvState state;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->held_by != InvalidBackendId || fdwxact->inprocessing);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactDetermineTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare resolution state to pass to API */
+	state.xid = fdwxact->local_xid;
+	state.server = server;
+	state.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	state.fdwxact_id = fdwxact->fdwxact_id;
+	state.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&state);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&state);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/*
+ * Return the index of the first FdwXact entry matching the given arguments,
+ * or -1 if there is none. The search condition is defined by the arguments
+ * with valid values for their respective datatypes; invalid values match
+ * any entry. If 'hold' is true, entries already held or being processed are
+ * skipped, and the matching entry is marked as held by this backend.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid, bool hold)
+{
+	bool		found = false;
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		bool		inprocessing;
+
+		if (!fdwxact->valid)
+			continue;
+
+		SpinLockAcquire(&fdwxact->mutex);
+		inprocessing = fdwxact->inprocessing;
+		SpinLockRelease(&fdwxact->mutex);
+
+		/*
+		 * If we're attempting to hold this entry, skip if it is already held
+		 * or being processed.
+		 */
+		if (hold &&
+			(inprocessing || fdwxact->held_by != InvalidBackendId))
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+
+		if (hold)
+			fdwxact->held_by = MyBackendId;
+
+		found = true;
+		break;
+	}
+
+	return found ? i : -1;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete the FdwXact entry, and its file if it exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The status of
+	 * the transaction is set to prepared below, since we do not know the
+	 * exact status right now. Resolver will set it later based on the status
+	 * of local transaction which prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED and as in-doubt, since we do not know the xact
+	 * status right now. Resolver will set it later based on the status of
+	 * local transaction that prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->indoubt = true;
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove
+ * FdwXact file if a foreign transaction was saved via an earlier checkpoint.
+ * We might not find the FdwXact entry if crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an inserted LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a database id, transaction id, server id and user id, read the foreign
+ * transaction data either from disk or directly from WAL via the xlog record
+ * pointer "insert_start_lsn" kept in shared memory.
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (plausible size, valid CRC, expected identifiers), return
+ * the palloc'd contents of the file; raise an error on corrupted data.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents are the expected data */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextFullXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue causes harm on the
+ * system, and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextFullXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+
+		/*
+		 * If the foreign transaction is part of a prepared local
+		 * transaction, it is not in-doubt.  The future COMMIT/ROLLBACK
+		 * PREPARED can determine the fate of this foreign transaction.
+		 */
+		if (TwoPhaseExists(fdwxact->local_xid))
+		{
+			ereport(DEBUG2,
+					(errmsg("clearing in-doubt flag of foreign transaction %u, server %u, user %u because the corresponding local prepared transaction was found",
+							fdwxact->local_xid, fdwxact->serverid,
+							fdwxact->userid)));
+			fdwxact->indoubt = false;
+		}
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built-in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		bool		indoubt;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		indoubt = fdwxact->indoubt;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = BoolGetDatum(indoubt);
+		values[5] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/* Find and hold the FdwXact entry */
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid, true);
+
+	LWLockRelease(FdwXactLock);
+
+	if (idx < 0)
+	{
+		/* No entry */
+		PG_RETURN_BOOL(false);
+	}
+
+	PG_TRY();
+	{
+		FdwXactResolveFdwXacts(&idx, 1, NULL);
+	}
+	PG_CATCH();
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[idx];
+
+		SpinLockAcquire(&fdwxact->mutex);
+		FdwXactCtl->fdwxacts[idx]->held_by = InvalidBackendId;
+		SpinLockRelease(&fdwxact->mutex);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it.  This gives a way to forget about a prepared foreign
+ * transaction when, for example, the foreign server where it was prepared is
+ * no longer available or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	FdwXact		fdwxact;
+	int			i;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid && fdwxact->dbid == MyDatabaseId &&
+			fdwxact->local_xid == xid && fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+	{
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction on server %u does not exist",
+						serverid)));
+	}
+
+	if (fdwxact->inprocessing || fdwxact->held_by != InvalidBackendId)
+	{
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot remove foreign transaction entry which is being processed")));
+	}
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdwxact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..fed2fbcd08
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,560 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes.  The launcher schedules a resolver
+ * process to be started when a backend process requests one.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+		FdwXactRslvCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit launch attempts to once per
+		 * foreign_xact_resolution_retry_interval, but always launch when a
+		 * backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started.  This usually means a resolver crashed, so retry
+			 * after foreign_xact_resolution_retry_interval.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+	int			i;
+
+	/*
+	 * Looking for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver.  It's possible that the resolver is starting
+		 * up and has not attached to its slot yet.  In that case we need not
+		 * do anything, since it will soon find the FdwXact entry we inserted.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, clean up the worker slot */
+		LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+		resolver->in_use = false;
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on databases that have
+ * at least one FdwXact entry to resolve but no resolver running.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *resolver_dbs;	/* DBs resolver's running on */
+	HTAB	   *fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+	int			i;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect OIDs of databases that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		bool		indoubt;
+		BackendId	held_by;
+
+		if (!fdwxact->valid)
+			continue;
+
+		SpinLockAcquire(&fdwxact->mutex);
+		indoubt = fdwxact->indoubt;
+		held_by = fdwxact->held_by;
+		SpinLockRelease(&fdwxact->mutex);
+
+		if ((indoubt && held_by == InvalidBackendId) ||
+			(!indoubt && held_by != InvalidBackendId))
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* No FdwXact entry needs resolving, so there is nothing to launch */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolvers are running and launch new one on them */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..b91a2e1e88
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,436 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction.  A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database for which backend processes are waiting.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment.  Emergency termination is by
+ * SIGQUIT, like any backend.  The resolver process also terminates on
+ * timeout, but only if no foreign transactions on the database are waiting
+ * to be resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int			foreign_xact_resolution_retry_interval;
+int			foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+static void hold_fdwxacts(PGPROC *waiter);
+static void hold_indoubt_fdwxacts(void);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts has indexes of FdwXact which the resolver marked
+ * as in-processing. We clear that flag from those entries on failure.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	/* clear inprocessing flags */
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->inprocessing = false;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sane value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TransactionId waitXid = InvalidTransactionId;
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiters until either the queue becomes empty or it has
+		 * only waiters whose resolution timestamp is in the future.
+		 */
+		for (;;)
+		{
+			PGPROC	   *waiter;
+
+			CHECK_FOR_INTERRUPTS();
+
+			LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+
+			waiter = FdwXactGetWaiter(now, &resolutionTs, &waitXid);
+
+			if (!waiter)
+			{
+				/* Not found, break */
+				LWLockRelease(FdwXactResolutionLock);
+				break;
+			}
+
+			/* Hold the waiting foreign transactions */
+			hold_fdwxacts(waiter);
+			Assert(nheld > 0);
+			LWLockRelease(FdwXactResolutionLock);
+
+			/* Resolve the waiting distributed transaction */
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld, waiter);
+			CommitTransactionCommand();
+
+			last_resolution_time = now;
+		}
+
+		/* Hold in-doubt transactions */
+		hold_indoubt_fdwxacts();
+
+		if (nheld > 0)
+		{
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld, NULL);
+			CommitTransactionCommand();
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout; if not, and no backend is waiting, exit.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		/* There is no waiting backend */
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until the slot is detached.
+		 * This prevents a race in which a waiter enqueues itself after the
+		 * FdwXactWaiterExists check.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but does not exit because the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Mark in-doubt transactions as in-processing.
+ */
+static void
+hold_indoubt_fdwxacts(void)
+{
+	nheld = 0;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid && fdwxact->dbid == MyDatabaseId &&
+			fdwxact->held_by == InvalidBackendId && fdwxact->indoubt)
+		{
+			held_fdwxacts[nheld++] = i;
+
+			/* hold lock */
+			SpinLockAcquire(&fdwxact->mutex);
+			fdwxact->inprocessing = true;
+			SpinLockRelease(&fdwxact->mutex);
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Mark foreign transactions associated with the given waiter's transaction
+ * as in-processing.
+ */
+static void
+hold_fdwxacts(PGPROC *waiter)
+{
+	nheld = 0;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid && fdwxact->dbid == waiter->databaseId &&
+			fdwxact->local_xid == waiter->fdwXactWaitXid)
+		{
+			held_fdwxacts[nheld++] = i;
+
+			/* hold lock */
+			SpinLockAcquire(&fdwxact->mutex);
+			Assert(!fdwxact->indoubt);
+			Assert(fdwxact->held_by == waiter->backendId);
+			fdwxact->inprocessing = true;
+			SpinLockRelease(&fdwxact->mutex);
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 1cd97852e8..ea045174e0 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..200cf9d067 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e1904877fa..2b9e039580 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -850,6 +851,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
@@ -2196,6 +2226,9 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	XLogRecPtr	recptr;
 	TimestampTz committs = GetCurrentTimestamp();
 	bool		replorigin;
+	bool		need_fdwxact_commit;
+
+	need_fdwxact_commit = PrepareFdwXactParticipants(xid);
 
 	/*
 	 * Are we using the replication origins feature?  Or, in other words, are
@@ -2266,6 +2299,16 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	if (need_fdwxact_commit)
+	{
+		SetFdwXactParticipants(xid, true);
+		FdwXactWaitForResolution(xid);
+	}
 }
 
 /*
@@ -2285,6 +2328,9 @@ RecordTransactionAbortPrepared(TransactionId xid,
 							   const char *gid)
 {
 	XLogRecPtr	recptr;
+	bool		need_fdwxact_commit;
+
+	need_fdwxact_commit = PrepareFdwXactParticipants(xid);
 
 	/*
 	 * Catch the scenario where we aborted partway through
@@ -2325,6 +2371,16 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transactions prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	if (need_fdwxact_commit)
+	{
+		SetFdwXactParticipants(xid, false);
+		FdwXactWaitForResolution(xid);
+	}
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 3984dd3e1a..d89fe5182e 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1219,6 +1220,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1227,6 +1229,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1265,12 +1268,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1428,6 +1432,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transactions to be resolved, if required.
+	 * We only want to wait if we prepared foreign transactions in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitForResolution(xid);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2087,6 +2099,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXacts();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2254,6 +2269,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXacts(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2341,6 +2357,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXacts();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2542,6 +2560,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	AtEOXact_FdwXacts(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2751,6 +2770,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXacts(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a53e6d9633..c62d6aa3b7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4599,6 +4600,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6289,6 +6291,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_wal_senders",
 									 max_wal_senders,
 									 ControlFile->max_wal_senders);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
@@ -6835,14 +6840,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7044,7 +7050,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7557,6 +7566,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7887,6 +7897,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9182,6 +9195,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9711,8 +9725,10 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9730,6 +9746,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9746,6 +9763,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9951,6 +9969,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10150,6 +10169,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2bd5f5ea14..0dd403b588 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+       SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index ac07f75bc3..e82630eefa 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2812,8 +2812,14 @@ CopyFrom(CopyState cstate)
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc),
+							   true);
+
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index f197869752..6206265424 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1101,6 +1103,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1419,6 +1433,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
@@ -1572,6 +1595,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
 				 errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA",
 						fdw->fdwname)));
 
+	/*
+	 * Remember that the transaction accesses a foreign server. Normally,
+	 * ImportForeignSchema doesn't modify data on foreign servers, so register
+	 * the server as not modified.
+	 */
+	RegisterFdwXactByServerId(server->serverid, false);
+
 	/* Call FDW to get a list of commands */
 	cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index fb6ce49056..3fa8bfe09f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/table.h"
 #include "access/tableam.h"
 #include "catalog/partition.h"
@@ -939,7 +940,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		Relation		child = partRelInfo->ri_RelationDesc;
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(child), true);
+
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 513471ab9b..29f376e48c 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,9 +226,31 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		RangeTblEntry	*rte;
+
+		rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex,
+							estate);
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(rte->relid, true);
+
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
+	{
+		RangeTblEntry	*rte;
+		int rtindex = (scanrelid > 0) ?
+			scanrelid :
+			bms_next_member(node->fs_relids, -1);
+
+		rte = exec_rt_fetch(rtindex, estate);
+
+		/* Remember that the transaction accesses a foreign server */
+		RegisterFdwXactByRelId(rte->relid, false);
+
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1ec07bad07..e5dee94764 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -37,6 +37,7 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
@@ -47,6 +48,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "rewrite/rewriteHandler.h"
@@ -2418,6 +2420,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+			/* Remember that the transaction modifies data on a foreign server */
+			RegisterFdwXactByRelId(relid, true);
 
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 															 resultRelInfo,
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 61e48ca3f8..8f411c0559 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,18 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction && !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction && routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		!routine->CommitForeignTransaction &&
+		!routine->RollbackForeignTransaction)
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index beb5e85434..2258424e81 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -12,6 +12,8 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3f8105c6eb..2abd61b88a 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3661,6 +3661,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3873,6 +3879,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_SYNC_REP:
 			event_name = "SyncRep";
 			break;
+		case WAIT_EVENT_FDWXACT:
+			event_name = "FdwXact";
+			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
@@ -4097,6 +4109,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index e19d5dc1a6..b804b9ea41 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -909,6 +911,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -973,12 +979,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c2e5e3abf8..9d34817f39 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -151,6 +151,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..55609eed81 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 363000670b..6a05070590 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -94,6 +94,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -249,6 +251,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1313,6 +1316,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1378,6 +1382,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1427,6 +1432,15 @@ GetOldestXmin(Relation rel, int flags)
 		NormalTransactionIdPrecedes(replication_slot_xmin, result))
 		result = replication_slot_xmin;
 
+	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDWXACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
 	/*
 	 * After locks have been released and vacuum_defer_cleanup_age has been
 	 * applied, check whether we need to back up further to make logical
@@ -3127,6 +3141,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon so that
+ * vacuum does not truncate clog entries for transactions still needed
+ * to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
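The xmin clamp that the procarray.c hunks add to GetOldestXmin() can be looked at in isolation. What follows is a minimal standalone sketch, not code from the patch: `xid_precedes` and `clamp_oldest_xmin` are hypothetical names, the `ignore_fdwxact` parameter stands in for the `PROCARRAY_FDWXACT_XMIN` flag test, and the comparison assumes the usual modulo-2^32 ordering that PostgreSQL applies to normal transaction IDs.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical helper: wraparound-aware "a is older than b" for normal XIDs. */
bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical helper mirroring the new check: move the horizon back to the
 * oldest unresolved distributed transaction, unless the caller asked to
 * ignore it or no such transaction is recorded.
 */
TransactionId
clamp_oldest_xmin(TransactionId result, TransactionId unresolved_xmin,
				  bool ignore_fdwxact)
{
	if (!ignore_fdwxact &&
		unresolved_xmin != InvalidTransactionId &&
		xid_precedes(unresolved_xmin, result))
		return unresolved_xmin;

	return result;
}
```

With a computed horizon of 1000 and an unresolved foreign transaction at XID 900, the sketch returns 900; an unresolved XID that is newer than the horizon, or an invalid one, leaves the horizon untouched.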
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843229..adb276370c 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,6 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+FdwXactLock							45
+FdwXactResolverLock					46
+FdwXactResolutionLock				47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 5aa19d3f78..889dfa7e9a 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -421,6 +422,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* Initialize fields for fdw xact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -822,6 +827,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8958ec8103..5ed6c05b18 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3054,6 +3056,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 5bdc02fce2..2558c50cef 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -426,6 +427,25 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required", "prefer", and "disabled" are documented,
+ * we accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
@@ -763,6 +783,12 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FDWXACT */
+	gettext_noop("Foreign Transaction Management"),
+	/* FDWXACT_SETTINGS */
+	gettext_noop("Foreign Transaction Management / Settings"),
+	/* FDWXACT_RESOLVER */
+	gettext_noop("Foreign Transaction Management / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2471,6 +2497,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FDWXACT_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FDWXACT_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4599,6 +4671,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 995b6ca155..d7ca008a9e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -125,6 +125,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -344,6 +346,20 @@
 #max_sync_workers_per_subscription = 2	# taken from max_logical_replication_workers
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled, prefer or required
+
 #------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index a0b0458108..8701c5f005 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index a66dd078a7..00ca97b96b 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -208,6 +208,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index e73639df74..3041c39bc0 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 233441837f..b040202043 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 120000
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..bf745cb741
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,162 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* fdwXactState */
+#define	FDWXACT_NOT_WAITING		0
+#define	FDWXACT_WAITING			1
+#define	FDWXACT_WAIT_COMPLETE	2
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_PREFER, /* use twophase commit where available */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being
+								 * committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being
+								 * aborted */
+} FdwXactStatus;
+
+typedef struct FdwXactData *FdwXact;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	FdwXactStatus status;
+	bool		indoubt;		/* Is an in-doubt transaction? */
+	bool		inprocessing;	/* resolver is processing? */
+	slock_t		mutex;			/* protect above three fields */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset of inserting this entry
+									 * start */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been completed and written to
+								 * file? */
+	BackendId	held_by;		/* backend that is holding this entry */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing FdwXact entry needs to hold FdwXactLock in exclusive mode,
+ * and iterating fdwXacts needs that in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	TransactionId xid;
+
+	/* Foreign transaction information */
+	char	   *fdwxact_id;
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+/* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void RegisterFdwXactByRelId(Oid relid, bool modified);
+extern void RegisterFdwXactByServerId(Oid serverid, bool modified);
+extern void PreCommit_FdwXacts(void);
+extern void FdwXactReleaseWaiter(PGPROC *waiter);
+extern void FdwXactWaitForResolution(TransactionId wait_xid);
+extern void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter);
+extern PGPROC *FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+								TransactionId *waitXid_p);
+extern bool FdwXactWaiterExists(Oid dbid);
+extern bool PrepareFdwXactParticipants(TransactionId xid);
+extern void SetFdwXactParticipants(TransactionId xid, bool commit);
+extern void AtEOXact_FdwXacts(bool is_commit);
+extern void AtPrepare_FdwXacts(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern bool FdwXactExists(Oid dboid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+
+#endif							/* FDWXACT_H */
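The three foreign_twophase_commit levels declared in fdwxact.h read naturally as a small decision table. The sketch below is only one plausible reading of the enum comments and is not the patch's implementation: `TwophaseDecision`, `decide_commit_protocol`, and its two counters are hypothetical, and it assumes the decision depends solely on how many modified participants can and cannot prepare.

```c
#include <assert.h>

/* Same level names as the enum in fdwxact.h. */
typedef enum
{
	FOREIGN_TWOPHASE_COMMIT_DISABLED,
	FOREIGN_TWOPHASE_COMMIT_PREFER,
	FOREIGN_TWOPHASE_COMMIT_REQUIRED
} ForeignTwophaseCommitLevel;

/* Hypothetical outcome type, for illustration only. */
typedef enum
{
	COMMIT_ONE_PHASE,			/* plain COMMIT on every participant */
	COMMIT_TWO_PHASE,			/* PREPARE, then COMMIT PREPARED, everywhere */
	COMMIT_MIXED,				/* two-phase where available, one-phase
								 * elsewhere */
	COMMIT_REJECT				/* raise an error instead of committing */
} TwophaseDecision;

/*
 * Hypothetical helper: n_capable and n_incapable count the modified foreign
 * servers that can and cannot prepare a transaction.
 */
TwophaseDecision
decide_commit_protocol(ForeignTwophaseCommitLevel level,
					   int n_capable, int n_incapable)
{
	assert(n_capable >= 0 && n_incapable >= 0);

	if (level == FOREIGN_TWOPHASE_COMMIT_DISABLED ||
		n_capable + n_incapable == 0)
		return COMMIT_ONE_PHASE;

	if (n_incapable == 0)
		return COMMIT_TWO_PHASE;

	if (level == FOREIGN_TWOPHASE_COMMIT_REQUIRED)
		return COMMIT_REJECT;

	/* "prefer" with at least one server that cannot prepare */
	return n_capable > 0 ? COMMIT_MIXED : COMMIT_ONE_PHASE;
}
```

Under these assumptions, "required" is the only level that can refuse a commit, which matches the enum comment that all foreign servers have to support two-phase commit.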
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..688b43b8d0
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
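Because fdwxact_id is a fixed FDWXACT_ID_MAX_LEN-byte buffer both in shared memory and on disk, any identifier produced for a foreign prepared transaction has to fit in it, terminator included. A minimal sketch of a length-checked builder follows. The function name `build_fdwxact_id` and the "fx_<xid>_<serverid>_<userid>" format are assumptions made up for illustration; the patch excerpt does not show how identifiers are actually generated.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define FDWXACT_ID_MAX_LEN 200	/* same value as in fdwxact_xlog.h */

/*
 * Hypothetical helper: format an identifier into buf and return its length,
 * or -1 if it would not fit in buflen bytes including the terminator.
 */
int
build_fdwxact_id(char *buf, size_t buflen,
				 uint32_t xid, uint32_t serverid, uint32_t userid)
{
	int			n;

	n = snprintf(buf, buflen, "fx_%u_%u_%u",
				 (unsigned) xid, (unsigned) serverid, (unsigned) userid);

	/* snprintf reports the length it wanted; reject truncated output */
	if (n < 0 || (size_t) n >= buflen)
		return -1;

	return n;
}
```

Rejecting a truncated identifier, rather than storing it silently, matters here because a shortened name could no longer be matched against the prepared transaction on the foreign server.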
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..c935471936
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
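FdwXactRslvCtlData ends in a flexible array of resolver slots, and SizeOfFdwXactRslvCtlData as written accounts for exactly one slot. The body of FdwXactRslvShmemSize() is not part of this excerpt, so the sketch below only shows the general sizing pattern for such a struct, assuming one slot per configured resolver. `ResolverSlot`, `ResolverCtl`, and `resolver_ctl_size` are simplified, hypothetical stand-ins for the real types.

```c
#include <stddef.h>

/* Simplified stand-in for FdwXactResolver. */
typedef struct ResolverSlot
{
	int			pid;
	unsigned int dbid;
	char		in_use;
} ResolverSlot;

/* Simplified stand-in for FdwXactRslvCtlData. */
typedef struct ResolverCtl
{
	int			launcher_pid;
	ResolverSlot resolvers[];	/* flexible array member */
} ResolverCtl;

/*
 * Hypothetical helper: bytes needed for the control struct plus
 * max_resolvers slots.  Assumes max_resolvers is not negative.
 */
size_t
resolver_ctl_size(int max_resolvers)
{
	return offsetof(ResolverCtl, resolvers) +
		(size_t) max_resolvers * sizeof(ResolverSlot);
}
```

If the real size function multiplies by max_foreign_transaction_resolvers in the same way, the extra sizeof(FdwXactResolver) baked into the macro would reserve one slot more than configured; that is worth checking against the .c file.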
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 6c15df7e70..986bc73566 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index a04fc70326..6f1f336e31 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -108,6 +108,13 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
+/*
+ * XACT_FLAGS_FDWNOPREPARE - set when the transaction wrote data to a
+ * foreign table whose server isn't capable of
+ * two-phase commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 3)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index c8869d5226..da0d442f1b 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -232,6 +232,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index de5670e538..9884f5f8e7 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4bce3ad8de..ce7e37b29e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5983,6 +5983,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,status,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6101,6 +6119,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..8d046cc4e4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5e0cf533fb..5596ee591c 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -69,6 +69,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 											   bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 														 bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index b8041d9988..d735959ff5 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -807,6 +807,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -887,7 +889,9 @@ typedef enum
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
 	WAIT_EVENT_SAFE_SNAPSHOT,
-	WAIT_EVENT_SYNC_REP
+	WAIT_EVENT_SYNC_REP,
+	WAIT_EVENT_FDWXACT,
+	WAIT_EVENT_FDWXACT_RESOLUTION
 } WaitEventIPC;
 
 /* ----------
@@ -970,6 +974,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index ae4f573ab4..f614908d44 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/xlogdefs.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
@@ -154,6 +155,16 @@ struct PGPROC
 	int			syncRepState;	/* wait state for sync rep */
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
+	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
 	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index a5c7d0c064..0f73b64937 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDWXACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -125,4 +127,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 454c2df487..6010dbcdee 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,9 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FDWXACT,
+	FDWXACT_SETTINGS,
+	FDWXACT_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 8876025aaa..ddb4c1d3e8 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1342,6 +1342,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(xid, serverid, userid, status, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.23.0

#39Muhammad Usama
m.usama@gmail.com
In reply to: Masahiko Sawada (#38)
1 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, May 12, 2020 at 11:45 AM Masahiko Sawada <
masahiko.sawada@2ndquadrant.com> wrote:

On Thu, 30 Apr 2020 at 20:43, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Tue, 28 Apr 2020 at 19:37, Muhammad Usama <m.usama@gmail.com> wrote:

On Wed, Apr 8, 2020 at 11:16 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 27 Mar 2020 at 22:06, Muhammad Usama <m.usama@gmail.com> wrote:

Hi Sawada San,

I have been further reviewing and testing the transaction involving
multiple server patches.
Overall the patches are working as expected bar a few important
exceptions.
So as discussed over the call I have fixed the issues I found
during the testing and also rebased the patches with the current
head of the master branch.
So can you please have a look at the attached updated patches.

Thank you for reviewing and updating the patch!

Below is the list of changes I have made on top of V18 patches.

1- In register_fdwxact(), As we are just storing the callback
function pointers from FdwRoutine in fdw_part structure, So I think
we can avoid calling GetFdwRoutineByServerId() in TopMemoryContext.
So I have moved the MemoryContextSwitch to TopMemoryContext after
the GetFdwRoutineByServerId() call.

Agreed.

2- If PrepareForeignTransaction functionality is not present in
some FDW then during the registration process we should only set the
XACT_FLAGS_FDWNOPREPARE transaction flag if the modified flag is also
set for that server. As for the server that has not done any data
modification within the transaction we do not do two-phase commit
anyway.

Agreed.

3- I have moved the foreign_twophase_commit in sample file after
max_foreign_transaction_resolvers because the default value of
max_foreign_transaction_resolvers is 0 and enabling the
foreign_twophase_commit produces an error with default
configuration parameter positioning in postgresql.conf.
Also, foreign_twophase_commit configuration was missing the comments
about allowed values in the sample config file.

Sounds good. Agreed.

4- Setting ForeignTwophaseCommitIsRequired in
is_foreign_twophase_commit_required() function does not seem to be
the correct place. The reason being, even when
is_foreign_twophase_commit_required() returns true after setting
ForeignTwophaseCommitIsRequired to true, we could still end up not
using the two-phase commit in the case when some server does not
support two-phase commit and foreign_twophase_commit is set to
FOREIGN_TWOPHASE_COMMIT_PREFER mode. So I have moved the
ForeignTwophaseCommitIsRequired assignment to PreCommit_FdwXacts()
function after doing the prepare transaction.

Agreed.

6- In prefer mode, we commit the transaction in single-phase if the
server does not support the two-phase commit. But instead of doing
the single-phase commit right away, IMHO the better way is to wait
until all the two-phase transactions are successfully prepared on
servers that support the two-phase. Since an error during a
"PREPARE" stage would rollback the transaction and in that case, we
would end up with committed transactions on the server that lacks
the support of the two-phase commit.

When an error occurred before the local commit, a 2pc-unsupported
server could be rolled back or committed depending on the error
timing. On the other hand all 2pc-supported servers are always rolled
back when an error occurred before the local commit. Therefore even if
we change the order of COMMIT and PREPARE it is still possible that we
will end up committing the part of 2pc-unsupported servers while
rolling back others including 2pc-supported servers.

I guess the motivation of your change is that since errors are likely
to happen during executing PREPARE on foreign servers, we can minimize
the possibility of rolling back 2pc-unsupported servers by deferring
the commit of 2pc-unsupported server as much as possible. Is that
right?

Yes, that is correct. The idea of doing the COMMIT on
NON-2pc-supported servers after all the PREPAREs are successful is to
minimize the chances of partial commits. And as you mentioned there
will still be chances of getting a partial commit even with this
approach but the probability of that would be less than what it is
with the current sequence.

So I have modified the flow a little bit and instead of doing a
one-phase commit right away the servers that do not support a
two-phase commit are added to another list and that list is processed
once we have successfully prepared all the transactions on two-phase
supported foreign servers. Although this technique is also not
bulletproof, still it is better than doing the one-phase commits
before doing the PREPAREs.

Hmm the current logic seems complex. Maybe we can just reverse the
order of COMMIT and PREPARE; do PREPARE on all 2pc-supported and
modified servers first and then do COMMIT on others?

Agreed, seems reasonable.
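
The ordering agreed on above (PREPARE on every modified 2pc-supported server first, then COMMIT on the others) can be sketched in standalone C. This is only an illustration, not code from the patch: the `Participant` struct, its `prepare_fails` test hook and `precommit_participants()` are hypothetical names invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one foreign server taking part in a transaction. */
typedef struct Participant
{
	bool		supports_prepare;	/* can the server prepare a transaction? */
	bool		modified;			/* did the transaction write to it? */
	bool		prepare_fails;		/* test hook: simulate an error in PREPARE */
	char		outcome;			/* 'P' prepared, 'C' committed,
									 * 'R' rolled back, '-' untouched */
} Participant;

/*
 * Pass 1 prepares every modified 2PC-capable server. Pass 2 commits the
 * remaining servers in one phase. Returns false, after marking every
 * participant rolled back, if any PREPARE fails.
 */
static bool
precommit_participants(Participant *parts, size_t n)
{
	for (size_t i = 0; i < n; i++)
	{
		if (!(parts[i].supports_prepare && parts[i].modified))
			continue;

		if (parts[i].prepare_fails)
		{
			/* Nothing has been committed yet, so everything can roll back. */
			for (size_t j = 0; j < n; j++)
				parts[j].outcome = 'R';
			return false;
		}
		parts[i].outcome = 'P';
	}

	/* Only now touch the servers that have to commit in one phase. */
	for (size_t i = 0; i < n; i++)
	{
		if (parts[i].outcome != 'P')
			parts[i].outcome = 'C';
	}
	return true;
}
```

Because no one-phase COMMIT is issued until every PREPARE has succeeded, a PREPARE failure can no longer leave a 2pc-unsupported server committed. As noted in the discussion, a failure during the second pass can still produce a partial commit.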

Also, I think we can improve on this one by throwing an error even
in PREFER mode if there is more than one server that had data
modified within the transaction and lacks the two-phase commit
support.

IIUC the concept of PREFER mode is that the transaction uses 2pc only
for 2pc-supported servers. IOW, even if the transaction modifies on a
2pc-unsupported server we can proceed with the commit if in PREFER
mode, which cannot if in REQUIRED mode. What is the motivation of your
above idea?

I was thinking that we could change the behavior of PREFER mode such
that we only allow to COMMIT the transaction if the transaction needs
to do a single-phase commit on one server only. That way we can
ensure that we would never end up with partial commit.

I think it's good to avoid a partial commit by using your idea but if
we want to avoid a partial commit we can use the 'required' mode,
which requires all participant servers to support 2pc. We throw an
error if the participant servers include even one 2pc-unsupported server
that is modified within the transaction. Of course if the participant node
is only one 2pc-unsupported server it can use 1pc even in the
'required' mode.

One idea in this regard would be to switch the local transaction to
commit using 2pc if there is a total of only one foreign server that
does not support the 2pc in the transaction, ensuring that the number
of 1-pc commit servers is always less than or equal to 1. And if more
than one foreign server requires 1-pc then we just throw an error.

I might be missing your point but I suppose this idea is to do
something like the following?

1. prepare the local transaction
2. commit the foreign transaction on 2pc-unsupported server
3. commit the prepared local transaction

However having said that, I am not 100% sure if it's a good or an
acceptable idea, and I am okay with continuing with the current
behavior of PREFER mode if we put it in the document that this mode
can cause a partial commit.

There will be three types of servers: (a) a server that doesn't support
any transaction API, (b) a server that supports only the commit and
rollback APIs and (c) a server that supports all APIs (commit, rollback
and prepare). Currently the postgres transaction manager manages only
server-(b) and server-(c), adding them to FdwXactParticipants. I'm
considering changing the code so that it also adds server-(a) to
FdwXactParticipants, in order to track the number of server-(a)
involved in the transaction. But it would neither insert an FdwXact
entry for such a server nor manage transactions on it.

The reason is this; if we want to have the 'required' mode strictly
require all participant servers to support 2pc, we should use 2pc when
(# of server-(a) + # of server-(b) + # of server-(c)) >= 2. But since
currently we just track the modification on a server-(a) by a flag we
cannot handle the case where two server-(a) are modified in the
transaction. On the other hand, if we don't consider server-(a) the
transaction could end up with a partial commit when a server-(a)
participates in the transaction. Therefore I'm thinking of the above
change so that the transaction manager can ensure that a partial
commit doesn't happen in the 'required' mode. What do you think?
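
To make the counting argument concrete, here is a minimal sketch of the 'required' rule as stated above: 2pc is needed once two or more servers of any type are modified, and every one of them must then support prepare. The enum and function names are hypothetical, and the sketch deliberately leaves out local writes, which the formula above does not mention.

```c
#include <stdbool.h>

/* The three server types (a), (b) and (c) from the discussion above. */
typedef enum ServerKind
{
	SERVER_NO_XACT_API,			/* (a) no transaction API at all */
	SERVER_COMMIT_ROLLBACK,		/* (b) commit and rollback only */
	SERVER_FULL_2PC				/* (c) prepare, commit and rollback */
} ServerKind;

typedef enum RequiredDecision
{
	USE_ONE_PHASE,
	USE_TWO_PHASE,
	RAISE_ERROR
} RequiredDecision;

/*
 * 'modified' lists every foreign server the transaction wrote to,
 * including type (a) servers. Counting them, rather than keeping a
 * single flag, is what lets two modified type (a) servers be detected.
 */
static RequiredDecision
decide_required_mode(const ServerKind *modified, int nmodified)
{
	/* A single participant can always commit in one phase. */
	if (nmodified < 2)
		return USE_ONE_PHASE;

	/* With two or more writers, every one of them must support prepare. */
	for (int i = 0; i < nmodified; i++)
	{
		if (modified[i] != SERVER_FULL_2PC)
			return RAISE_ERROR;
	}
	return USE_TWO_PHASE;
}
```

With only a boolean "some type (a) server was modified" flag, the case of two modified type (a) servers would look the same as one, which is the gap the proposed change is meant to close.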

7- Added a pfree() and list_free_deep() in PreCommit_FdwXacts() to
reclaim the memory if fdw_part is removed from the list

I think at the end of the transaction we free entries of
FdwXactParticipants list and set FdwXactParticipants to NIL. Why do we
need to do that in PreCommit_FdwXacts()?

Correct me if I am wrong, The fdw_part structures are created in
TopMemoryContext and if that fdw_part structure is removed from the
list at pre_commit stage (because we did 1-PC COMMIT on it) then it
would leak memory.

The fdw_part structures are created in TopTransactionContext so these
are freed at the end of the transaction.

8- The function FdwXactWaitToBeResolved() was bailing out as soon
as it finds (FdwXactParticipants == NIL). The problem with that was
in the case of "COMMIT/ROLLBACK PREPARED" we always get
FdwXactParticipants = NIL and effectively the foreign prepared
transactions(if any) associated with locally prepared transactions
were never getting resolved automatically.

postgres=# BEGIN;
BEGIN
INSERT INTO test_local VALUES ( 2, 'TWO');
INSERT 0 1
INSERT INTO test_foreign_s1 VALUES ( 2, 'TWO');
INSERT 0 1
INSERT INTO test_foreign_s2 VALUES ( 2, 'TWO');
INSERT 0 1
postgres=*# PREPARE TRANSACTION 'local_prepared';
PREPARE TRANSACTION

postgres=# select * from pg_foreign_xacts ;
 dbid  | xid | serverid | userid |  status  | in_doubt |         identifier
-------+-----+----------+--------+----------+----------+----------------------------
 12929 | 515 |    16389 |     10 | prepared | f        | fx_1339567411_515_16389_10
 12929 | 515 |    16391 |     10 | prepared | f        | fx_1963224020_515_16391_10
(2 rows)

-- Now commit the prepared transaction
postgres=# COMMIT PREPARED 'local_prepared';
COMMIT PREPARED

-- Foreign prepared transactions associated with 'local_prepared' not resolved
postgres=#
postgres=# select * from pg_foreign_xacts ;
 dbid  | xid | serverid | userid |  status  | in_doubt |         identifier
-------+-----+----------+--------+----------+----------+----------------------------
 12929 | 515 |    16389 |     10 | prepared | f        | fx_1339567411_515_16389_10
 12929 | 515 |    16391 |     10 | prepared | f        | fx_1963224020_515_16391_10
(2 rows)

So to fix this in case of the two-phase transaction, the function
checks the existence of associated foreign prepared transactions
before bailing out.
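
The identifiers in the output above appear to follow the pattern fx_<number>_<xid>_<serverid>_<userid>. The sketch below reproduces that layout only. The thread does not say what the first number is, so it is passed in as an opaque `token`, and `format_fdwxact_id()` is a hypothetical helper rather than the patch's GetPrepareId implementation.

```c
#include <stdio.h>

/*
 * Build an identifier with the same layout as the ones shown in
 * pg_foreign_xacts above. 'token' stands in for the leading number,
 * whose origin is not stated in the thread. Returns the length that
 * snprintf() reports.
 */
static int
format_fdwxact_id(char *buf, size_t buflen, long token,
				  unsigned int xid, unsigned int serverid,
				  unsigned int userid)
{
	return snprintf(buf, buflen, "fx_%ld_%u_%u_%u",
					token, xid, serverid, userid);
}
```

Embedding the xid, server OID and user OID keeps the identifier unique per participant, which is consistent with the two rows above differing in serverid as well as in the leading number.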

Good catch. But looking at your change, we should not accept the case
where FdwXactParticipants == NULL but TwoPhaseExists(wait_xid) ==
false.

if (FdwXactParticipants == NIL)
{
    /*
     * If we are here because of COMMIT/ROLLBACK PREPARED then the
     * FdwXactParticipants list would be empty. So we need to
     * see if there are any foreign prepared transactions exists
     * for this prepared transaction
     */
    if (TwoPhaseExists(wait_xid))
    {
        List *foreign_trans = NIL;

        foreign_trans = get_fdwxacts(MyDatabaseId,
                                     wait_xid, InvalidOid, InvalidOid,
                                     false, false, true);

        if (foreign_trans == NIL)
            return;
        list_free(foreign_trans);
    }
}

Sorry my bad, it's a mistake on my part. We should just return from the
function when FdwXactParticipants == NULL but
TwoPhaseExists(wait_xid) == false.

if (TwoPhaseExists(wait_xid))
{
    List *foreign_trans = NIL;
    foreign_trans = get_fdwxacts(MyDatabaseId, wait_xid,
                                 InvalidOid, InvalidOid,
                                 false, false, true);

    if (foreign_trans == NIL)
        return;
    list_free(foreign_trans);
}
else
    return;
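
The corrected bail-out rule can be condensed into one predicate. The following is a sketch with hypothetical parameter names that stand in for FdwXactParticipants, TwoPhaseExists(wait_xid) and the result of get_fdwxacts(); it is not taken from the patch.

```c
#include <stdbool.h>

/*
 * Decide whether the backend has anything to wait for.
 *
 * nparticipants          length of the in-memory participant list
 * local_twophase_exists  whether a local prepared transaction exists
 *                        for the xid being waited on
 * nforeign_prepared      number of foreign prepared transactions
 *                        recorded for that xid
 */
static bool
should_wait_for_resolution(int nparticipants, bool local_twophase_exists,
						   int nforeign_prepared)
{
	/* Ordinary commit path: participants are still registered. */
	if (nparticipants > 0)
		return true;

	/* No participants and no local prepared transaction: nothing to do. */
	if (!local_twophase_exists)
		return false;

	/* COMMIT/ROLLBACK PREPARED: wait only if foreign entries remain. */
	return nforeign_prepared > 0;
}
```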

9- In function XlogReadFdwXactData() XLogBeginRead call was missing
before XLogReadRecord() that was causing the crash during recovery.

Agreed.

10- incorporated set_ps_display() signature change.

Thanks.

Regarding other changes you did in v19 patch, I have some comments:

1.
+       ereport(LOG,
+                       (errmsg("trying to %s the foreign transaction associated with transaction %u on server %u",
+                                       fdwxact->status == FDWXACT_STATUS_COMMITTING?"COMMIT":"ABORT",
+                                       fdwxact->local_xid, fdwxact->serverid)));
+

Why do we need to emit LOG message in pg_resolve_foreign_xact() SQL
function?

That change was not intended to get into the patch file. I had done it
during testing to quickly get info on which way the transaction is
going to be resolved.

2.
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
deleted file mode 120000
index ce8c21880c..0000000000
--- a/src/bin/pg_waldump/fdwxactdesc.c
+++ /dev/null
@@ -1 +0,0 @@
-../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 100644
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c

We need to remove src/bin/pg_waldump/fdwxactdesc.c from the patch.

Again sorry! that was an oversight on my part.

3.
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1526,14 +1526,14 @@ postgres   27093  0.0  0.0  30096  2752 ?
Ss   11:34   0:00 postgres: ser
<entry><literal>SafeSnapshot</literal></entry>
<entry>Waiting for a snapshot for a <literal>READ ONLY
DEFERRABLE</literal> transaction.</entry>
</row>
-        <row>
-         <entry><literal>SyncRep</literal></entry>
-         <entry>Waiting for confirmation from remote server during
synchronous replication.</entry>
-        </row>
<row>
<entry><literal>FdwXactResolution</literal></entry>
<entry>Waiting for all foreign transaction participants to
be resolved during atomic commit among foreign servers.</entry>
</row>
+        <row>
+         <entry><literal>SyncRep</literal></entry>
+         <entry>Waiting for confirmation from remote server during
synchronous replication.</entry>
+        </row>
<row>
<entry morerows="4"><literal>Timeout</literal></entry>
<entry><literal>BaseBackupThrottle</literal></entry>

We need to move the entry of FdwXactResolution to right before
Hash/Batch/Allocating for alphabetical order.

Agreed!

I've incorporated your changes I agreed with to my local branch and
will incorporate other changes after discussion. I'll also do more
test and self-review and will submit the latest version patch.

Meanwhile, I found a couple of more small issues, One is the break
statement missing in pgstat_get_wait_ipc() and secondly
fdwxact_relaunch_resolvers() could return un-initialized value.
I am attaching a small patch for these changes that can be applied on
top of existing patches.

Thank you for the patch!

I'm updating the patches because current behavior in error case would
not be good. For example, when an error occurs in the prepare phase,
prepared transactions are left as in-doubt transaction. And these
transactions are not handled by the resolver process. That means that
a user could need to resolve these transactions manually every abort
time, which is not good. In abort case, I think that prepared
transactions can be resolved by the backend itself, rather than
leaving them for the resolver. I'll submit the updated patch.

I've attached the latest version patch set which includes some changes
from the previous version:

* I've added regression tests that test all types of FDW
implementations. There are three types of FDW: FDW doesn't support any
transaction APIs, FDW supports only commit and rollback APIs and FDW
supports all (prepare, commit and rollback) APIs.
src/test/modules/test_fdwxact contains those FDW implementations for
tests, and test some cases where a transaction reads/writes data on
various types of foreign servers.
* Also test_fdwxact has TAP tests that check failure cases. The test
FDW implementation has the ability to inject error or panic into
prepare or commit phase. Using it the TAP test checks if distributed
transactions can be committed or rolled back even in failure cases.
* When foreign_twophase_commit = 'required', the transaction commit
fails if the transaction modified data on even one server not
supporting prepare API. Previously, we used to ignore servers that
don't support any transaction API but we check them to strictly
require all involved foreign servers to support all transaction APIs.
* Transaction resolver process resolves in-doubt transactions
automatically.
* Incorporated comments from Muhammad Usama.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Hi Sawada,

I have just done some review and testing of the patches and have
a couple of comments.

1- IMHO the PREPARE TRANSACTION should always use 2PC even
when the transaction has operated on a single foreign server regardless
of foreign_twophase_commit setting, and throw an error otherwise when
2PC is not available on any of the data-modified servers.

For example, consider the case

BEGIN;
INSERT INTO ft_2pc_1 VALUES(1);
PREPARE TRANSACTION 'global_x1';

Here since we are preparing the local transaction so we should also prepare
the transaction on the foreign server even if the transaction has modified
only one foreign table.

What do you think?

Also without this change, the above test case produces an assertion failure
with your patches.

2- when deciding if the two-phase commit is required or not in
FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
2PC when we have at least one server capable of doing that.

i.e

For FOREIGN_TWOPHASE_COMMIT_PREFER case in
checkForeignTwophaseCommitRequired() function I think
the condition should be

need_twophase_commit = (nserverstwophase >= 1);
instead of
need_twophase_commit = (nserverstwophase >= 2);

I am attaching a patch that I have generated on top of your V20
patches with these two modifications along with the related test case.

Best regards!
--
...
Muhammad Usama
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC

Attachments:

v20_gtm_fixes.diff (application/octet-stream)
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index fdc6b1f415..a0ac22d9eb 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -232,7 +232,7 @@ static bool fdwXactExitRegistered = false;
 static void register_fdwxact(Oid serverid, Oid userid, bool modified);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool onephase,
 											 bool for_commit);
-static bool checkForeignTwophaseCommitRequired(void);
+static bool checkForeignTwophaseCommitRequired(bool for_prepare);
 static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
 static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
 							  Oid userid, Oid umid, char *fdwxact_id);
@@ -464,7 +464,7 @@ PreCommit_FdwXacts(void)
 	 * Check if we need to use foreign twophase commit. It's always false if
 	 * foreign twophase commit is disabled.
 	 */
-	need_twophase_commit = checkForeignTwophaseCommitRequired();
+	need_twophase_commit = checkForeignTwophaseCommitRequired(false);
 
 	/*
 	 * Prepare foreign transactions on foreign servers that support two-phase
@@ -511,10 +511,10 @@ PreCommit_FdwXacts(void)
  * in FdwXactParticipants and local server itself.
  */
 static bool
-checkForeignTwophaseCommitRequired(void)
+checkForeignTwophaseCommitRequired(bool for_prepare)
 {
 	ListCell   *lc;
-	bool		need_twophase_commit;
+	bool		need_twophase_commit = false;
 	bool		have_notwophase;
 	int			nserverswritten = 0;
 	int			nserverstwophase = 0;
@@ -538,32 +538,43 @@ checkForeignTwophaseCommitRequired(void)
 	/* check if there is a server that doesn't support two-phase commit */
 	have_notwophase = (nserverswritten != nserverstwophase);
 
-	/* Did we modify the local non-temporary data? */
-	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
-		nserverswritten++;
-
-	if (nserverswritten <= 1)
-		return false;
-
-	if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED)
+	if (for_prepare)
 	{
 		/*
-		 * In 'required' case, we require for all modified server to support
-		 * two-phase commit.
+		 * In case of PREPARE TRANSACTION we must use 2PC even when
+		 * only one foreign server is modified
 		 */
-		need_twophase_commit = (nserverswritten >= 2);
+		need_twophase_commit = (nserverswritten >= 1);
 	}
-	else
+
+	/* Did we modify the local non-temporary data? */
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		nserverswritten++;
+
+	if (!for_prepare)
 	{
-		Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER);
+		if (nserverswritten <= 1)
+			return false;
 
-		/*
-		 * In 'prefer' case, we prepare transactions on only servers that
-		 * capable of two-phase commit.
-		 */
-		need_twophase_commit = (nserverstwophase >= 2);
-	}
+		if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED)
+		{
+			/*
+			 * In 'required' case, we require for all modified server to support
+			 * two-phase commit.
+			 */
+			need_twophase_commit = (nserverswritten >= 2);
+		}
+		else
+		{
+			Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER);
 
+			/*
+			 * In 'prefer' case, we prepare transactions on only servers that
+			 * capable of two-phase commit.
+			 */
+			need_twophase_commit = (nserverstwophase >= 1);
+		}
+	}
 	/*
 	 * If foreign two phase commit is required then all foreign serves must be
 	 * capable of doing two-phase commit
@@ -589,6 +600,12 @@ checkForeignTwophaseCommitRequired(void)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot process a distributed transaction that has operated on a foreign server"),
 					 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+		if (have_notwophase && for_prepare)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot process a distributed transaction that has operated on a foreign server"),
+					 errdetail("PREPARE TRANSACTION requires all foreign servers to support two-phase commit")));
 	}
 
 	return need_twophase_commit;
@@ -1001,7 +1018,7 @@ AtPrepare_FdwXacts(void)
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is \'disabled\'")));
 
-	if (!checkForeignTwophaseCommitRequired())
+	if (!checkForeignTwophaseCommitRequired(true))
 		return;
 
 	/* Prepare transactions on participating foreign servers. */
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
index a5c8b89655..97f5052928 100644
--- a/src/test/modules/test_fdwxact/expected/test_fdwxact.out
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -186,3 +186,41 @@ BEGIN;
 INSERT INTO t VALUES (1);
 INSERT INTO ft_no2pc_1 VALUES (1);
 COMMIT;
+SELECT count(*)  FROM pg_foreign_xacts();
+ count 
+-------
+     0
+(1 row)
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*)  FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*)  FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+--ERROR case
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot process a distributed transaction that has operated on a foreign server
+DETAIL:  PREPARE TRANSACTION requires all foreign servers to support two-phase commit
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO t VALUES (1);
+COMMIT /*should do 2PC*/;
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
index 554312542f..271f651baf 100644
--- a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -160,10 +160,12 @@ BEGIN;
 INSERT INTO ft_1 VALUES (1);
 INSERT INTO ft_2pc_1 VALUES (1);
 COMMIT;
+
 BEGIN;
 INSERT INTO ft_no2pc_1 VALUES (1);
 INSERT INTO ft_2pc_1 VALUES (1);
 COMMIT;
+
 BEGIN;
 INSERT INTO ft_1 VALUES (1);
 INSERT INTO ft_2 VALUES (1);
@@ -176,3 +178,29 @@ BEGIN;
 INSERT INTO t VALUES (1);
 INSERT INTO ft_no2pc_1 VALUES (1);
 COMMIT;
+
+SELECT count(*)  FROM pg_foreign_xacts();
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*)  FROM pg_foreign_xacts();
+COMMIT PREPARED 'global_x1';
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*)  FROM pg_foreign_xacts();
+COMMIT PREPARED 'global_x1';
+
+--ERROR case
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO t VALUES (1);
+COMMIT /*should do 2PC*/;
#40Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Muhammad Usama (#39)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 15 May 2020 at 03:08, Muhammad Usama <m.usama@gmail.com> wrote:

Hi Sawada,

I have just done some review and testing of the patches and have
a couple of comments.

Thank you for reviewing!

1- IMHO the PREPARE TRANSACTION should always use 2PC even
when the transaction has operated on a single foreign server regardless
of foreign_twophase_commit setting, and throw an error otherwise when
2PC is not available on any of the data-modified servers.

For example, consider the case

BEGIN;
INSERT INTO ft_2pc_1 VALUES(1);
PREPARE TRANSACTION 'global_x1';

Here since we are preparing the local transaction so we should also prepare
the transaction on the foreign server even if the transaction has modified only
one foreign table.

What do you think?

Good catch and I agree with you. The transaction should fail if it
opened a transaction on a 2pc-no-support server regardless of
foreign_twophase_commit. And I think we should prepare a transaction
on a foreign server even if it didn't modify any data on that.

Also without this change, the above test case produces an assertion failure
with your patches.

2- when deciding if the two-phase commit is required or not in
FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
2PC when we have at least one server capable of doing that.

i.e

For FOREIGN_TWOPHASE_COMMIT_PREFER case in
checkForeignTwophaseCommitRequired() function I think
the condition should be

need_twophase_commit = (nserverstwophase >= 1);
instead of
need_twophase_commit = (nserverstwophase >= 2);

Hmm, I might be missing your point, but it seems to me that you want to
use two-phase commit even in the case where a transaction modified
data on only one server. Can't we commit the distributed transaction
atomically with one-phase commit in that case?

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#41Muhammad Usama
m.usama@gmail.com
In reply to: Masahiko Sawada (#40)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 15 May 2020 at 03:08, Muhammad Usama <m.usama@gmail.com> wrote:

Hi Sawada,

I have just done some review and testing of the patches and have
a couple of comments.

Thank you for reviewing!

1- IMHO the PREPARE TRANSACTION should always use 2PC even
when the transaction has operated on a single foreign server regardless
of foreign_twophase_commit setting, and throw an error otherwise when
2PC is not available on any of the data-modified servers.

For example, consider the case

BEGIN;
INSERT INTO ft_2pc_1 VALUES(1);
PREPARE TRANSACTION 'global_x1';

Here since we are preparing the local transaction so we should also prepare
the transaction on the foreign server even if the transaction has modified only
one foreign table.

What do you think?

Good catch and I agree with you. The transaction should fail if it
opened a transaction on a 2pc-no-support server regardless of
foreign_twophase_commit. And I think we should prepare a transaction
on a foreign server even if it didn't modify any data on that.

Also without this change, the above test case produces an assertion failure
with your patches.

2- when deciding if the two-phase commit is required or not in
FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
2PC when we have at least one server capable of doing that.

i.e

For FOREIGN_TWOPHASE_COMMIT_PREFER case in
checkForeignTwophaseCommitRequired() function I think
the condition should be

need_twophase_commit = (nserverstwophase >= 1);
instead of
need_twophase_commit = (nserverstwophase >= 2);

Hmm I might be missing your point but it seems to me that you want to
use two-phase commit even in the case where a transaction modified
data on only one server. Can't we commit distributed transaction
atomically even using one-phase commit in that case?

I think you are confusing nserverstwophase with nserverswritten.

need_twophase_commit = (nserverstwophase >= 1) would mean:
use two-phase commit if the list contains at least one server that is
capable of doing 2PC.

For the case where the transaction modified data on only one server, we
already exit the function, indicating that two-phase commit is not
required:

if (nserverswritten <= 1)
    return false;
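
To make the distinction between the two counters concrete, here is a minimal sketch of the check under discussion. It is not the patch's code: the function name prefer_needs_twophase and the threshold parameter are hypothetical, and the counters are passed in as arguments rather than computed, so that the two candidate conditions (>= 2 and >= 1) can be compared side by side.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for the 'prefer' branch of
 * checkForeignTwophaseCommitRequired(). The real function derives the
 * counters itself; here they are plain arguments.
 *
 * nserverswritten:  servers whose data the transaction modified
 * nserverstwophase: servers capable of two-phase commit
 * threshold:        2 in the posted patch, 1 in the proposed change
 */
bool
prefer_needs_twophase(int nserverswritten, int nserverstwophase, int threshold)
{
    assert(nserverswritten >= 0 && nserverstwophase >= 0);

    /* At most one server was modified: one-phase commit is already atomic. */
    if (nserverswritten <= 1)
        return false;

    /* Otherwise the decision depends only on the 2PC-capable count. */
    return nserverstwophase >= threshold;
}
```

With three written servers of which one is 2PC-capable, a threshold of 1 selects two-phase commit and a threshold of 2 does not; with a single written server the early return applies in both cases.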

Regards,
...
Muhammad Usama
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC

#42Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Muhammad Usama (#41)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 15 May 2020 at 13:26, Muhammad Usama <m.usama@gmail.com> wrote:

On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 15 May 2020 at 03:08, Muhammad Usama <m.usama@gmail.com> wrote:

Hi Sawada,

I have just done some review and testing of the patches and have
a couple of comments.

Thank you for reviewing!

1- IMHO the PREPARE TRANSACTION should always use 2PC even
when the transaction has operated on a single foreign server regardless
of foreign_twophase_commit setting, and throw an error otherwise when
2PC is not available on any of the data-modified servers.

For example, consider the case

BEGIN;
INSERT INTO ft_2pc_1 VALUES(1);
PREPARE TRANSACTION 'global_x1';

Here since we are preparing the local transaction so we should also prepare
the transaction on the foreign server even if the transaction has modified only
one foreign table.

What do you think?

Good catch and I agree with you. The transaction should fail if it
opened a transaction on a 2pc-no-support server regardless of
foreign_twophase_commit. And I think we should prepare a transaction
on a foreign server even if it didn't modify any data on that.

Also without this change, the above test case produces an assertion failure
with your patches.

2- when deciding if the two-phase commit is required or not in
FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
2PC when we have at least one server capable of doing that.

i.e

For FOREIGN_TWOPHASE_COMMIT_PREFER case in
checkForeignTwophaseCommitRequired() function I think
the condition should be

need_twophase_commit = (nserverstwophase >= 1);
instead of
need_twophase_commit = (nserverstwophase >= 2);

Hmm I might be missing your point but it seems to me that you want to
use two-phase commit even in the case where a transaction modified
data on only one server. Can't we commit distributed transaction
atomically even using one-phase commit in that case?

I think you are confusing between nserverstwophase and nserverswritten.

need_twophase_commit = (nserverstwophase >= 1) would mean
use two-phase commit if at least one server exists in the list that is
capable of doing 2PC

For the case when the transaction modified data on only one server we
already exits the function indicating no two-phase required

if (nserverswritten <= 1)
return false;

Thank you for your explanation. If the transaction modified two
servers that don't support 2pc and one server that supports 2pc, I
think we don't want to use 2pc even in the 'prefer' case, because even
if we use 2pc there, the atomic commit problem can still occur. For
example, if we fail to commit on one server that doesn't support 2pc
after already committing on another such server, we cannot roll back
the already-committed transaction.

On the other hand, in the 'prefer' case, if the transaction also modified
local data, we need to use 2pc even if it modified data on only
one foreign server that supports 2pc. The current code does not handle
that case correctly. We probably also need the following
change:

@@ -540,7 +540,10 @@ checkForeignTwophaseCommitRequired(void)

    /* Did we modify the local non-temporary data? */
    if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+   {
        nserverswritten++;
+       nserverstwophase++;
+   }
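
As an illustration of what the diff above changes, the following sketch models the counters with and without the added line. It is a hypothetical model, not the patch's code: the function name, its parameters, and the assumption that the foreign counters cover only servers the transaction modified are all introduced here for illustration. The 'prefer' rule is kept at nserverstwophase >= 2, as in the posted patch.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the 'prefer' decision. n_foreign_2pc and
 * n_foreign_no2pc are the modified foreign servers with and without
 * two-phase commit support (an assumption of this sketch).
 */
bool
prefer_uses_twophase(bool wrote_local, int n_foreign_2pc, int n_foreign_no2pc,
                     bool count_local_as_twophase)
{
    int nserverswritten = n_foreign_2pc + n_foreign_no2pc;
    int nserverstwophase = n_foreign_2pc;

    /* Did we modify local non-temporary data? */
    if (wrote_local)
    {
        nserverswritten++;
        if (count_local_as_twophase)
            nserverstwophase++;     /* the line the diff adds */
    }

    /* A single written server never needs two-phase commit. */
    if (nserverswritten <= 1)
        return false;

    return nserverstwophase >= 2;
}
```

For a transaction that wrote local data and one 2PC-capable foreign server, the model returns false without the added increment and true with it, which is the case the diff is meant to fix.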

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#43Muhammad Usama
m.usama@gmail.com
In reply to: Masahiko Sawada (#42)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, May 15, 2020 at 9:59 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 15 May 2020 at 13:26, Muhammad Usama <m.usama@gmail.com> wrote:

On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 15 May 2020 at 03:08, Muhammad Usama <m.usama@gmail.com> wrote:

Hi Sawada,

I have just done some review and testing of the patches and have
a couple of comments.

Thank you for reviewing!

1- IMHO the PREPARE TRANSACTION should always use 2PC even
when the transaction has operated on a single foreign server regardless
of foreign_twophase_commit setting, and throw an error otherwise when
2PC is not available on any of the data-modified servers.

For example, consider the case

BEGIN;
INSERT INTO ft_2pc_1 VALUES(1);
PREPARE TRANSACTION 'global_x1';

Here since we are preparing the local transaction so we should also prepare
the transaction on the foreign server even if the transaction has modified only
one foreign table.

What do you think?

Good catch and I agree with you. The transaction should fail if it
opened a transaction on a 2pc-no-support server regardless of
foreign_twophase_commit. And I think we should prepare a transaction
on a foreign server even if it didn't modify any data on that.

Also without this change, the above test case produces an assertion failure
with your patches.

2- when deciding if the two-phase commit is required or not in
FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
2PC when we have at least one server capable of doing that.

i.e

For FOREIGN_TWOPHASE_COMMIT_PREFER case in
checkForeignTwophaseCommitRequired() function I think
the condition should be

need_twophase_commit = (nserverstwophase >= 1);
instead of
need_twophase_commit = (nserverstwophase >= 2);

Hmm I might be missing your point but it seems to me that you want to
use two-phase commit even in the case where a transaction modified
data on only one server. Can't we commit distributed transaction
atomically even using one-phase commit in that case?

I think you are confusing between nserverstwophase and nserverswritten.

need_twophase_commit = (nserverstwophase >= 1) would mean
use two-phase commit if at least one server exists in the list that is
capable of doing 2PC

For the case when the transaction modified data on only one server we
already exits the function indicating no two-phase required

if (nserverswritten <= 1)
return false;

Thank you for your explanation. If the transaction modified two
servers that don't support 2pc and one server that supports 2pc I
think we don't want to use 2pc even in 'prefer' case. Because even if
we use 2pc in that case, it's still possible to have the atomic commit
problem. For example, if we failed to commit a transaction after
committing other transactions on the server that doesn't support 2pc
we cannot rollback the already-committed transaction.

Yes, that is true, and I think 'prefer' mode will always have a corner
case no matter what. But we can reduce the probability of hitting an
atomic commit problem by using 2PC whenever possible.

Take your example scenario, where a transaction modified two servers that
don't support 2PC and one server that supports it, and let us analyze
both options.

If we use 2PC on the server that supports it, the probability of hitting
a problem would be 1/3 = 0.33, because there is only one problematic case:
a failure to commit on the third server. The first server (the one that
supports 2PC) uses a prepared transaction, so there is no problem there.
If the second server (no 2PC support) fails to commit, there is still no
problem, because we can roll back the prepared transaction on the first
server. The only issue arises when we fail to commit on the third server,
because we have already committed on the second server and there is no
way to undo that.

Now consider the other option, not using 2PC in that case (as you
suggested). The probability of hitting the problem would then be
2/3 = 0.66, because a commit failure on either the second or the third
server lands us in an atomic commit problem.

So, IMHO, using 2PC whenever it is available should be the way to go in
'prefer' mode.
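
The 1/3 versus 2/3 figures can be reproduced with a small counting sketch. It rests on assumptions that the argument above leaves implicit: exactly one commit fails, every server is equally likely to be the one that fails, servers are committed in the listed order, and PREPARE itself succeeds on the 2PC-capable servers. The function name and parameters are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>

/*
 * Count the servers at which a single commit failure would leave the
 * distributed transaction partially committed.
 *
 * supports_2pc[i] says whether server i can do two-phase commit;
 * use_2pc says whether we actually prepare on the capable servers.
 */
int
count_unrecoverable_failures(const bool *supports_2pc, int nservers,
                             bool use_2pc)
{
    int unrecoverable = 0;
    int one_phase_committed = 0;

    for (int i = 0; i < nservers; i++)
    {
        /* A prepared server can still be rolled back, so skip it. */
        if (use_2pc && supports_2pc[i])
            continue;

        /* A failure here cannot undo earlier one-phase commits. */
        if (one_phase_committed > 0)
            unrecoverable++;

        one_phase_committed++;
    }

    return unrecoverable;
}
```

For the servers {2PC, no 2PC, no 2PC} the count is 1 of 3 when 2PC is used and 2 of 3 when it is not. The model says nothing about the cost of the extra PREPARE round trips, which is a separate trade-off.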

On the other hand, in 'prefer' case, if the transaction also modified
the local data, we need to use 2pc even if it modified data on only
one foreign server that supports 2pc. But the current code doesn't
work fine in that case for now. Probably we also need the following
change:

@@ -540,7 +540,10 @@ checkForeignTwophaseCommitRequired(void)

    /* Did we modify the local non-temporary data? */
    if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+   {
        nserverswritten++;
+       nserverstwophase++;
+   }

I agree that 2PC should be used if the transaction also modifies local
data. The change you suggested [+ nserverstwophase++;] would serve the
purpose and deliver the same results, but I think a better way would be
to change the need_twophase_commit condition for 'prefer' mode.

      * In 'prefer' case, we prepare transactions on only servers that
      * capable of two-phase commit.
      */
-     need_twophase_commit = (nserverstwophase >= 2);
+    need_twophase_commit = (nserverstwophase >= 1);
      }

The reason is that we currently do not use 2PC on the local server for
distributed transactions, so we should not count the local server among
the servers that would be performing 2PC.
Also, I feel the change need_twophase_commit = (nserverstwophase >= 1)
is more in line with the definition of our 'prefer' mode algorithm.

Do you see an issue with this change?

Regards,
...
Muhammad Usama
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC

#44Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Muhammad Usama (#43)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 15 May 2020 at 19:06, Muhammad Usama <m.usama@gmail.com> wrote:

On Fri, May 15, 2020 at 9:59 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 15 May 2020 at 13:26, Muhammad Usama <m.usama@gmail.com> wrote:

On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 15 May 2020 at 03:08, Muhammad Usama <m.usama@gmail.com> wrote:

Hi Sawada,

I have just done some review and testing of the patches and have
a couple of comments.

Thank you for reviewing!

1- IMHO the PREPARE TRANSACTION should always use 2PC even
when the transaction has operated on a single foreign server regardless
of foreign_twophase_commit setting, and throw an error otherwise when
2PC is not available on any of the data-modified servers.

For example, consider the case

BEGIN;
INSERT INTO ft_2pc_1 VALUES(1);
PREPARE TRANSACTION 'global_x1';

Here since we are preparing the local transaction so we should also prepare
the transaction on the foreign server even if the transaction has modified only
one foreign table.

What do you think?

Good catch and I agree with you. The transaction should fail if it
opened a transaction on a 2pc-no-support server regardless of
foreign_twophase_commit. And I think we should prepare a transaction
on a foreign server even if it didn't modify any data on that.

Also without this change, the above test case produces an assertion failure
with your patches.

2- when deciding if the two-phase commit is required or not in
FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
2PC when we have at least one server capable of doing that.

i.e

For FOREIGN_TWOPHASE_COMMIT_PREFER case in
checkForeignTwophaseCommitRequired() function I think
the condition should be

need_twophase_commit = (nserverstwophase >= 1);
instead of
need_twophase_commit = (nserverstwophase >= 2);

Hmm I might be missing your point but it seems to me that you want to
use two-phase commit even in the case where a transaction modified
data on only one server. Can't we commit distributed transaction
atomically even using one-phase commit in that case?

I think you are confusing between nserverstwophase and nserverswritten.

need_twophase_commit = (nserverstwophase >= 1) would mean
use two-phase commit if at least one server exists in the list that is
capable of doing 2PC

For the case when the transaction modified data on only one server we
already exits the function indicating no two-phase required

if (nserverswritten <= 1)
return false;

Thank you for your explanation. If the transaction modified two
servers that don't support 2pc and one server that supports 2pc I
think we don't want to use 2pc even in 'prefer' case. Because even if
we use 2pc in that case, it's still possible to have the atomic commit
problem. For example, if we failed to commit a transaction after
committing other transactions on the server that doesn't support 2pc
we cannot rollback the already-committed transaction.

Yes, that is true, And I think the 'prefer' mode will always have a corner case
no matter what. But the thing is we can reduce the probability of hitting
an atomic commit problem by ensuring to use 2PC whenever possible.

For instance as in your example scenario where a transaction modified
two servers that don't support 2PC and one server that supports it. let us
analyze both scenarios.

If we use 2PC on the server that supports it then the probability of hitting
a problem would be 1/3 = 0.33. because there is only one corner case
scenario in that case. which would be if we fail to commit the third server
As the first server (2PC supported one) would be using prepared
transactions so no problem there. The second server (NON-2PC support)
if failed to commit then, still no problem as we can rollback the prepared
transaction on the first server. The only issue would happen when we fail
to commit on the third server because we have already committed
on the second server and there is no way to undo that.

Now consider the other possibility if we do not use the 2PC in that
case (as you mentioned), then the probability of hitting the problem
would be 2/3 = 0.66. because now commit failure on either second or
third server will land us in an atomic-commit-problem.

So, IMHO using the 2PC whenever available with 'prefer' mode
should be the way to go.

My understanding of 'prefer' mode is that even if a distributed
transaction modified data on several types of servers, we guarantee
consistency only among the local server and the foreign servers that
support 2pc. It doesn't guarantee anything for servers that don't
support 2pc. Therefore we use 2pc only if the transaction modifies data
on two or more servers that are each either the local node or a server
that supports 2pc.

I understand your argument that using 2pc in that case can reduce the
possibility of hitting a problem, but one point we need to consider is
that 2pc is very costly. I think most users basically want to avoid 2pc
wherever they can. Please also note that it might not work as the user
expects, because users cannot specify the commit order and particular
servers might be unstable. I'm not sure that users want to pay a high
cost under such conditions. If we want to reduce that possibility by
using 2pc as much as possible, I think it could be yet another mode, so
that the user can choose the trade-off.

On the other hand, in 'prefer' case, if the transaction also modified
the local data, we need to use 2pc even if it modified data on only
one foreign server that supports 2pc. But the current code doesn't
work fine in that case for now. Probably we also need the following
change:

@@ -540,7 +540,10 @@ checkForeignTwophaseCommitRequired(void)

    /* Did we modify the local non-temporary data? */
    if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+   {
        nserverswritten++;
+       nserverstwophase++;
+   }

I agree with the part that if the transaction also modifies the local data
then the 2PC should be used.
Though the change you suggested [+ nserverstwophase++;]
would serve the purpose and deliver the same results but I think a
better way would be to change need_twophase_commit condition for
prefer mode.

* In 'prefer' case, we prepare transactions on only servers that
* capable of two-phase commit.
*/
-     need_twophase_commit = (nserverstwophase >= 2);
+    need_twophase_commit = (nserverstwophase >= 1);
}

The reason I am saying that is. Currently, we do not use 2PC on the local server
in case of distributed transactions, so we should also not count the local server
as one (servers that would be performing the 2PC).
Also I feel the change need_twophase_commit = (nserverstwophase >= 1)
looks more in line with the definition of our 'prefer' mode algorithm.

Do you see an issue with this change?

I think that with my change we will use 2pc in the case where a
transaction modified data on the local node and one server that
supports 2pc. With your change, we will use 2pc in that case and in
more cases besides. This would fit the definition of 'prefer' you
described, but it's still unclear to me whether it's better to make
'prefer' mode behave that way if we have three values: 'required',
'prefer' and 'disabled'.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#45Muhammad Usama
m.usama@gmail.com
In reply to: Masahiko Sawada (#44)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, May 15, 2020 at 7:52 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 15 May 2020 at 19:06, Muhammad Usama <m.usama@gmail.com> wrote:

On Fri, May 15, 2020 at 9:59 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 15 May 2020 at 13:26, Muhammad Usama <m.usama@gmail.com> wrote:

On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 15 May 2020 at 03:08, Muhammad Usama <m.usama@gmail.com> wrote:

Hi Sawada,

I have just done some review and testing of the patches and have
a couple of comments.

Thank you for reviewing!

1- IMHO the PREPARE TRANSACTION should always use 2PC even
when the transaction has operated on a single foreign server regardless
of foreign_twophase_commit setting, and throw an error otherwise when
2PC is not available on any of the data-modified servers.

For example, consider the case

BEGIN;
INSERT INTO ft_2pc_1 VALUES(1);
PREPARE TRANSACTION 'global_x1';

Here since we are preparing the local transaction so we should also prepare
the transaction on the foreign server even if the transaction has modified only
one foreign table.

What do you think?

Good catch and I agree with you. The transaction should fail if it
opened a transaction on a 2pc-no-support server regardless of
foreign_twophase_commit. And I think we should prepare a transaction
on a foreign server even if it didn't modify any data on that.

Also without this change, the above test case produces an assertion failure
with your patches.

2- when deciding if the two-phase commit is required or not in
FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
2PC when we have at least one server capable of doing that.

i.e

For FOREIGN_TWOPHASE_COMMIT_PREFER case in
checkForeignTwophaseCommitRequired() function I think
the condition should be

need_twophase_commit = (nserverstwophase >= 1);
instead of
need_twophase_commit = (nserverstwophase >= 2);

Hmm I might be missing your point but it seems to me that you want to
use two-phase commit even in the case where a transaction modified
data on only one server. Can't we commit distributed transaction
atomically even using one-phase commit in that case?

I think you are confusing between nserverstwophase and nserverswritten.

need_twophase_commit = (nserverstwophase >= 1) would mean
use two-phase commit if at least one server exists in the list that is
capable of doing 2PC

For the case when the transaction modified data on only one server we
already exits the function indicating no two-phase required

if (nserverswritten <= 1)
return false;

Thank you for your explanation. If the transaction modified two
servers that don't support 2pc and one server that supports 2pc I
think we don't want to use 2pc even in 'prefer' case. Because even if
we use 2pc in that case, it's still possible to have the atomic commit
problem. For example, if we failed to commit a transaction after
committing other transactions on the server that doesn't support 2pc
we cannot rollback the already-committed transaction.

Yes, that is true, And I think the 'prefer' mode will always have a corner case
no matter what. But the thing is we can reduce the probability of hitting
an atomic commit problem by ensuring to use 2PC whenever possible.

For instance as in your example scenario where a transaction modified
two servers that don't support 2PC and one server that supports it. let us
analyze both scenarios.

If we use 2PC on the server that supports it then the probability of hitting
a problem would be 1/3 = 0.33. because there is only one corner case
scenario in that case. which would be if we fail to commit the third server
As the first server (2PC supported one) would be using prepared
transactions so no problem there. The second server (NON-2PC support)
if failed to commit then, still no problem as we can rollback the prepared
transaction on the first server. The only issue would happen when we fail
to commit on the third server because we have already committed
on the second server and there is no way to undo that.

Now consider the other possibility if we do not use the 2PC in that
case (as you mentioned), then the probability of hitting the problem
would be 2/3 = 0.66. because now commit failure on either second or
third server will land us in an atomic-commit-problem.

So, IMHO using the 2PC whenever available with 'prefer' mode
should be the way to go.

My understanding of 'prefer' mode is that even if a distributed
transaction modified data on several types of server we can ensure to
keep data consistent among only the local server and foreign servers
that support 2pc. It doesn't ensure anything for other servers that
don't support 2pc. Therefore we use 2pc if the transaction modifies
data on two or more servers that either the local node or servers that
support 2pc.

I understand your argument that using 2pc in that case the possibility
of hitting a problem can decrease but one point we need to consider is
2pc is very high cost. I think basically most users don’t want to use
2pc as much as possible. Please note that it might not work as the
user expected because users cannot specify the commit order and
particular servers might be unstable. I'm not sure that users want to
pay high costs under such conditions. If we want to decrease that
possibility by using 2pc as much as possible, I think it can be yet
another mode so that the user can choose the trade-off.

On the other hand, in 'prefer' case, if the transaction also modified
the local data, we need to use 2pc even if it modified data on only
one foreign server that supports 2pc. But the current code doesn't
work fine in that case for now. Probably we also need the following
change:

@@ -540,7 +540,10 @@ checkForeignTwophaseCommitRequired(void)

    /* Did we modify the local non-temporary data? */
    if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+   {
        nserverswritten++;
+       nserverstwophase++;
+   }

I agree with the part that if the transaction also modifies the local data
then the 2PC should be used.
Though the change you suggested [+ nserverstwophase++;]
would serve the purpose and deliver the same results but I think a
better way would be to change need_twophase_commit condition for
prefer mode.

* In 'prefer' case, we prepare transactions on only servers that
* capable of two-phase commit.
*/
-     need_twophase_commit = (nserverstwophase >= 2);
+    need_twophase_commit = (nserverstwophase >= 1);
}

The reason I am saying that is. Currently, we do not use 2PC on the local server
in case of distributed transactions, so we should also not count the local server
as one (servers that would be performing the 2PC).
Also I feel the change need_twophase_commit = (nserverstwophase >= 1)
looks more in line with the definition of our 'prefer' mode algorithm.

Do you see an issue with this change?

I think that with my change we will use 2pc in the case where a
transaction modified data on the local node and one server that
supports 2pc. But with your change, we will use 2pc in more cases, in
addition to the case where a transaction modifies the local and one
2pc-support server. This would fit the definition of 'prefer' you
described but it's still unclear to me that it's better to make
'prefer' mode behave so if we have three values: 'required', 'prefer'
and 'disabled'.

Thanks for the detailed explanation; I now have a better understanding of
why we were each going for a different solution to the problem.
You are right that my understanding of 'prefer' mode is that we must use
2PC as much as possible. The reason is that the word "prefer", as I
understand it, means "more desirable or better to use than another or
others". So I understood FOREIGN_TWOPHASE_COMMIT_PREFER to mean that we
would use 2PC in as many cases as possible, and that the user would
already expect 2PC to be more expensive than 1PC.

Regards,
...
Muhammad Usama
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC

#46Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Muhammad Usama (#45)
5 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Sat, 16 May 2020 at 00:54, Muhammad Usama <m.usama@gmail.com> wrote:

On Fri, May 15, 2020 at 7:52 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 15 May 2020 at 19:06, Muhammad Usama <m.usama@gmail.com> wrote:

On Fri, May 15, 2020 at 9:59 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 15 May 2020 at 13:26, Muhammad Usama <m.usama@gmail.com> wrote:

On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 15 May 2020 at 03:08, Muhammad Usama <m.usama@gmail.com> wrote:

Hi Sawada,

I have just done some review and testing of the patches and have
a couple of comments.

Thank you for reviewing!

1- IMHO the PREPARE TRANSACTION should always use 2PC even
when the transaction has operated on a single foreign server regardless
of foreign_twophase_commit setting, and throw an error otherwise when
2PC is not available on any of the data-modified servers.

For example, consider the case

BEGIN;
INSERT INTO ft_2pc_1 VALUES(1);
PREPARE TRANSACTION 'global_x1';

Here since we are preparing the local transaction so we should also prepare
the transaction on the foreign server even if the transaction has modified only
one foreign table.

What do you think?

Good catch and I agree with you. The transaction should fail if it
opened a transaction on a 2pc-no-support server regardless of
foreign_twophase_commit. And I think we should prepare a transaction
on a foreign server even if it didn't modify any data on that.

Also without this change, the above test case produces an assertion failure
with your patches.

2- when deciding if the two-phase commit is required or not in
FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
2PC when we have at least one server capable of doing that.

i.e

For FOREIGN_TWOPHASE_COMMIT_PREFER case in
checkForeignTwophaseCommitRequired() function I think
the condition should be

need_twophase_commit = (nserverstwophase >= 1);
instead of
need_twophase_commit = (nserverstwophase >= 2);

Hmm I might be missing your point but it seems to me that you want to
use two-phase commit even in the case where a transaction modified
data on only one server. Can't we commit distributed transaction
atomically even using one-phase commit in that case?

I think you are confusing between nserverstwophase and nserverswritten.

need_twophase_commit = (nserverstwophase >= 1) would mean
use two-phase commit if at least one server exists in the list that is
capable of doing 2PC

For the case when the transaction modified data on only one server we
already exits the function indicating no two-phase required

if (nserverswritten <= 1)
return false;

Thank you for your explanation. If the transaction modified two
servers that don't support 2pc and one server that supports 2pc I
think we don't want to use 2pc even in 'prefer' case. Because even if
we use 2pc in that case, it's still possible to have the atomic commit
problem. For example, if we failed to commit a transaction after
committing other transactions on the server that doesn't support 2pc
we cannot rollback the already-committed transaction.

Yes, that is true, and I think the 'prefer' mode will always have a corner case
no matter what. But the thing is, we can reduce the probability of hitting
an atomic commit problem by making sure we use 2PC whenever possible.

For instance, take your example scenario where a transaction modified
two servers that don't support 2PC and one server that supports it. Let us
analyze both scenarios.

If we use 2PC on the server that supports it, then the probability of hitting
a problem would be 1/3 = 0.33, because there is only one corner case
in that scenario: failing to commit on the third server.
The first server (the one that supports 2PC) would be using a prepared
transaction, so no problem there. If the second server (no 2PC support)
fails to commit, there is still no problem, as we can roll back the prepared
transaction on the first server. The only issue happens when we fail
to commit on the third server, because we have already committed
on the second server and there is no way to undo that.

Now consider the other possibility: if we do not use 2PC in that
case (as you mentioned), then the probability of hitting the problem
would be 2/3 = 0.67, because now a commit failure on either the second or
the third server will land us in an atomic commit problem.

So, IMO using 2PC whenever available with 'prefer' mode
should be the way to go.
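
The 1/3 versus 2/3 counting above can be reproduced with a small sketch. It
rests on assumptions that are mine, not the patch's: servers are committed in
list order, exactly one commit step fails, and every step is equally likely to
be the failing one. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count the servers whose commit failure would leave the distributed
 * transaction partially committed (hypothetical helper).
 *
 * supports_2pc[i]: whether server i can prepare its transaction
 * use_2pc:         whether 2PC is used on the servers that support it
 */
int
sketch_count_unsafe_failure_points(const bool *supports_2pc,
                                   int nservers,
                                   bool use_2pc)
{
    int unsafe = 0;
    int one_phase_before = 0;   /* one-phase commits preceding server i */

    assert(nservers >= 0);

    for (int i = 0; i < nservers; i++)
    {
        /*
         * A prepared transaction stays prepared if its commit fails, so the
         * commit can be retried later; this is never an unsafe point.
         */
        if (use_2pc && supports_2pc[i])
            continue;

        /*
         * Server i commits in one phase.  If that fails after an earlier
         * one-phase commit succeeded, the earlier commit cannot be undone.
         */
        if (one_phase_before > 0)
            unsafe++;

        one_phase_before++;
    }

    return unsafe;
}
```

For the scenario in question (one 2PC-capable server followed by two servers
without 2PC) this returns 1 when 2PC is used and 2 when it is not, out of
three servers in both cases.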

My understanding of 'prefer' mode is that even if a distributed
transaction modified data on several types of server, we can ensure that
data stays consistent only among the local server and foreign servers
that support 2pc. It doesn't ensure anything for other servers that
don't support 2pc. Therefore we use 2pc if the transaction modifies
data on two or more servers that are either the local node or servers that
support 2pc.

I understand your argument that using 2pc in that case can decrease the
possibility of hitting a problem, but one point we need to consider is that
2pc is very costly. I think basically most users want to avoid
2pc as much as possible. Please note that it might not work as the
user expected because users cannot specify the commit order and
particular servers might be unstable. I'm not sure that users want to
pay high costs under such conditions. If we want to decrease that
possibility by using 2pc as much as possible, I think it can be yet
another mode so that the user can choose the trade-off.

On the other hand, in the 'prefer' case, if the transaction also modified
the local data, we need to use 2pc even if it modified data on only
one foreign server that supports 2pc. But the current code doesn't
handle that case correctly. Probably we also need the following
change:

@@ -540,7 +540,10 @@ checkForeignTwophaseCommitRequired(void)

/* Did we modify the local non-temporary data? */
if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+   {
nserverswritten++;
+       nserverstwophase++;
+   }

I agree with the part that if the transaction also modifies the local data
then 2PC should be used.
Though the change you suggested [+ nserverstwophase++;]
would serve the purpose and deliver the same results, I think a
better way would be to change the need_twophase_commit condition for
prefer mode.

* In 'prefer' case, we prepare transactions on only servers that
* capable of two-phase commit.
*/
-     need_twophase_commit = (nserverstwophase >= 2);
+    need_twophase_commit = (nserverstwophase >= 1);
}

The reason I am saying that is that currently we do not use 2PC on the local server
in case of distributed transactions, so we should also not count the local server
as one of the servers that would be performing 2PC.
Also I feel the change need_twophase_commit = (nserverstwophase >= 1)
looks more in line with the definition of our 'prefer' mode algorithm.

Do you see an issue with this change?

I think that with my change we will use 2pc in the case where a
transaction modified data on the local node and one server that
supports 2pc. But with your change, we will use 2pc in more cases, in
addition to the case where a transaction modifies the local and one
2pc-support server. This would fit the definition of 'prefer' you
described but it's still unclear to me that it's better to make
'prefer' mode behave so if we have three values: 'required', 'prefer'
and 'disabled'.
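
The difference between the two proposals can be sketched side by side. This
is an illustration only: the struct and both function names are hypothetical,
and the logic is reduced to the two snippets quoted above (count the local
server in nserverstwophase and keep the >= 2 test, versus leave the local
server out and test >= 1).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what one distributed transaction modified. */
typedef struct
{
    bool local_written;     /* wrote local non-temporary data */
    int  nforeign_2pc;      /* modified foreign servers that support 2PC */
    int  nforeign_no2pc;    /* modified foreign servers without 2PC */
} SketchXact;

static int
sketch_nserverswritten(const SketchXact *x)
{
    return x->nforeign_2pc + x->nforeign_no2pc + (x->local_written ? 1 : 0);
}

/* Variant A: the local server counts as 2PC-capable, threshold stays 2. */
bool
sketch_prefer_count_local(const SketchXact *x)
{
    int nserverstwophase = x->nforeign_2pc + (x->local_written ? 1 : 0);

    if (sketch_nserverswritten(x) <= 1)
        return false;
    return nserverstwophase >= 2;
}

/* Variant B: the local server is not counted, threshold is lowered to 1. */
bool
sketch_prefer_threshold_one(const SketchXact *x)
{
    if (sketch_nserverswritten(x) <= 1)
        return false;
    return x->nforeign_2pc >= 1;
}
```

Both variants choose 2pc for a local write plus one 2pc-capable foreign
server. They differ for a transaction that modified one 2pc-capable and two
non-capable foreign servers without touching local data: variant A commits in
one phase, variant B uses 2pc.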

Thanks for the detailed explanation, now I have a better understanding of the
reasons why we were going for a different solution to the problem.
You are right, my understanding of 'prefer' mode is that we must use 2PC as much
as possible, and the reason for that was that the word "prefer", as per my understanding,
means "it's more desirable/better to use than another or others".
So the way I understood FOREIGN_TWOPHASE_COMMIT_PREFER
was that we would use 2PC in as many cases as possible, and the user
would already have the expectation that 2PC is more expensive than 1PC.

I think that the current three values are useful for users. The
‘required’ mode is used when users want to ensure all writes involved
with the transaction are committed atomically. That being said, as
some FDW plugin might not support the prepare API we cannot force
users to use this mode all the time when using atomic commit.
Therefore ‘prefer’ mode would be useful for this case. Both modes use
2pc only when it's required for atomic commit.

So what do you think of my idea of adding the behavior you proposed as
another new mode? As it’s better to keep the first version as simple as
possible, it might not be added to the first version, but this
behavior might be useful in some cases.

I've attached a new version patch that incorporates some bug fixes
reported by Muhammad. Please review them.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v21-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From 947ad43e20c296a613131b2a3136956709749336 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 14:12:17 +0500
Subject: [PATCH v21 1/5] Keep track of writing on non-temporary relation

---
 src/backend/executor/nodeModifyTable.c | 16 ++++++++++++++++
 src/include/access/xact.h              |  6 ++++++
 2 files changed, 22 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 20a4c474cc..1ec07bad07 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -581,6 +581,10 @@ ExecInsert(ModifyTableState *mtstate,
 										   NULL,
 										   specToken);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
 												   &specConflict,
@@ -619,6 +623,10 @@ ExecInsert(ModifyTableState *mtstate,
 							   estate->es_output_cid,
 							   0, NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
@@ -970,6 +978,10 @@ ldelete:;
 	if (tupleDeleted)
 		*tupleDeleted = true;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/*
 	 * If this delete is the result of a partition key update that moved the
 	 * tuple to a new partition, put this row into the transition OLD TABLE,
@@ -1482,6 +1494,10 @@ lreplace:;
 	if (canSetTag)
 		(estate->es_processed)++;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/* AFTER ROW UPDATE Triggers */
 	ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple, slot,
 						 recheckIndexes,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7ee04babc2..a04fc70326 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -102,6 +102,12 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
-- 
2.23.0
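
The patch above only records a bit; the check quoted earlier in the thread
((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0) is what reads it. A minimal
standalone sketch of that set-and-test pattern follows. The two macro values
are copied from the xact.h hunk; the helper names and the plain unsigned int
variable standing in for MyXactFlags are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit values copied from the xact.h hunk above. */
#define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK  (1U << 1)
#define XACT_FLAGS_WROTENONTEMPREL              (1U << 2)

/*
 * Hypothetical helper mirroring what each executor hunk does after a
 * successful insert, update or delete.
 */
void
sketch_note_write(unsigned int *xact_flags, bool relation_needs_wal)
{
    assert(xact_flags != NULL);

    /* Only relations that need WAL set the flag. */
    if (relation_needs_wal)
        *xact_flags |= XACT_FLAGS_WROTENONTEMPREL;
}

/* Hypothetical helper mirroring the test done at commit time. */
bool
sketch_wrote_non_temp_rel(unsigned int xact_flags)
{
    return (xact_flags & XACT_FLAGS_WROTENONTEMPREL) != 0;
}
```

Because each flag occupies its own bit, setting XACT_FLAGS_WROTENONTEMPREL
leaves XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK untouched.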

v21-0003-Documentation-update.patch (application/octet-stream)
From cd028035abd062c24bd3a9a719fbf4bf53733eb0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v21 3/5] Documentation update.

---
 doc/src/sgml/catalogs.sgml                | 145 +++++++++++++
 doc/src/sgml/config.sgml                  | 146 +++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 154 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 236 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  89 ++++++++
 doc/src/sgml/monitoring.sgml              |  86 ++++++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 864 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index b1b077c97f..bdc60908be 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9216,6 +9216,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>open cursors</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-file-settings"><structname>pg_file_settings</structname></link></entry>
       <entry>summary of configuration file contents</entry>
@@ -10931,6 +10936,146 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>initial</literal> : Initial status.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>resolved</literal> : This foreign transaction has been resolved.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in doubt and
+       needs to be resolved by calling the <function>pg_resolve_fdwxact</function>
+       function.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9f2a4a2470..feef45e18c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9081,6 +9081,152 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Setting</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether distributed transaction commit ensures that all
+         involved changes on foreign servers are committed atomically. Valid
+         values are <literal>required</literal>, <literal>prefer</literal> and
+         <literal>disabled</literal>. The default setting is
+         <literal>disabled</literal>. When set to <literal>disabled</literal>,
+         the two-phase commit protocol is not used to commit or roll back
+         distributed transactions. When set to <literal>required</literal>,
+         distributed transactions strictly require that all written servers can
+         use the two-phase commit protocol.  That is, the distributed transaction
+         cannot commit if even one server does not support the prepare callback
+         routine (described in <xref linkend="fdw-callbacks-transaction-managements"/>).
+         When set to <literal>prefer</literal>, the distributed transaction uses
+         the two-phase commit protocol only on servers where it is available and
+         commits in one phase on the others. In the <literal>prefer</literal> and
+         <literal>required</literal> cases, distributed transaction commit waits
+         for all involved foreign transactions to be committed before the command
+         returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When set to <literal>disabled</literal> or <literal>prefer</literal>,
+          consistency among the servers involved in the distributed
+          transaction can be at risk if a foreign server crashes while
+          the distributed transaction is being committed.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If <literal>N</literal> local transactions each
+         span <literal>K</literal> foreign servers, this value needs to be set to
+         <literal>N * K</literal>, not just <literal>N</literal>.
+         This parameter can only be set at server start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>int</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after the last resolution
+         failed before retrying to resolve foreign transactions. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning the resolver stays
+         connected to one database until it is stopped manually. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..85d8e8e9e4
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,154 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the others were rolled back. This used to leave
+   data in an inconsistent state in terms of the federated database.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers end in either commit or rollback using the
+   transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers atomically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).
+    A <productname>PostgreSQL</productname> server that received SQL is called the
+    <firstterm>coordinator node</firstterm>, which is responsible for coordinating
+    all the participating transactions. Using the two-phase commit protocol, the commit
+    sequence of a distributed transaction proceeds with the following steps.
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers.
+      </para>
+     </listitem>
+    </orderedlist>
+
+   </para>
+
+   <para>
+    At the first step, the <productname>PostgreSQL</productname> distributed
+    transaction manager prepares all transactions on the foreign servers if
+    two-phase commit is required. Two-phase commit is required when the
+    transaction modifies data on two or more servers including the local server
+    itself and <xref linkend="guc-foreign-twophase-commit"/> is
+    <literal>required</literal> or <literal>prefer</literal>. If all preparations
+    on foreign servers succeed, it goes to the next step. If any failure happens
+    in this step, <productname>PostgreSQL</productname> switches to rollback and
+    rolls back all transactions on both local and foreign servers.
+   </para>
+
+   <para>
+    At the local commit step, <productname>PostgreSQL</productname> commits the
+    transaction locally. If any failure happens in this step,
+    <productname>PostgreSQL</productname> switches to rollback and rolls back all
+    transactions on both local and foreign servers.
+   </para>
+
+   <para>
+    At the final step, prepared transactions are resolved by a foreign transaction
+    resolver process.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>Manual Resolution of In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back using the two-phase commit protocol. However, distributed transactions
+    become <firstterm>in-doubt</firstterm> in three cases: when the foreign
+    server crashed or the connection to it was lost while preparing the foreign
+    transaction, when the coordinator node crashed while either preparing or
+    resolving the distributed transaction, and when the user canceled the query. You can
+    check in-doubt transactions in the <xref linkend="pg-stat-foreign-xact-view"/>
+    view. These foreign transactions are resolved by a foreign transaction resolver
+    process or by executing the <function>pg_resolve_foriegn_xact</function> function
+    manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolution">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving both foreign transactions that are prepared by
+    online transactions and in-doubt transactions. They commit or roll back
+    prepared transactions on foreign servers if the coordinator received agreement
+    messages from all foreign servers during the first step.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on the one database it is connected to. On failure during resolution, it
+    retries after waiting for
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped. To drop the database, first call the
+     <function>pg_stop_foreign_xact_resovler</function> function to stop the
+     particular resolver process.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+   </para>
+
+   <para>
+    On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be raised to
+    accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that some extensions and parallel queries also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6587678af2..dd0358ef22 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1415,6 +1415,127 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines For Transaction Managements</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back and
+     prepare the foreign transaction. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase, or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flag</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except when called via the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flag</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign servers. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except when it is invoked through the
+     <function>pg_resolve_foreign_xact</function> SQL function, this callback
+     function is executed by foreign transaction resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction
+    identifier, and store its length in <varname>*prep_id_len</varname>.
+    This optional function is called during executor startup, once per
+    foreign server. Note that the transaction identifier must be a string
+    literal, less than <symbol>NAMEDATALEN</symbol> bytes long, and should not
+    be the same as any other concurrent prepared transaction id. If this callback
+    routine is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. So please note that
+      you cannot use functions that use the system catalog cache
+      such as Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1894,4 +2015,119 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of the
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    <literal>FdwXactRslvState</literal> provides information about the foreign
+    server being processed, such as the server name and the OIDs of the server,
+    user and user mapping. Its <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback of a Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls <literal>CommitForeignTransaction</literal>
+     in the pre-commit phase; during transaction rollback, it calls
+     <literal>RollbackForeignTransaction</literal> in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback of Distributed Transactions</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions
+     described in <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to atomically commit and roll back among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to have the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be
+    using different Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose
+     FDW supports the transaction management callback routines is registered as a
+     participant. During registration, <function>GetPrepareId</function> is called,
+     if provided, to generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction
+     manager persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, both <literal>CommitForeignTransaction</literal> and
+     <literal>RollbackForeignTransaction</literal> should commit or
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve, the resolver process exits with an error message. The foreign
+     transaction launcher launches the resolver process again at
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 68179f71cd..1ab8e80fdc 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c06afd3ea..e281bd33d8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26126,6 +26126,95 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction which is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolution.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful to call before
+    dropping the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index acc6e2bc31..0df1073e4a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -384,6 +384,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_foreign_xact</structname><indexterm><primary>pg_stat_foreign_xact</primary></indexterm></entry>
+      <entry>One row per foreign transaction resolver process, showing statistics about
+       foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-view"/> for
+       details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -1027,6 +1035,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1248,6 +1268,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1525,6 +1557,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1836,6 +1873,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
@@ -2971,6 +3021,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    connection.
   </para>
 
+  <table id="pg-stat-foreign-xact-view" xreflabel="pg_stat_foreign_xact">
+   <title><structname>pg_stat_foreign_xact</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of a foreign transaction resolver process</entry>
+    </row>
+    <row>
+     <entry><structfield>dbid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which the foreign transaction resolver is connected</entry>
+    </row>
+    <row>
+     <entry><structfield>last_resolved_time</structfield></entry>
+     <entry><type>timestamp with time zone</type></entry>
+     <entry>Time at which the process last resolved a foreign transaction</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_foreign_xact</structname> view will contain one
+   row per foreign transaction resolver process, showing the state of resolution
+   of foreign transactions.
+  </para>
 
   <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver">
    <title><structname>pg_stat_archiver</structname> View</title>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index c41ce9499b..5ef1f4a329 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -170,6 +170,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index ea08d0b614..58f1e4fd15 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.23.0
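
The commit rule documented in the patch above (two-phase commit is needed when more than one server, counting the local one, is modified and foreign_twophase_commit asks for it) can be sketched roughly as follows. This is only an illustrative model inferred from the documentation text and the regression tests, not code from the patch: the type and function names (TwoPhaseMode, CommitPlan, Participant, decide_commit_plan) are hypothetical, and the "best effort" outcome for 'prefer' is an assumption based on the test comment that servers without two-phase support are committed in one phase instead of raising an error.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names modelled on the foreign_twophase_commit settings
 * used in the regression tests; they are not the patch's identifiers. */
typedef enum { FTC_DISABLED, FTC_PREFER, FTC_REQUIRED } TwoPhaseMode;

typedef enum
{
    PLAN_ONE_PHASE,   /* commit every participant directly */
    PLAN_TWO_PHASE,   /* prepare every modified server, then commit */
    PLAN_BEST_EFFORT, /* prepare capable servers, one-phase for the rest */
    PLAN_ERROR        /* reject the commit */
} CommitPlan;

typedef struct
{
    bool modified;    /* did the transaction write to this server? */
    bool can_prepare; /* does its FDW provide PrepareForeignTransaction? */
} Participant;

/* Decide how a distributed transaction would be committed (assumed rule). */
CommitPlan
decide_commit_plan(TwoPhaseMode mode, bool local_modified,
                   const Participant *parts, size_t nparts)
{
    size_t nwriters = local_modified ? 1 : 0;
    bool   all_capable = true;

    for (size_t i = 0; i < nparts; i++)
    {
        if (!parts[i].modified)
            continue;           /* read-only servers never need PREPARE */
        nwriters++;
        if (!parts[i].can_prepare)
            all_capable = false;
    }

    /* Fewer than two writers, or the feature is off: one-phase commit. */
    if (mode == FTC_DISABLED || nwriters < 2)
        return PLAN_ONE_PHASE;
    if (all_capable)
        return PLAN_TWO_PHASE;
    return (mode == FTC_REQUIRED) ? PLAN_ERROR : PLAN_BEST_EFFORT;
}
```

Under this model, writing to ft_1 and ft_2pc_1 in one transaction yields PLAN_ERROR with 'required' and PLAN_BEST_EFFORT with 'prefer', which lines up with the expected output in the regression tests of the following patch.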

v21-0005-Add-regression-tests-for-atomic-commit.patch (application/octet-stream)
From 42b1a1bbef2fd7eb928becc183cfac865d901276 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v21 5/5] Add regression tests for atomic commit.

---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 ++
 .../test_fdwxact/expected/test_fdwxact.out    | 233 +++++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 205 ++++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 137 +++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 471 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 +++++++
 src/test/regress/pg_regress.c                 |  13 +-
 13 files changed, 1319 insertions(+), 5 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 29de73c060..8a48e6ba19 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -13,6 +13,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS =1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..ab4ff2d89a
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,233 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup two servers that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_2 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_2 (i int) SERVER srv_2;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- Test 'disabled' case.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' case.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_1 and ft_2 don't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Test 'prefer' case.
+-- The cases where failed in 'required' case shoul pass in 'prefer'.
+-- We simply commit/rollback a transaction in one-phase on a server
+-- that doesn't support two-phase commit, instead of error.
+SET foreign_twophase_commit TO 'prefer';
+-- We modify at least one server that doesn't support two-phase commit.
+-- These servers are committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+-- Test cases for preparing the local transaction.
+SET foreign_twophase_commit TO 'prefer';
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*)  FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+SELECT count(*)  FROM pg_foreign_xacts();
+ count 
+-------
+     0
+(1 row)
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*)  FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+SELECT count(*)  FROM pg_foreign_xacts();
+ count 
+-------
+     0
+(1 row)
+
+-- PREPARE needs all involved foreign servers to support two-phsae
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction that has operated on a foreign server
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction that has operated on a foreign server
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..358eea8630
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,205 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup two servers that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_2 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_2 (i int) SERVER srv_2;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+-- Test 'disabled' case.
+-- Modify one or two servers. Since two-phase commit is not required,
+-- none of these cases should raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+-- Test 'required' case.
+-- In this case, when two-phase commit is required, all servers
+-- that are involved in the transaction and modified need to support the
+-- two-phase commit protocol. Otherwise the transaction will roll back.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers requires two-phase commit, and it succeeds.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_1 and ft_2 don't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+-- Test 'prefer' case.
+-- The cases that failed in the 'required' case should pass in 'prefer'.
+-- We simply commit/rollback a transaction in one-phase on a server
+-- that doesn't support two-phase commit, instead of raising an error.
+SET foreign_twophase_commit TO 'prefer';
+
+-- We modify at least one server that doesn't support two-phase commit.
+-- These servers are committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+-- Test cases for preparing the local transaction.
+SET foreign_twophase_commit TO 'prefer';
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*)  FROM pg_foreign_xacts();
+COMMIT PREPARED 'global_x1';
+SELECT count(*)  FROM pg_foreign_xacts();
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*)  FROM pg_foreign_xacts();
+COMMIT PREPARED 'global_x1';
+SELECT count(*)  FROM pg_foreign_xacts();
+
+-- PREPARE needs all involved foreign servers to support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
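
The three settings exercised by test_fdwxact.sql can be read as a small decision table. What follows is only a sketch of the outcomes the test comments describe, not the patch's implementation: the names TwophaseMode, CommitPlan, Participant and plan_commit are hypothetical, the local node is modeled as one more participant that can prepare, and treating 'prefer' with a single writer the same way as 'required' is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names mirroring the foreign_twophase_commit values. */
typedef enum { MODE_DISABLED, MODE_PREFER, MODE_REQUIRED } TwophaseMode;

typedef enum
{
	COMMIT_ONE_PHASE,	/* two-phase commit is not needed */
	COMMIT_TWO_PHASE,	/* every writer is prepared, then committed */
	COMMIT_BEST_EFFORT,	/* 'prefer': non-capable writers commit in one phase */
	COMMIT_ERROR		/* 'required' but some writer cannot prepare */
} CommitPlan;

typedef struct
{
	bool		modified;		/* did the transaction write to it? */
	bool		can_prepare;	/* does it support the prepare API? */
} Participant;

static CommitPlan
plan_commit(TwophaseMode mode, const Participant *parts, size_t n)
{
	size_t		writers = 0;
	size_t		capable = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (!parts[i].modified)
			continue;			/* servers that were only read never count */
		writers++;
		if (parts[i].can_prepare)
			capable++;
	}

	/* A single writer, or the 'disabled' mode, never needs two-phase commit. */
	if (mode == MODE_DISABLED || writers < 2)
		return COMMIT_ONE_PHASE;
	if (capable == writers)
		return COMMIT_TWO_PHASE;
	return (mode == MODE_REQUIRED) ? COMMIT_ERROR : COMMIT_BEST_EFFORT;
}
```

For example, writers ft_1 and ft_2pc_1 map to {true, false} and {true, true}: the model gives COMMIT_ERROR under MODE_REQUIRED and COMMIT_BEST_EFFORT under MODE_PREFER, matching the "Error" and "prefer" cases in the script.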
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..8d48a74e86
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,137 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the case where preparing the local transaction fails after the foreign
+# transactions have been prepared. The first attempt should succeed, but the second
+# attempt will fail after preparing the foreign transaction, and should roll back the
+# prepared foreign transaction.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'");
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into the prepare phase on srv_2pc_1. The transaction fails while
+# preparing the foreign transaction on srv_2pc_1. Then, we try both 'rollback' and
+# 'rollback prepared' on that foreign transaction, and roll back the other foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
+
+# Inject a panic into the prepare phase on srv_2pc_2. The server crashes after preparing both
+# foreign transactions. After the restart, those transactions are recovered as in-doubt
+# transactions. We check that the resolver process rolls back those transactions after recovery.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('panic', 'prepare', 'srv_2pc_2');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+$node->restart();
+$node->poll_query_until('postgres',
+						"SELECT count(*) = 0 FROM pg_foreign_xacts")
+  or die "Timeout while waiting for resolver process to resolve in-doubt transactions";
+$log = TestLib::slurp_file($node->logfile);
+like($log, qr/rollback prepared tx_[0-9]+ on srv_2pc_1/, "resolver rolled back in-doubt transaction");
+like($log, qr/rollback prepared tx_[0-9]+ on srv_2pc_2/, "resolver rolled back in-doubt transaction");
+truncate $node->logfile, 0;
+
+# Inject a panic into the commit phase on srv_2pc_1. The server crashes due to the panic
+# raised by the resolver process while committing the prepared foreign transaction on srv_2pc_1.
+# After the restart, those transactions are recovered as in-doubt transactions. We check that
+# the resolver process commits those transactions after recovery.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('panic', 'commit', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+$node->restart();
+$node->poll_query_until('postgres',
+						"SELECT count(*) = 0 FROM pg_foreign_xacts")
+  or die "Timeout while waiting for resolver process to resolve in-doubt transactions";
+$log = TestLib::slurp_file($node->logfile);
+like($log, qr/commit prepared tx_[0-9]+ on srv_2pc_1/, "resolver committed in-doubt transaction");
+like($log, qr/commit prepared tx_[0-9]+ on srv_2pc_2/, "resolver committed in-doubt transaction");
+truncate $node->logfile, 0;
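
The like() patterns in 001_basic.pl match log lines written by the callbacks in test_fdwxact.c when test_fdwxact.log_api_calls is on. As a rough, standalone sketch of how such a line is composed: the "tx_%u" identifier and the two message shapes are taken from the patch, while DEMO_FLAG_ONEPHASE, its value, and the helper names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the patch's one-phase flag bit. */
#define DEMO_FLAG_ONEPHASE 0x01

/* Build the prepared-transaction identifier, as testGetPrepareId() does. */
static int
format_prepare_id(char *buf, size_t len, unsigned int xid)
{
	return snprintf(buf, len, "tx_%u", xid);
}

/*
 * Build the line that the TAP test greps for. "verb" is "commit" or
 * "rollback". The flag is tested with bitwise AND so that other bits set in
 * "flags" do not select the one-phase message.
 */
static int
format_api_log(char *buf, size_t len, const char *verb, int flags,
			   unsigned int xid, const char *server)
{
	char		id[32];

	if (flags & DEMO_FLAG_ONEPHASE)
		return snprintf(buf, len, "%s %u on %s", verb, xid, server);

	format_prepare_id(id, sizeof(id), xid);
	return snprintf(buf, len, "%s prepared %s on %s", verb, id, server);
}
```

With xid 501 and no flags, format_api_log() produces "commit prepared tx_501 on srv_2pc_1", which is the shape qr/commit prepared tx_$xid on srv_2pc_1/ expects.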
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..a75d3cde14
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,471 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only the
+ * commit and rollback APIs, and the third supports all transaction APIs,
+ * including prepare.
+ *
+ * This module can also inject an error at the prepare callback or the
+ * commit callback using the test_inject_error() SQL function. The injected
+ * error information is stored in shared memory so that backend processes and
+ * resolver processes can see it.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/reloptions.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactRslvState *state);
+static void testCommitForeignTransaction(FdwXactRslvState *state);
+static void testRollbackForeignTransaction(FdwXactRslvState *state);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 state->fdwxact_id,
+							 state->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+/*
+ * Check if an event is set at the phase on the server. If there is, set
+ * elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Currently support only error and panic; anything else maps to ERROR */
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+	else
+		*elevel = ERROR;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
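
The error-injection mechanism in test_fdwxact.c comes down to three fixed-size strings that a callback compares against its own server name and phase. Below is a minimal single-process sketch of that idea, leaving out the shared memory and the LWLock. InjectState, inject_error, check_injected and the DEMO_* constants are hypothetical names; terminating every copy with NUL and falling back to the error level for unrecognized input are choices made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <strings.h>			/* strcasecmp (POSIX) */

#define DEMO_NAME_LEN 32

/* Stand-ins for the ERROR and PANIC elog levels. */
enum { DEMO_ERROR = 1, DEMO_PANIC = 2 };

typedef struct
{
	char		elevel[DEMO_NAME_LEN];
	char		phase[DEMO_NAME_LEN];
	char		server[DEMO_NAME_LEN];
} InjectState;

/* Record which level to raise, at which phase, on which server. */
static void
inject_error(InjectState *st, const char *elevel, const char *phase,
			 const char *server)
{
	/* snprintf always NUL-terminates and truncates over-long input */
	snprintf(st->elevel, sizeof(st->elevel), "%s", elevel);
	snprintf(st->phase, sizeof(st->phase), "%s", phase);
	snprintf(st->server, sizeof(st->server), "%s", server);
}

/*
 * Return true and set *elevel if an injection is registered for this
 * server and phase. The comparisons are case-insensitive.
 */
static bool
check_injected(const InjectState *st, const char *server, const char *phase,
			   int *elevel)
{
	if (strcasecmp(st->server, server) != 0 ||
		strcasecmp(st->phase, phase) != 0)
		return false;

	*elevel = (strcasecmp(st->elevel, "panic") == 0) ? DEMO_PANIC : DEMO_ERROR;
	return true;
}
```

A prepare callback for srv_2pc_2 would call check_injected(&st, "srv_2pc_2", "prepare", &lvl) and raise at lvl when it returns true. A zero-initialized state matches nothing, which corresponds to the state after test_reset_error().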
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up its shared
+# memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 38b2b1e8e1..f30fe6b492 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2335,9 +2335,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2352,7 +2355,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.23.0

v21-0004-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From c7460c761ae23245cc500f8ada5964b7d13aeec7 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:28:58 +0500
Subject: [PATCH v21 4/5] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/Makefile                 |   7 +-
 contrib/postgres_fdw/connection.c             | 603 +++++++++++-------
 .../postgres_fdw/expected/postgres_fdw.out    | 280 +++++++-
 contrib/postgres_fdw/fdwxact.conf             |   3 +
 contrib/postgres_fdw/postgres_fdw.c           |  21 +-
 contrib/postgres_fdw/postgres_fdw.h           |   7 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 124 +++-
 doc/src/sgml/postgres-fdw.sgml                |  45 ++
 8 files changed, 831 insertions(+), 259 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..91fa6e39fc 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -29,3 +29,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 52d1fe3563..d55884b49b 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2020, PostgreSQL Global Development Group
  *
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
@@ -56,6 +57,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -69,17 +71,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -92,23 +90,26 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
+ * Get the connection cache entry. Unlike GetConnectionState(), this function
+ * doesn't establish a new connection if one does not exist yet.
  */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -128,7 +129,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -136,12 +136,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -155,6 +149,21 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * Get the connection cache entry, establish a connection to the foreign
+ * server if there is none yet, and start a new transaction if
+ * 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -182,6 +191,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping	*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +200,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +211,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,11 +227,39 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
@@ -473,7 +521,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -700,193 +748,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -903,10 +764,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -917,6 +774,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Skip entries whose connection was not used in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1251,3 +1112,309 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   state->server->servername, state->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 state->server->servername, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on foreign server. If
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can commit the
+ * foreign transaction without preparation, otherwise commit the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has already been
+		 * prepared and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In simple commit case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   state->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Roll back a transaction on the foreign server. As in the commit case, if
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function rolls back the
+ * foreign transaction without preparation; otherwise it rolls back the prepared
+ * transaction. This function must tolerate being called recursively, because an
+ * error can happen during abort.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has already been
+		 * prepared and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry's transaction state if the transaction
+	 * failed before establishing a connection or starting a remote transaction.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction or is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager notes, the given foreign
+		 * transaction might not exist on the foreign server, so we accept
+		 * an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..8c31e26406 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -191,15 +210,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8923,10 +8944,10 @@ RESET ROLE;
 ALTER USER MAPPING FOR regress_nosuper SERVER loopback_nopw OPTIONS (ADD password_required 'false');
 SET ROLE regress_nosuper;
 -- Should finally work now
-SELECT * FROM ft1_nopw LIMIT 1;
-  c1  | c2 | c3 | c4 | c5 | c6 |     c7     | c8 
-------+----+----+----+----+----+------------+----
- 1111 |  2 |    |    |    |    | ft1        | 
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
+ c1 | c2 |        c3         |              c4              |            c5            | c6 |     c7     | c8  
+----+----+-------------------+------------------------------+--------------------------+----+------------+-----
+  1 |  2 | 00001_trig_update | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1  | 1          | foo
 (1 row)
 
 -- unpriv user also cannot set sslcert / sslkey on the user mapping
@@ -8943,16 +8964,16 @@ HINT:  User mappings with the sslcert or sslkey options set may only be created
 DROP USER MAPPING FOR CURRENT_USER SERVER loopback_nopw;
 -- This will fail again as it'll resolve the user mapping for public, which
 -- lacks password_required=false
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 ERROR:  password is required
 DETAIL:  Non-superusers must provide a password in the user mapping.
 RESET ROLE;
 -- The user mapping for public is passwordless and lacks the password_required=false
 -- mapping option, but will work because the current user is a superuser.
 SELECT * FROM ft1_nopw LIMIT 1;
-  c1  | c2 | c3 | c4 | c5 | c6 |     c7     | c8 
-------+----+----+----+----+----+------------+----
- 1111 |  2 |    |    |    |    | ft1        | 
+ c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+  6 |  6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6  | 6          | foo
 (1 row)
 
 -- cleanup
@@ -8961,16 +8982,225 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
 BEGIN;
-SELECT count(*) FROM ft1;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" of relation "T 5" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
  count 
 -------
-   822
+     0
 (1 row)
 
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
-ROLLBACK;
-WARNING:  there is no transaction in progress
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000000..3fdbf93cdb
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..105451d199 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include <limits.h>
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -504,7 +505,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -558,6 +558,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1434,7 +1439,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2372,7 +2377,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2746,7 +2751,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3566,7 +3571,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4441,7 +4446,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4527,7 +4532,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4755,7 +4760,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..43ffd4f73f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -129,7 +130,7 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -137,6 +138,9 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *state);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *state);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *state);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
@@ -203,6 +207,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 									bool is_subquery,
 									List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..1ef66123df 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2598,7 +2621,7 @@ ALTER USER MAPPING FOR regress_nosuper SERVER loopback_nopw OPTIONS (ADD passwor
 SET ROLE regress_nosuper;
 
 -- Should finally work now
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 
 -- unpriv user also cannot set sslcert / sslkey on the user mapping
 -- first set password_required so we see the right error messages
@@ -2612,7 +2635,7 @@ DROP USER MAPPING FOR CURRENT_USER SERVER loopback_nopw;
 
 -- This will fail again as it'll resolve the user mapping for public, which
 -- lacks password_required=false
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 
 RESET ROLE;
 
@@ -2628,9 +2651,98 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
 BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
 ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index eab2cc9378..3ea3ce9335 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -477,6 +477,43 @@ OPTIONS (ADD password_required 'false');
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers,
+    the transaction on each remote server is committed or aborted
+    independently. Some of these transactions may fail to commit while
+    others commit successfully. This may be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is allowed
+       to use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
@@ -504,6 +541,14 @@ OPTIONS (ADD password_required 'false');
    managed by creating corresponding remote savepoints.
   </para>
 
+  <para>
+   <filename>postgres_fdw</filename> uses the two-phase commit protocol when a
+   transaction commits or aborts if atomic commit of the distributed
+   transaction (see <xref linkend="atomic-commit"/>) is required. The remote
+   server should therefore set <xref linkend="guc-max-prepared-transactions"/> to more
+   than one so that it can prepare the remote transaction.
+  </para>
+
   <para>
    The remote transaction uses <literal>SERIALIZABLE</literal>
    isolation level when the local transaction has <literal>SERIALIZABLE</literal>
-- 
2.23.0
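
As a rough usage sketch of what the documentation and regression tests in the patch above describe, the snippet below shows how the feature might be switched on for two foreign servers. The names two_phase_commit, foreign_twophase_commit and pg_foreign_xacts come from the patch, and the server and table names (loopback, loopback2, ft7_2pc, ft8_2pc) are borrowed from its regression test. Setting the option through ALTER SERVER ... OPTIONS (ADD ...) is an assumption based on how other postgres_fdw server options are set, and none of this works on a server without the patch set applied.

```sql
-- Sketch only: requires the v21 patch set; not available in stock PostgreSQL.
-- Assumption: two_phase_commit is set like any other postgres_fdw server option.
ALTER SERVER loopback  OPTIONS (ADD two_phase_commit 'true');
ALTER SERVER loopback2 OPTIONS (ADD two_phase_commit 'true');

-- Require atomic commit for this session (GUC name from the regression test).
SET foreign_twophase_commit TO 'required';

BEGIN;
INSERT INTO ft7_2pc VALUES (10);   -- foreign table on server loopback
INSERT INTO ft8_2pc VALUES (10);   -- foreign table on server loopback2
COMMIT;                            -- two servers modified, so 2PC is needed

-- The regression test expects no foreign transaction entries to remain.
SELECT count(*) FROM pg_foreign_xacts;
```

As the documentation in the patch notes, max_prepared_foreign_transactions on the local server and max_prepared_transactions on the remote server also have to be raised from their defaults before the prepare step can succeed.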

v21-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From 111984cb33ebeec8a8f9a01d73c7d0b935429e42 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:16:02 +0900
Subject: [PATCH v21 2/5] Support atomic commit among multiple foreign servers.

---
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  110 +
 src/backend/access/fdwxact/fdwxact.c          | 2771 +++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  566 ++++
 src/backend/access/fdwxact/resolver.c         |  436 +++
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   56 +
 src/backend/access/transam/xact.c             |   29 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/copy.c                   |    6 +
 src/backend/commands/foreigncmds.c            |   30 +
 src/backend/executor/execPartition.c          |    8 +
 src/backend/executor/nodeForeignscan.c        |   24 +
 src/backend/executor/nodeModifyTable.c        |    6 +
 src/backend/foreign/foreign.c                 |   55 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   18 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   80 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/fdwxactdesc.c              |    1 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  163 +
 src/include/access/fdwxact_launcher.h         |   28 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/resolver_internal.h        |   63 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   22 +
 src/include/foreign/fdwapi.h                  |   12 +
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    6 +
 src/include/storage/proc.h                    |   11 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    2 +
 src/test/regress/expected/rules.out           |    7 +
 55 files changed, 4834 insertions(+), 17 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 120000 src/bin/pg_waldump/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..49480dd039 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..0207a66fb4
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000000..d9f08f4cfa
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,110 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back a transaction on
+either all of the foreign servers or none of them. This ensures that the data
+is always left in a consistent state across the federated database.
+
+
+Commit Sequence of Global Transactions
+---------------------------------------
+
+We employ the two-phase commit protocol to commit atomically among all foreign
+servers. The sequence of a distributed transaction commit consists
+of the following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+in the list FdwXactAtomicCommitParticipants, which is maintained by
+PostgreSQL's global transaction manager (GTM), as distributed transaction
+participants. The registered foreign transactions are tracked until the end of
+the transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We write a WAL record indicating that the foreign server is involved
+with the current transaction before issuing PREPARE on each foreign transaction.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers, including the local node. Otherwise, we can commit at this step by
+calling the CommitForeignTransaction() API, and no further operation is needed.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If any of them fails, we switch to
+rollback; at that point some participants might be prepared whereas
+others are not. The former need to be resolved manually using
+pg_resolve_foreign_xact(), and the latter are ended in one phase by
+calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction, but
+this resolution step (commit or rollback) is done by the foreign transaction
+resolver process. The backend process inserts itself into the wait queue and
+then wakes up the resolver process (or requests to launch a new one if needed).
+The resolver process dequeues the waiter and fetches the distributed transaction
+information that the backend is waiting for. Once all foreign transactions are
+committed or rolled back, the resolver process wakes up the waiter.
+
+
+Foreign Data Wrapper Callbacks for Transaction Management
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+transaction management callback functions according to that status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for PREPARE, COMMIT and ROLLBACK of
+the transaction on the foreign server, respectively.
+FdwXactRslvState->flags could contain FDWXACT_FLAG_ONEPHASE, meaning the FDW
+can commit or roll back the foreign transaction in one phase. On failure while
+processing a foreign transaction, the FDW needs to raise an error. However, it
+must accept an ERRCODE_UNDEFINED_OBJECT error while committing or rolling back
+a foreign transaction, because there is a race condition in which the
+coordinator could crash between completing the resolution and writing the WAL
+record that removes the FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts as FDWXACT_STATUS_PREPARING
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and it changes to
+FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING before the foreign
+transaction is committed or aborted, respectively, by the FDW callbacks.
+The FdwXact entry is removed, with WAL logging, once the foreign transaction
+is resolved.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. The initial
+status is FDWXACT_STATUS_PREPARED(*1). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                     PREPARING                      |----+
+ +----------------------------------------------------+    |
+                          |                                |
+                          v                                |
+ +----------------------------------------------------+    |
+ |                    PREPARED(*1)                    |    | (*2)
+ +----------------------------------------------------+    |
+           |                               |               |
+           v                               v               |
+ +--------------------+          +--------------------+    |
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |<---+
+ +--------------------+          +--------------------+
+
+(*1) Recovered FdwXact entries start with PREPARED
+(*2) Paths when an error occurs during preparing
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..698bd9cff4
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2771 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * The two-phase commit protocol is used when the transaction modified two or
+ * more servers, including the local node.  If the two-phase commit protocol
+ * is not required, all foreign transactions are committed at the pre-commit
+ * phase.
+ *
+ * During executor node initialization, the foreign server can be registered
+ * by calling either RegisterFdwXactByRelId() or RegisterFdwXactByServerId()
+ * so that it joins the group for global commit.  Foreign servers are
+ * registered if the FDW has both the CommitForeignTransaction API and the
+ * RollbackForeignTransaction API.  Registered participant servers are
+ * identified by the OIDs of the foreign server and user.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * all foreign servers.  After committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or roll back those
+ * transactions.  If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so that
+ * we're not trying to locally do work that might fail after the foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request.  We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed.  Before waiting for its foreign transactions to be resolved, a
+ * backend enqueues itself with the timestamp at which it expects to be
+ * processed.  On failure, it enqueues itself again with a new timestamp
+ * (last timestamp + foreign_xact_resolution_interval).
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in the in-doubt state (aka. in-doubt
+ * transactions).  Foreign transactions in the in-doubt state are not resolved
+ * automatically, so they must be processed manually using the
+ * pg_resolve_foreign_xact() function.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of a foreign
+ * transaction follows a locking model based on four linked concepts:
+ *
+ * * All FdwXact fields except for indoubt, inprocessing and status are protected
+ *   by FdwXactLock.  These three fields are protected by the entry's mutex.
+ * * Setting held_by of an FdwXact entry means owning the FdwXact entry, which
+ *   prevents it from being updated or removed by concurrent processes.
+ * * An FdwXact whose inprocessing is true is also not processed or removed
+ *   by concurrent processes.
+ * * A process that is going to process a foreign transaction needs to hold
+ *   its FdwXact entry in advance.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk.  FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon.  We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
+/* Check whether the participant has transaction callbacks / two-phase commit */
+#define ServerSupportTransactionCallack(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define SeverSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.  This struct
+ * is created at the beginning of execution for each foreign server and
+ * is used until the end of transaction, where we cannot look at syscaches.
+ * Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
+	/* true if modified the data on the server */
+	bool		modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A member of
+ * participants may not support transaction callbacks: commit, rollback and
+ * prepare.  If a member of participants doesn't support any transaction
+ * callbacks, i.e. ServerSupportTransactionCallack() returns false,
+ * we don't end its transaction.
+ *
+ * FdwXactParticipants_tmp is used to update FdwXactParticipants atomically
+ * when executing COMMIT/ROLLBACK PREPARED command.  In COMMIT PREPARED case,
+ * we don't want to rollback foreign transactions even if an error occurs,
+ * because the local prepared transaction can never turn into a rollback in
+ * that case.  However, preparing FdwXactParticipants might raise an error
+ * because of calling palloc() inside.  So we prepare FdwXactParticipants in
+ * two phases.  In the first phase, PrepareFdwXactParticipants(), we collect
+ * all foreign transactions associated with the local prepared transaction
+ * and keep them in FdwXactParticipants_tmp.  Even if an error occurs during
+ * that, we don't rollback them.  In the second phase, SetFdwXactParticipants(),
+ * we replace FdwXactParticipants with FdwXactParticipants_tmp and hold them.
+ *
+ * FdwXactLocalXid is the local transaction id associated with FdwXactParticipants.
+ */
+static List *FdwXactParticipants = NIL;
+static List *FdwXactParticipants_tmp = NIL;
+static TransactionId FdwXactLocalXid = InvalidTransactionId;
+
+/*
+ * True if the current transaction needs to be committed together with
+ * foreign servers.
+ */
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * Name of a foreign prepared transaction file is the database oid, xid,
+ * foreign server oid and user oid, 8 hex digits each, separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure
+ * uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* GUC parameters */
+int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Keep track of whether the process exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+static void register_fdwxact(Oid serverid, Oid userid, bool modified);
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool onephase,
+											 bool for_commit);
+static bool checkForeignTwophaseCommitRequired(void);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static void FdwXactPrepareForeignTransactions(bool prepare_all);
+static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(bool mark_indoubt);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static TransactionId FdwXactDetermineTransactionFate(TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						bool hold);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined by the FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Remember accessed foreign transaction. Both RegisterFdwXactByRelId and
+ * RegisterFdwXactByServerId are called by executor during initialization.
+ */
+void
+RegisterFdwXactByRelId(Oid relid, bool modified)
+{
+	Relation	rel;
+	Oid			serverid;
+	Oid			userid;
+
+	rel = relation_open(relid, NoLock);
+	serverid = GetForeignServerIdByRelId(relid);
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	relation_close(rel, NoLock);
+
+	register_fdwxact(serverid, userid, modified);
+}
+
+void
+RegisterFdwXactByServerId(Oid serverid, bool modified)
+{
+	register_fdwxact(serverid, GetUserId(), modified);
+}
+
+/*
+ * Register the foreign transaction identified by the given arguments as
+ * a participant of the transaction. The foreign transaction is identified
+ * by the given server id and user id.
+ */
+static void
+register_fdwxact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Participant's information is also needed at the end of a transaction,
+	 * where system caches are not available. Save it in TopTransactionContext
+	 * so that it can live until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	pfree(routine);
+
+	/* Switch back to the previous context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	return fdw_part;
+}
+
+/*
+ * Prepare all foreign transactions if foreign twophase commit is required.
+ * When foreign twophase commit is enabled, the behavior depends on the value
+ * of foreign_twophase_commit: when 'required', we strictly require the FDWs
+ * of all foreign servers to support two-phase commit and ask them to
+ * prepare foreign transactions; when 'prefer', we ask only foreign servers
+ * that are capable of two-phase commit to prepare foreign transactions and ask
+ * the other servers to commit; and when 'disabled', we ask all foreign servers
+ * to commit foreign transactions in one phase. If we fail to commit any of
+ * them, we change to aborting.
+ *
+ * Note that non-modified foreign servers can always be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXact(void)
+{
+	ListCell   *lc;
+	bool		need_twophase_commit;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Check if we need to use foreign twophase commit. It's always false if
+	 * foreign twophase commit is disabled.
+	 */
+	need_twophase_commit = checkForeignTwophaseCommitRequired();
+
+	/*
+	 * Prepare foreign transactions on foreign servers that support two-phase
+	 * commit.
+	 */
+	if (need_twophase_commit)
+	{
+		FdwXactPrepareForeignTransactions(false);
+		ForeignTwophaseCommitIsRequired = true;
+	}
+
+	/*
+	 * Commit other foreign transactions and delete the participant entry from
+	 * the list.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		/*
+		 * Skip already prepared foreign transactions. Note that we keep those
+		 * FdwXactParticipants until the end of the transaction.
+		 */
+		if (fdw_part->fdwxact)
+			continue;
+
+		/* Delete non-transaction-support participants */
+		if (!ServerSupportTransactionCallack(fdw_part))
+		{
+			FdwXactParticipants = foreach_delete_current(FdwXactParticipants, lc);
+			continue;
+		}
+
+		/* Commit the foreign transaction in one-phase */
+		FdwXactParticipantEndTransaction(fdw_part, true, true);
+
+		/* Successfully committed; delete it from the participant list */
+		FdwXactParticipants = foreach_delete_current(FdwXactParticipants, lc);
+	}
+}
+
+/*
+ * Return true if the current transaction modified data on two or more
+ * servers, counting those in FdwXactParticipants and the local server.
+ */
+static bool
+checkForeignTwophaseCommitRequired(void)
+{
+	ListCell   *lc;
+	bool		need_twophase_commit;
+	bool		have_notwophase;
+	int			nserverswritten = 0;
+	int			nserverstwophase = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (SeverSupportTwophaseCommit(fdw_part))
+			nserverstwophase++;
+
+		nserverswritten++;
+	}
+	Assert(nserverswritten >= nserverstwophase);
+
+	/* Check if there are any servers that don't support two-phase commit */
+	have_notwophase = (nserverswritten != nserverstwophase);
+
+	/* Did we modify the local non-temporary data? */
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+	{
+		nserverswritten++;
+
+		/*
+		 * We increment nserverstwophase as well for making code simple,
+		 * although we don't actually use two-phase commit for the local
+		 * transaction.
+		 */
+		nserverstwophase++;
+	}
+
+	if (nserverswritten <= 1)
+		return false;
+
+	if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED)
+	{
+		/*
+		 * In 'required' case, we require all modified servers to support
+		 * two-phase commit.
+		 */
+		need_twophase_commit = (nserverswritten >= 2);
+	}
+	else
+	{
+		Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER);
+
+		/*
+		 * In 'prefer' case, we use two-phase commit when this transaction
+		 * modified two or more servers that support two-phase commit, counting
+		 * the local server as one of them.
+		 */
+		need_twophase_commit = (nserverstwophase >= 2);
+	}
+
+	/*
+	 * If foreign two-phase commit is required, then all foreign servers must
+	 * be capable of doing two-phase commit.
+	 */
+	if (need_twophase_commit)
+	{
+		/* Parameter check */
+		if (max_prepared_foreign_xacts == 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+					 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+		if (max_foreign_xact_resolvers == 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+					 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+		if (have_notwophase &&
+			foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+					 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+	}
+
+	return need_twophase_commit;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool onephase,
+								 bool for_commit)
+{
+	FdwXactRslvState state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state.xid = FdwXactLocalXid;
+	state.server = fdw_part->server;
+	state.usermapping = fdw_part->usermapping;
+	state.fdwxact_id = onephase ? NULL : fdw_part->fdwxact_id;
+	state.flags = onephase ? FDWXACT_FLAG_ONEPHASE : 0;
+
+	if (for_commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * an FdwXact entry we call the GetPrepareId callback to get a transaction
+ * identifier from the FDW.
+ *
+ * We can still change to rollback here on failure. If any error occurs, we
+ * roll back non-prepared foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(bool prepare_all)
+{
+	ListCell   *lc;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Save the local transaction id */
+	FdwXactLocalXid = GetTopTransactionId();
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState state;
+		FdwXact		fdwxact;
+
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			continue;
+
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, FdwXactLocalXid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to the disk and logs (thereby relaying it to the
+		 * standby). Thus in case we lose connectivity to the foreign server
+		 * or crash ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepare the transaction on the foreign server before
+		 * persisting the information to the disk and crash in-between these
+		 * two steps, we will lose track of the prepared transaction on the
+		 * foreign server and will not be able to resolve it after crash
+		 * recovery. Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(FdwXactLocalXid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the moment this
+		 * backend receives an acknowledgment from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 */
+		state.xid = FdwXactLocalXid;
+		state.server = fdw_part->server;
+		state.usermapping = fdw_part->usermapping;
+		state.fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+		fdw_part->prepare_foreign_xact_fn(&state);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * This function is used to create a new foreign transaction entry before an
+ * FDW prepares and commits/rolls back. The function adds the entry to WAL and
+ * it will be persisted to disk under the pg_fdwxact directory at checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->held_by = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there
+	 * are none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->inprocessing = false;
+	fdwxact->indoubt = false;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->indoubt = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides the GetPrepareId callback, we return the
+ * identifier returned from it. Otherwise we generate a unique identifier in
+ * the form of "fx_<random number>_<xid>_<serverid>_<userid>" whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier must not be the same as any other concurrent prepared
+ * transaction identifier.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like UUID, which gives unique ids with high probability, but that may be
+ * expensive here and the UUID extension which provides the function to
+ * generate UUID is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	   *id;
+	int			id_len = 0;
+
+	/*
+	 * If the FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier (must fit with its NUL) */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * Note that the transaction may abort after we have prepared some of the
+ * participants. In that case we switch to rollback and roll back all foreign
+ * transactions.
+ */
+void
+AtPrepare_FdwXact(void)
+{
+	ListCell   *lc;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit, because we prepare on
+	 * each of them regardless of whether it was modified.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction that has operated on a foreign server")));
+	}
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions(true);
+
+	/*
+	 * We keep the prepared foreign transaction participants so that we can
+	 * roll them back in case of failure.
+	 */
+}
+
+void
+PostPrepare_FdwXact(void)
+{
+	/* After preparing the local transaction, we can forget all participants */
+	ForgetAllFdwXactParticipants(false);
+}
+
+/*
+ * Collect all foreign transactions associated with the given xid.  Return true
+ * if COMMIT PREPARED or ROLLBACK PREPARED needs to wait for all foreign transactions
+ * to be resolved.  The collected foreign transactions are kept in FdwXactParticipants_tmp,
+ * so the caller must call SetFdwXactParticipants() later if this function returns true.
+ */
+bool
+PrepareFdwXactParticipants(TransactionId xid)
+{
+	MemoryContext old_ctx;
+
+	Assert(FdwXactParticipants_tmp == NIL);
+
+	if (!TwoPhaseExists(xid))
+		return false;
+
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXactParticipant *fdw_part;
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwRoutine *routine;
+
+		if (!fdwxact->valid || fdwxact->local_xid != xid)
+			continue;
+
+		routine = GetFdwRoutineByServerId(fdwxact->serverid);
+		fdw_part = create_fdwxact_participant(fdwxact->serverid, fdwxact->userid,
+											  routine);
+		fdw_part->modified = true;
+		fdw_part->fdwxact = fdwxact;
+
+		/* Add to the participants list */
+		FdwXactParticipants_tmp = lappend(FdwXactParticipants_tmp, fdw_part);
+	}
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_ctx);
+
+	/*
+	 * We cannot proceed to commit this prepared transaction when
+	 * foreign_twophase_commit is disabled.
+	 */
+	if (FdwXactParticipants_tmp != NIL &&
+		!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	return (FdwXactParticipants_tmp != NIL);
+}
+
+/*
+ * Make the collected foreign transactions the participants of this transaction and
+ * hold all of them.  This function must be called after PrepareFdwXactParticipants().
+ */
+void
+SetFdwXactParticipants(TransactionId xid, bool commit)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants_tmp != NIL);
+	Assert(FdwXactParticipants == NIL);
+
+	FdwXactLocalXid = xid;
+	FdwXactParticipants = FdwXactParticipants_tmp;
+	FdwXactParticipants_tmp = NIL;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(SeverSupportTwophaseCommit(fdw_part));
+
+		/* Hold the fdwxact entry and set the status */
+		SpinLockAcquire(&fdw_part->fdwxact->mutex);
+		Assert(fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED);
+		fdw_part->fdwxact->held_by = MyBackendId;
+		fdw_part->fdwxact->status = commit
+			? FDWXACT_STATUS_COMMITTING
+			: FDWXACT_STATUS_ABORTING;
+		SpinLockRelease(&fdw_part->fdwxact->mutex);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants(true);
+}
+
+/*
+ * Wait for all of our foreign transactions to be resolved.
+ *
+ * Initially backends start in state FDWXACT_NOT_WAITING and then change
+ * that state to FDWXACT_WAITING before adding themselves to the wait queue.
+ * During FdwXactResolveFdwXacts a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * This backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitForResolution(TransactionId wait_xid)
+{
+	char	   *new_status = NULL;
+	const char *old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/*
+	 * Quick exit if either atomic commit is not requested or we don't have
+	 * any participants.
+	 */
+	if (!IsForeignTwophaseCommitRequested() || FdwXactParticipants == NIL)
+		return;
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if one is not running yet, or wake it up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction resolution.
+	 */
+	if (update_process_title)
+	{
+		int			len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 34 + 1);	/* 24 + 10 digits + NUL */
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status);
+		new_status[len] = '\0'; /* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDWXACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The
+		 * latter would lead the client to believe that the distributed
+		 * transaction aborted, which is not true: it's already committed
+		 * locally. The former is no good either: the client has requested
+		 * committing a distributed transaction, and is entitled to assume
+		 * that an acknowledged commit is also committed on all foreign
+		 * servers, which might not be true. So in this case we issue a
+		 * WARNING (which some clients may be able to interpret) and shut off
+		 * further output. We do NOT reset ProcDiePending, so that the process
+		 * will die after the commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions may be left orphaned,
+		 * but the foreign transaction resolver can pick them up and try to
+		 * resolve them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status);
+		pfree(new_status);
+	}
+
+	list_free(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Return one backend that connects to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+				 TransactionId *waitXid_p)
+{
+	PGPROC	   *proc;
+	bool		found = false;
+
+	Assert(LWLockHeldByMe(FdwXactResolutionLock));
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	/* Initialize variables */
+	*nextResolutionTs_p = -1;
+	*waitXid_p = InvalidTransactionId;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+		{
+			if (proc->fdwXactNextResolutionTs <= now)
+			{
+				/* Found a waiting process */
+				found = true;
+				*waitXid_p = proc->fdwXactWaitXid;
+			}
+			else
+				/* Found a waiting process supposed to be processed later */
+				*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+
+			break;
+		}
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return found ? proc : NULL;
+}
+
+/*
+ * Return true if there is at least one backend connected to the given
+ * database in the wait queue. The caller must hold FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC	   *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * In the abort case, this function ends the foreign transaction participants
+ * and possibly rolls back their prepared foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool is_commit)
+{
+	ListCell   *lc;
+
+	if (!is_commit)
+	{
+		foreach(lc, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = lfirst(lc);
+			FdwXact		fdwxact = fdw_part->fdwxact;
+			int			status;
+
+			if (!fdwxact)
+			{
+				/* Rollback foreign transaction in one-phase if supported */
+				if (ServerSupportTransactionCallack(fdw_part))
+					FdwXactParticipantEndTransaction(fdw_part, true, false);
+				continue;
+			}
+
+			/*
+			 * Abort the foreign transaction.  For participants whose status
+			 * is FDWXACT_STATUS_PREPARING, we close the transaction in
+			 * one-phase. In addition, since we are not sure that the
+			 * preparation has been completed on the foreign server, we also
+			 * attempt to roll back the prepared foreign transaction.  Note
+			 * that it is the FDW's responsibility to tolerate an
+			 * OBJECT_NOT_FOUND error in the abort case.
+			 */
+			SpinLockAcquire(&fdwxact->mutex);
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&fdwxact->mutex);
+
+			switch (status)
+			{
+				case FDWXACT_STATUS_PREPARING:
+					/* One-phase rollback foreign transaction */
+					FdwXactParticipantEndTransaction(fdw_part, true, false);
+					/* FALLTHROUGH */
+				case FDWXACT_STATUS_PREPARED:
+				case FDWXACT_STATUS_ABORTING:
+					/* Roll back the prepared foreign transaction */
+					FdwXactParticipantEndTransaction(fdw_part, false, false);
+					break;
+				case FDWXACT_STATUS_COMMITTING:
+					Assert(false);
+					break;
+			}
+
+			/* Resolution was a success, remove the entry */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			if (fdwxact->ondisk)
+				RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								  fdwxact->serverid, fdwxact->userid,
+								  true);
+			remove_fdwxact(fdwxact);
+			LWLockRelease(FdwXactLock);
+		}
+
+		/* All foreign transactions should have been aborted */
+		list_free(FdwXactParticipants);
+		FdwXactParticipants = NIL;
+	}
+
+	ForgetAllFdwXactParticipants(true);
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Mark my foreign transaction participants as in-doubt and clear
+ * the FdwXactParticipants list.
+ *
+ * If we leave any foreign transactions behind, update the oldest xmin of
+ * unresolved transactions so that the local transaction id of an in-doubt
+ * transaction is not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(bool mark_indoubt)
+{
+	ListCell   *cell;
+	int			nlefts = 0;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Each participant is expected to have registered an FdwXact entry */
+		Assert(fdwxact);
+
+		/*
+		 * Unlock and mark a foreign transaction as in-doubt.  Note that there
+		 * is a race condition: an FdwXact entry in FdwXactParticipants could
+		 * be used by another backend before we forget it, in the case where
+		 * the resolver process removes the FdwXact entry and another backend
+		 * reuses it first. So we need to check that the entries are still
+		 * associated with the transaction.  We also do these checks by
+		 * transaction id because these foreign transactions may already be
+		 * held by the resolver.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->valid && fdwxact->held_by == MyBackendId)
+		{
+			fdwxact->held_by = InvalidBackendId;
+
+			if (mark_indoubt)
+			{
+				fdwxact->indoubt = true;	/* let the resolver process it */
+				nlefts++;
+			}
+		}
+		LWLockRelease(FdwXactLock);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction of
+	 * unresolved distributed transactions and hand the entries over to the
+	 * foreign transaction resolver.
+	 */
+	if (nlefts > 0)
+	{
+		Assert(mark_indoubt);
+		elog(DEBUG1, "left %d foreign transactions in in-doubt status", nlefts);
+		FdwXactComputeRequiredXmin();
+		FdwXactLaunchOrWakeupResolver();
+	}
+
+	list_free(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+	FdwXactParticipants_tmp = NIL;
+	FdwXactLocalXid = InvalidTransactionId;
+}
+
+/*
+ * Resolve foreign transactions at the given indexes. If 'waiter' is not NULL,
+ * we release the waiter after resolving all of the given foreign transactions.
+ * On failure we re-enqueue the waiting backend after advancing its next
+ * resolution time.
+ *
+ * The caller must hold the given foreign transactions in advance to prevent
+ * concurrent update.
+ */
+void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		PG_TRY();
+		{
+			FdwXactResolveOneFdwXact(fdwxact);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. Re-insert the waiter into the retry queue
+			 * if the waiter is still waiting.
+			 */
+			if (waiter)
+			{
+				LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+				if (waiter->fdwXactState == FDWXACT_WAITING)
+				{
+					SHMQueueDelete(&(waiter->fdwXactLinks));
+					pg_write_barrier();
+					waiter->fdwXactNextResolutionTs =
+						TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+													foreign_xact_resolution_retry_interval);
+					FdwXactQueueInsert(waiter);
+				}
+				LWLockRelease(FdwXactResolutionLock);
+			}
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+
+	if (!waiter)
+		return;
+
+	/*
+	 * Remove waiter from shmem queue, if not detached yet. The waiter could
+	 * already be detached if the user cancelled the wait before resolution.
+	 */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/*
+		 * Wake up the waiter only when we have set state and removed from
+		 * queue
+		 */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had been already detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid,
+					  false);
+	LWLockRelease(FdwXactLock);
+
+	return (idx != -1);
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ *
+ * XXX: we can exclude FdwXact entries whose status is already committing
+ * or aborting.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Determine whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactDetermineTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on
+	 * the foreign server. So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted. This should not happen except for one
+	 * case where the local transaction is prepared and this foreign
+	 * transaction is being resolved manually by pg_resolve_foreign_xact().
+	 * Raise an error anyway since we cannot determine the fate of this
+	 * foreign transaction according to the local transaction whose fate is
+	 * also not determined.
+	 */
+	else
+		elog(ERROR,
+			 "cannot resolve the foreign transaction associated with in-process transaction");
+
+	pg_unreachable();
+}
+
+/*
+ * Commit or roll back one prepared foreign transaction.  After successful
+ * resolution, the caller is responsible for removing the FdwXact entry from
+ * shared memory along with the corresponding on-disk file.
+ */
+static void
+FdwXactResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactRslvState state;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->held_by != InvalidBackendId || fdwxact->inprocessing);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactDetermineTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare resolution state to pass to API */
+	state.xid = fdwxact->local_xid;
+	state.server = server;
+	state.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	state.fdwxact_id = fdwxact->fdwxact_id;
+	state.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&state);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&state);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/*
+ * Return the index in FdwXactCtl->fdwxacts of the first valid FdwXact entry
+ * matching the given arguments, or -1 if there is none.  An argument with an
+ * invalid value for its datatype matches any entry.  If 'hold' is true,
+ * entries that are already held or being processed are skipped, and the
+ * matching entry is marked as held by this backend.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid, bool hold)
+{
+	bool		found = false;
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		bool		inprocessing;
+
+		if (!fdwxact->valid)
+			continue;
+
+		SpinLockAcquire(&fdwxact->mutex);
+		inprocessing = fdwxact->inprocessing;
+		SpinLockRelease(&fdwxact->mutex);
+
+		/*
+		 * If we're attempting to hold this entry, skip if it is already held
+		 * or being processed.
+		 */
+		if (hold &&
+			(inprocessing || fdwxact->held_by != InvalidBackendId))
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+
+		if (hold)
+			fdwxact->held_by = MyBackendId;
+
+		found = true;
+		break;
+	}
+
+	return found ? i : -1;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. Its status is
+	 * filled in below, since we do not know the exact status right now; the
+	 * resolver will set it later based on the status of the local
+	 * transaction which prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED and as in-doubt, since we do not know the xact
+	 * status right now. Resolver will set it later based on the status of
+	 * local transaction that prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->indoubt = true;
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXact file if the foreign transaction was saved by an earlier checkpoint.
+ * We might not find the FdwXact entry when crash recovery starts from a
+ * point after the entry was added but before it was removed.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid CRC and the expected identifiers), return the
+ * palloc'd contents of the file; raise an error if the data is corrupted.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the expected identifiers */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextFullXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * around broken entries and letting replay continue causes harm on the
+ * system, and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextFullXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and mark them valid.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+
+		/*
+		 * If the foreign transaction is part of a prepared local
+		 * transaction, it is not in-doubt. A future COMMIT/ROLLBACK
+		 * PREPARED will determine the fate of this foreign transaction.
+		 */
+		if (TwoPhaseExists(fdwxact->local_xid))
+		{
+			ereport(DEBUG2,
+					(errmsg("clearing in-doubt flag of foreign transaction %u, server %u, user %u because the corresponding local prepared transaction was found",
+							fdwxact->local_xid, fdwxact->serverid,
+							fdwxact->userid)));
+			fdwxact->indoubt = false;
+		}
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built-in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		bool		indoubt;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		indoubt = fdwxact->indoubt;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = BoolGetDatum(indoubt);
+		values[5] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/* Find and hold the FdwXact entry */
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid, true);
+
+	LWLockRelease(FdwXactLock);
+
+	if (idx < 0)
+	{
+		/* No entry */
+		PG_RETURN_BOOL(false);
+	}
+
+	PG_TRY();
+	{
+		FdwXactResolveFdwXacts(&idx, 1, NULL);
+	}
+	PG_CATCH();
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[idx];
+
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->held_by = InvalidBackendId;
+		SpinLockRelease(&fdwxact->mutex);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it. This gives a way to forget about a prepared transaction when,
+ * for example, the foreign server where it was prepared is no longer
+ * available or the user who prepared the transaction needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	FdwXact		fdwxact;
+	int			i;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid && fdwxact->dbid == MyDatabaseId &&
+			fdwxact->local_xid == xid && fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+	{
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction on server %u does not exist",
+						serverid)));
+	}
+
+	if (fdwxact->inprocessing || fdwxact->held_by != InvalidBackendId)
+	{
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot remove foreign transaction entry which is being processed")));
+	}
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdwxact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..deb35abfc2
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,566 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+		FdwXactRslvCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit resolver start attempts to once per
+		 * foreign_xact_resolution_retry_interval, but always start one when
+		 * a backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in previous cycle was interrupted in less than
+			 * foreign_xact_resolution_retry_interval since last resolver
+			 * started, this usually means crash of the resolver, so we should
+			 * retry in foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+	int			i;
+
+	/*
+	 * Looking for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wakeup the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. Since the resolver will soon
+		 * find the FdwXact entry we inserted, we don't need to do anything.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on databases that have
+ * at least one FdwXact entry but no resolver running on them.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *resolver_dbs;	/* DBs resolver's running on */
+	HTAB	   *fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+	int			i;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect OIDs of databases having at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		bool		indoubt;
+		BackendId	held_by;
+
+		if (!fdwxact->valid)
+			continue;
+
+		SpinLockAcquire(&fdwxact->mutex);
+		indoubt = fdwxact->indoubt;
+		held_by = fdwxact->held_by;
+		SpinLockRelease(&fdwxact->mutex);
+
+		ereport(LOG, (errmsg("[%d] indoubt %s held_by %d id %s",
+							 i,
+							 indoubt ? "true" : "false",
+							 held_by,
+							 fdwxact->fdwxact_id)));
+
+		if ((indoubt && held_by == InvalidBackendId) ||
+			(!indoubt && held_by != InvalidBackendId))
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no FdwXact entry, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Launch a new resolver on each DB that has none running */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be super user */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..b91a2e1e88
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,436 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. The foreign
+ * transaction launcher starts one resolver process for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database for which a backend process is waiting for resolution.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on
+ * timeout, but only if no foreign transactions on the database are waiting
+ * to be resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int			foreign_xact_resolution_retry_interval;
+int			foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+static void hold_fdwxacts(PGPROC *waiter);
+static void hold_indoubt_fdwxacts(void);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts has indexes of FdwXact which the resolver marked
+ * as in-processing. We clear that flag from those entries on failure.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and clean up the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	/* clear inprocessing flags */
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->inprocessing = false;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TransactionId waitXid = InvalidTransactionId;
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiters until either the queue becomes empty or it holds
+		 * only waiters whose resolution timestamp is in the future.
+		 */
+		for (;;)
+		{
+			PGPROC	   *waiter;
+
+			CHECK_FOR_INTERRUPTS();
+
+			LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+
+			waiter = FdwXactGetWaiter(now, &resolutionTs, &waitXid);
+
+			if (!waiter)
+			{
+				/* Not found, break */
+				LWLockRelease(FdwXactResolutionLock);
+				break;
+			}
+
+			/* Hold the waiting foreign transactions */
+			hold_fdwxacts(waiter);
+			Assert(nheld > 0);
+			LWLockRelease(FdwXactResolutionLock);
+
+			/* Resolve the waiting distributed transaction */
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld, waiter);
+			CommitTransactionCommand();
+
+			last_resolution_time = now;
+		}
+
+		/* Hold in-doubt transactions */
+		hold_indoubt_fdwxacts();
+
+		if (nheld > 0)
+		{
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld, NULL);
+			CommitTransactionCommand();
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Shut down the resolver if it has resolved nothing for
+ * foreign_xact_resolver_timeout and no backend is waiting for resolution.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		/* There is no waiting backend */
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until we have detached from the
+		 * slot. This prevents a race condition in which a waiter enqueues
+		 * itself after the FdwXactWaiterExists check.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but does not exit as the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Mark in-doubt transactions as in-processing.
+ */
+static void
+hold_indoubt_fdwxacts(void)
+{
+	nheld = 0;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid && fdwxact->dbid == MyDatabaseId &&
+			fdwxact->held_by == InvalidBackendId && fdwxact->indoubt)
+		{
+			held_fdwxacts[nheld++] = i;
+
+			/* hold lock */
+			SpinLockAcquire(&fdwxact->mutex);
+			fdwxact->inprocessing = true;
+			SpinLockRelease(&fdwxact->mutex);
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Mark foreign transactions associated with the given waiter's transaction
+ * as in-processing.
+ */
+static void
+hold_fdwxacts(PGPROC *waiter)
+{
+	nheld = 0;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid && fdwxact->dbid == waiter->databaseId &&
+			fdwxact->local_xid == waiter->fdwXactWaitXid)
+		{
+			held_fdwxacts[nheld++] = i;
+
+			/* hold lock */
+			SpinLockAcquire(&fdwxact->mutex);
+			Assert(!fdwxact->indoubt);
+			Assert(fdwxact->held_by == waiter->backendId);
+			fdwxact->inprocessing = true;
+			SpinLockRelease(&fdwxact->mutex);
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 1cd97852e8..ea045174e0 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..200cf9d067 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e1904877fa..2b9e039580 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -850,6 +851,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
@@ -2196,6 +2226,9 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	XLogRecPtr	recptr;
 	TimestampTz committs = GetCurrentTimestamp();
 	bool		replorigin;
+	bool		need_fdwxact_commit;
+
+	need_fdwxact_commit = PrepareFdwXactParticipants(xid);
 
 	/*
 	 * Are we using the replication origins feature?  Or, in other words, are
@@ -2266,6 +2299,16 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for the foreign transactions prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	if (need_fdwxact_commit)
+	{
+		SetFdwXactParticipants(xid, true);
+		FdwXactWaitForResolution(xid);
+	}
 }
 
 /*
@@ -2285,6 +2328,9 @@ RecordTransactionAbortPrepared(TransactionId xid,
 							   const char *gid)
 {
 	XLogRecPtr	recptr;
+	bool		need_fdwxact_commit;
+
+	need_fdwxact_commit = PrepareFdwXactParticipants(xid);
 
 	/*
 	 * Catch the scenario where we aborted partway through
@@ -2325,6 +2371,16 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for the foreign transactions prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	if (need_fdwxact_commit)
+	{
+		SetFdwXactParticipants(xid, false);
+		FdwXactWaitForResolution(xid);
+	}
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index cd30b62d36..c611fd8b45 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1219,6 +1220,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1227,6 +1229,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1265,12 +1268,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1428,6 +1432,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for the prepared foreign transactions to be resolved, if required.
+	 * We only want to wait if we prepared foreign transactions in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitForResolution(xid);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2087,6 +2099,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXact();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2254,6 +2269,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXact(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2341,6 +2357,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2532,6 +2550,9 @@ PrepareTransaction(void)
 	 */
 	PostPrepare_Twophase();
 
+	/* Release held FdwXact entries */
+	PostPrepare_FdwXact();
+
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
@@ -2542,6 +2563,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	//AtEOXact_FdwXact(true);
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2751,6 +2773,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXact(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ca09d81b08..eae8c60db3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4599,6 +4600,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6286,6 +6288,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_wal_senders",
 									 max_wal_senders,
 									 ControlFile->max_wal_senders);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
@@ -6836,14 +6841,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7045,7 +7051,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7558,6 +7567,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7888,6 +7898,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9183,6 +9196,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9712,8 +9726,10 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9731,6 +9747,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9747,6 +9764,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9952,6 +9970,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10151,6 +10170,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 56420bbc9d..56af9e6408 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+       SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc463c..a1dea253c2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2807,8 +2807,14 @@ CopyFrom(CopyState cstate)
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc),
+							   true);
+
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index f197869752..6206265424 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1101,6 +1103,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1419,6 +1433,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
@@ -1572,6 +1595,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
 				 errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA",
 						fdw->fdwname)));
 
+	/*
+	 * Remember that the transaction accesses a foreign server. Normally
+	 * ImportForeignSchema does not modify data on foreign servers, so register
+	 * the server as not modified.
+	 */
+	RegisterFdwXactByServerId(server->serverid, false);
+
 	/* Call FDW to get a list of commands */
 	cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index fb6ce49056..3fa8bfe09f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/table.h"
 #include "access/tableam.h"
 #include "catalog/partition.h"
@@ -939,7 +940,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		Relation		child = partRelInfo->ri_RelationDesc;
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(child), true);
+
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 513471ab9b..29f376e48c 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,9 +226,31 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		RangeTblEntry	*rte;
+
+		rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex,
+							estate);
+
+		/* Remember that the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(rte->relid, true);
+
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
+	{
+		RangeTblEntry	*rte;
+		int rtindex = (scanrelid > 0) ?
+			scanrelid :
+			bms_next_member(node->fs_relids, -1);
+
+		rte = exec_rt_fetch(rtindex, estate);
+
+		/* Remember that the transaction accesses a foreign server */
+		RegisterFdwXactByRelId(rte->relid, false);
+
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1ec07bad07..e5dee94764 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -37,6 +37,7 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
@@ -47,6 +48,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "rewrite/rewriteHandler.h"
@@ -2418,6 +2420,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+			/* Remember that the transaction modifies data on a foreign server */
+			RegisterFdwXactByRelId(relid, true);
 
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 															 resultRelInfo,
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 61e48ca3f8..8f411c0559 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,18 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction && !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction && routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		!routine->CommitForeignTransaction &&
+		!routine->RollbackForeignTransaction)
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index beb5e85434..2258424e81 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -12,6 +12,8 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d7f99d9944..84bb1913f3 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3667,6 +3667,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3777,6 +3783,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 		case WAIT_EVENT_HASH_BATCH_ALLOCATE:
 			event_name = "HashBatchAllocate";
 			break;
@@ -4103,6 +4112,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 160afe9f39..6a83f19e24 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -909,6 +911,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -973,12 +979,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules have had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c2e5e3abf8..9d34817f39 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -151,6 +151,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..55609eed81 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 3c2b369615..56c43cf741 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -94,6 +94,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -249,6 +251,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1311,6 +1314,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1376,6 +1380,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1425,6 +1430,15 @@ GetOldestXmin(Relation rel, int flags)
 		NormalTransactionIdPrecedes(replication_slot_xmin, result))
 		result = replication_slot_xmin;
 
+	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDWXACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
 	/*
 	 * After locks have been released and vacuum_defer_cleanup_age has been
 	 * applied, check whether we need to back up further to make logical
@@ -3125,6 +3139,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions that are still needed
+ * to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6985e8eed..241b099238 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,6 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 XactTruncationLock					44
+FdwXactLock							45
+FdwXactResolverLock					46
+FdwXactResolutionLock				47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index f5eef6fa4e..9bd1e1791a 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -421,6 +422,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* Initialize fields for fdw xact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -822,6 +827,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8958ec8103..5ed6c05b18 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3054,6 +3056,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2f3e0a70e0..1e46aff23f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -426,6 +427,25 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required", "prefer", and "disabled" are documented,
+ * we accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
@@ -763,6 +783,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2471,6 +2495,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4599,6 +4669,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Sets whether two-phase commit is used for foreign transactions in the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 995b6ca155..d7ca008a9e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -125,6 +125,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -344,6 +346,20 @@
 #max_sync_workers_per_subscription = 2	# taken from max_logical_replication_workers
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled, prefer or required
+
 #------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index a0b0458108..8701c5f005 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 67021a6dc1..78d882ddb2 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -208,6 +208,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index e73639df74..3041c39bc0 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 233441837f..b040202043 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 120000
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..32e041e246
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,163 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* fdwXactState */
+#define	FDWXACT_NOT_WAITING		0
+#define	FDWXACT_WAITING			1
+#define	FDWXACT_WAIT_COMPLETE	2
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_PREFER, /* use twophase commit where available */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+} ForeignTwophaseCommitLevel;
+
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being
+								 * committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being
+								 * aborted */
+} FdwXactStatus;
+
+typedef struct FdwXactData *FdwXact;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	FdwXactStatus status;
+	bool		indoubt;		/* Is an in-doubt transaction? */
+	bool		inprocessing;	/* resolver is processing? */
+	slock_t		mutex;			/* protect above three fields */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset where this entry's record
+									 * starts */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset where this entry's record ends */
+
+	bool		valid;			/* has the entry been completed and written to
+								 * file? */
+	BackendId	held_by;		/* backend that is holding this entry */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+} FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing FdwXact entry needs to hold FdwXactLock in exclusive mode,
+ * and iterating fdwXacts needs that in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	TransactionId xid;
+
+	/* Foreign transaction information */
+	char	   *fdwxact_id;
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+/* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void RegisterFdwXactByRelId(Oid relid, bool modified);
+extern void RegisterFdwXactByServerId(Oid serverid, bool modified);
+extern void FdwXactReleaseWaiter(PGPROC *waiter);
+extern void FdwXactWaitForResolution(TransactionId wait_xid);
+extern void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter);
+extern PGPROC *FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+								TransactionId *waitXid_p);
+extern bool FdwXactWaiterExists(Oid dbid);
+extern bool PrepareFdwXactParticipants(TransactionId xid);
+extern void SetFdwXactParticipants(TransactionId xid, bool commit);
+extern void PreCommit_FdwXact(void);
+extern void AtEOXact_FdwXact(bool is_commit);
+extern void AtPrepare_FdwXact(void);
+extern void PostPrepare_FdwXact(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern bool FdwXactExists(Oid dboid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+
+#endif							/* FDWXACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..688b43b8d0
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..c935471936
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLaunchLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 6c15df7e70..986bc73566 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index a04fc70326..6f1f336e31 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -108,6 +108,13 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
+/*
+ * XACT_FLAGS_FDWNOPREPARE - set when the transaction wrote data on a
+ * foreign table whose foreign server is not capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 3)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index c8869d5226..da0d442f1b 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -232,6 +232,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index de5670e538..9884f5f8e7 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 61f2c2f5b4..df5189dd2d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5981,6 +5981,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,status,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6099,6 +6117,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..8d046cc4e4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5e0cf533fb..5596ee591c 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -69,6 +69,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 											   bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 														 bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c55dc1481c..2186c1c5d0 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -806,6 +806,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -853,6 +855,7 @@ typedef enum
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
 	WAIT_EVENT_EXECUTE_GATHER,
+	WAIT_EVENT_FDWXACT_RESOLUTION,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
 	WAIT_EVENT_HASH_BATCH_LOAD,
@@ -969,6 +972,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 1ee9000b2b..4150d8a3e4 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/xlogdefs.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
@@ -154,6 +155,16 @@ struct PGPROC
 	int			syncRepState;	/* wait state for sync rep */
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
+	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
 	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index a5c7d0c064..0f73b64937 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDWXACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -125,4 +127,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 454c2df487..f977ca43d4 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b813e32215..628eaf531e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1342,6 +1342,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(xid, serverid, userid, status, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.23.0

#47Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#46)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, May 19, 2020 at 12:33 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

I think that the current three values are useful for users. The
‘required’ mode is used when users want to ensure all writes involved
with the transaction are committed atomically. That being said, as
some FDW plugin might not support the prepare API we cannot force
users to use this mode all the time when using atomic commit.
Therefore ‘prefer’ mode would be useful for this case. Both modes use
2pc only when it's required for atomic commit.

So what do you think of my idea of adding the behavior you proposed as
another new mode? It’s better to keep the first version as simple as
possible.

If the intention is to keep the first version simple, then why do we
want to support any mode other than 'required'? I think that will limit
its usage to cases where all FDWs involved support the Prepare API, but
if that helps to keep the design and patch simpler, then why not just do
that for the first version and extend it later? OTOH, if you think it
will be really useful to keep the other modes, we could try to keep
those in separate patches to facilitate the review and discussion of
the core feature.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#48Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Amit Kapila (#47)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, 3 Jun 2020 at 14:50, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 19, 2020 at 12:33 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

I think that the current three values are useful for users. The
‘required’ mode is used when users want to ensure all writes involved
with the transaction are committed atomically. That being said, as
some FDW plugin might not support the prepare API we cannot force
users to use this mode all the time when using atomic commit.
Therefore ‘prefer’ mode would be useful for this case. Both modes use
2pc only when it's required for atomic commit.

So what do you think of my idea of adding the behavior you proposed as
another new mode? It’s better to keep the first version as simple as
possible.

If the intention is to keep the first version simple, then why do we
want to support any mode other than 'required'? I think that will limit
its usage to cases where all FDWs involved support the Prepare API, but
if that helps to keep the design and patch simpler, then why not just do
that for the first version and extend it later? OTOH, if you think it
will be really useful to keep the other modes, we could try to keep
those in separate patches to facilitate the review and discussion of
the core feature.

‘disabled’ is the fundamental mode. We also need 'disabled' mode,
otherwise existing FDW won't work. I was concerned that many FDW
plugins don't implement FDW transaction APIs yet when users start
using this feature. But it seems to be a good idea to move 'prefer'
mode to a separate patch while leaving 'required'. I'll do that in the
next version patch.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#49Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#48)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, Jun 3, 2020 at 12:02 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Wed, 3 Jun 2020 at 14:50, Amit Kapila <amit.kapila16@gmail.com> wrote:

If the intention is to keep the first version simple, then why do we
want to support any mode other than 'required'? I think that will limit
its usage to cases where all FDWs involved support the Prepare API, but
if that helps to keep the design and patch simpler, then why not just do
that for the first version and extend it later? OTOH, if you think it
will be really useful to keep the other modes, we could try to keep
those in separate patches to facilitate the review and discussion of
the core feature.

‘disabled’ is the fundamental mode. We also need 'disabled' mode,
otherwise existing FDW won't work.

IIUC, if foreign_twophase_commit is 'disabled', we don't use a
two-phase protocol to commit distributed transactions, right? So, do
we check whether we need to use a two-phase protocol at the time of
Prepare or at Commit? I think this should be checked at prepare time.

+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>

This is written w.r.t foreign_twophase_commit. If one changes this
between prepare and commit, will it have any impact?

I was concerned that many FDW
plugins don't implement FDW transaction APIs yet when users start
using this feature. But it seems to be a good idea to move 'prefer'
mode to a separate patch while leaving 'required'. I'll do that in the
next version patch.

Okay, thanks. Please, see if you can separate out the documentation
for that as well.

Few other comments on v21-0003-Documentation-update:
----------------------------------------------------
1.
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with that this foreign transaction
+       associates
+      </entry>

/with that this/with which this

2.
+      <entry>
+       The OID of the foreign server on that the foreign transaction is prepared
+      </entry>

/on that the/on which the

3.
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>initial</literal> : Initial status.
+         </para>

What exactly does "Initial status" mean?

4.
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal> this foreign transaction is in-doubt status and
+       needs to be resolved by calling <function>pg_resolve_fdwxact</function>
+       function.
+      </entry>

It would be better if you could add a sentence saying when and/or how
foreign transactions can reach the in-doubt state.

5.
If <literal>N</literal> local transactions each
+ across <literal>K</literal> foreign server this value need to be set

This part of the sentence can be improved by saying something like:
"If a user expects N local transactions and each of those involves K
foreign servers, this value..".

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#50Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Amit Kapila (#49)
6 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, 4 Jun 2020 at 12:46, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 3, 2020 at 12:02 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Wed, 3 Jun 2020 at 14:50, Amit Kapila <amit.kapila16@gmail.com> wrote:

If the intention is to keep the first version simple, then why do we
want to support any mode other than 'required'? I think that will limit
its usage to cases where all FDWs involved support the Prepare API, but
if that helps to keep the design and patch simpler, then why not just do
that for the first version and extend it later? OTOH, if you think it
will be really useful to keep the other modes, we could try to keep
those in separate patches to facilitate the review and discussion of
the core feature.

‘disabled’ is the fundamental mode.

Oops, I wanted to say 'required' is the fundamental mode.

We also need 'disabled' mode,
otherwise existing FDW won't work.

IIUC, if foreign_twophase_commit is 'disabled', we don't use a
two-phase protocol to commit distributed transactions, right? So, do
we check whether we need to use a two-phase protocol at the time of
Prepare or at Commit? I think this should be checked at prepare time.

When a client executes COMMIT on a distributed transaction, 2pc is
used automatically and transparently. In the ‘required’ case, all
involved (and modified) foreign servers need to support 2pc. So if a
distributed transaction modifies data on a foreign server connected
via an existing FDW which doesn’t support 2pc, the transaction cannot
proceed to commit and fails at the pre-commit phase. So there should be
two modes, ‘disabled’ and ‘required’, with ‘disabled’ as the default.
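
To make the rule above concrete, here is a minimal standalone sketch in
C. It is not code from the patch: every type and function name in it is
hypothetical, and it models only the check described in this paragraph
(it ignores, for instance, whether 2pc is actually needed when only one
server was written).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two foreign_twophase_commit settings. */
typedef enum
{
    SKETCH_2PC_DISABLED,
    SKETCH_2PC_REQUIRED
} SketchTwophaseMode;

/* Hypothetical record for one foreign server taking part in a transaction. */
typedef struct
{
    bool modified;      /* did the transaction write on this server? */
    bool supports_2pc;  /* does its FDW provide the prepare callback? */
} SketchParticipant;

/* Possible outcomes of the pre-commit check. */
typedef enum
{
    SKETCH_PLAIN_COMMIT,        /* 'disabled': no two-phase protocol */
    SKETCH_TWOPHASE_OK,         /* 'required' and every writer can prepare */
    SKETCH_FAIL_AT_PRECOMMIT    /* 'required' but some writer cannot prepare */
} SketchCommitPlan;

/*
 * Decide at pre-commit time how the distributed transaction may proceed.
 * In 'required' mode, one modified server without 2pc support is enough
 * to make the commit fail.  Servers that were only read are not checked.
 */
SketchCommitPlan
sketch_precommit_plan(SketchTwophaseMode mode,
                      const SketchParticipant *parts, size_t nparts)
{
    size_t i;

    if (mode == SKETCH_2PC_DISABLED)
        return SKETCH_PLAIN_COMMIT;

    for (i = 0; i < nparts; i++)
    {
        if (parts[i].modified && !parts[i].supports_2pc)
            return SKETCH_FAIL_AT_PRECOMMIT;
    }
    return SKETCH_TWOPHASE_OK;
}
```

With two written servers of which one lacks 2pc support, the sketch
yields SKETCH_FAIL_AT_PRECOMMIT under 'required' and
SKETCH_PLAIN_COMMIT under 'disabled'.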

+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>

This is written w.r.t foreign_twophase_commit. If one changes this
between prepare and commit, will it have any impact?

Since the distributed transaction commit automatically uses 2pc when
executing COMMIT, it's not possible to change foreign_twophase_commit
between prepare and commit. So I'd like to explain the case where a
user executes PREPARE and then COMMIT PREPARED while changing
foreign_twophase_commit.

PREPARE can run only when foreign_twophase_commit is 'required' (or
'prefer') and all foreign servers involved with the transaction
support 2pc. We prepare all foreign transactions regardless of the
number of servers and of whether they were modified. If either
foreign_twophase_commit is 'disabled' or the transaction modifies data
on a foreign server that doesn't support 2pc, it raises an error. At
COMMIT (or ROLLBACK) PREPARED, foreign_twophase_commit similarly needs
to be set to 'required'. It raises an error if the distributed
transaction has a foreign transaction and foreign_twophase_commit is
'disabled'.
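
The two checks described above could be sketched as follows. This is
only an illustration under stated assumptions, not code from the patch:
all names are hypothetical, a transaction without any foreign server is
assumed to be unrestricted, and the second check models only the
"error if 'disabled'" case mentioned in the last sentence.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the foreign_twophase_commit settings. */
typedef enum
{
    DEMO_FTC_DISABLED,
    DEMO_FTC_PREFER,
    DEMO_FTC_REQUIRED
} DemoFtcMode;

/*
 * PREPARE step: every involved foreign server is prepared, modified or
 * not, so each one must support 2pc and the setting must not be
 * 'disabled'.  supports_2pc[i] tells whether server i can prepare.
 * Assumption: with no foreign servers involved, nothing is restricted.
 */
bool
demo_prepare_allowed(DemoFtcMode mode, const bool *supports_2pc,
                     size_t nservers)
{
    size_t i;

    if (nservers == 0)
        return true;
    if (mode == DEMO_FTC_DISABLED)
        return false;
    for (i = 0; i < nservers; i++)
    {
        if (!supports_2pc[i])
            return false;
    }
    return true;
}

/*
 * COMMIT PREPARED / ROLLBACK PREPARED step: refused when the prepared
 * transaction has a foreign transaction and the setting is 'disabled'.
 */
bool
demo_finish_prepared_allowed(DemoFtcMode mode, bool has_foreign_xact)
{
    return !(has_foreign_xact && mode == DEMO_FTC_DISABLED);
}
```

A false return value stands for the error that would be raised in each
case.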

I was concerned that many FDW
plugins don't implement FDW transaction APIs yet when users start
using this feature. But it seems to be a good idea to move 'prefer'
mode to a separate patch while leaving 'required'. I'll do that in the
next version patch.

Okay, thanks. Please, see if you can separate out the documentation
for that as well.

Few other comments on v21-0003-Documentation-update:
----------------------------------------------------
1.
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with that this foreign transaction
+       associates
+      </entry>

/with that this/with which this

2.
+      <entry>
+       The OID of the foreign server on that the foreign transaction is prepared
+      </entry>

/on that the/on which the

3.
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>initial</literal> : Initial status.
+         </para>

What exactly does "Initial status" mean?

This part is out-of-date. Fixed.

4.
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal> this foreign transaction is in-doubt status and
+       needs to be resolved by calling <function>pg_resolve_fdwxact</function>
+       function.
+      </entry>

It would be better if you could add a sentence saying when and/or how
foreign transactions can reach the in-doubt state.

5.
If <literal>N</literal> local transactions each
+ across <literal>K</literal> foreign server this value need to be set

This part of the sentence can be improved by saying something like:
"If a user expects N local transactions and each of those involves K
foreign servers, this value..".

Thanks. I've incorporated all your comments.

I've attached the new version patch set. 0006 is a separate patch
which introduces 'prefer' mode to foreign_twophase_commit.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v22-0006-Add-prefer-mode-to-foreign_twophase_commit.patch (application/octet-stream)
From bdadc8426a40ab0c1abb427b27c48e1dff316d18 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 3 Jun 2020 16:38:13 +0900
Subject: [PATCH v22 6/6] Add prefer mode to foreign_twophase_commit.

---
 doc/src/sgml/config.sgml                      | 33 ++++++++--------
 doc/src/sgml/distributed-transaction.sgml     | 10 ++---
 src/backend/access/fdwxact/fdwxact.c          | 38 ++++++++++++++++---
 src/backend/utils/misc/guc.c                  |  5 ++-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/include/access/fdwxact.h                  |  1 +
 .../test_fdwxact/expected/test_fdwxact.out    | 25 ++++++++++++
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 28 ++++++++++++++
 8 files changed, 114 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 918fac967c..18f3e967fe 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9102,18 +9102,21 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
         <para>
          Specifies whether distributed transaction commits ensures that all
          involved changes on foreign servers are committed or not. Valid
-         values are <literal>required</literal> and <literal>disabled</literal>.
-         The default setting is <literal>disabled</literal>. Setting to
-         <literal>disabled</literal> don't use two-phase commit protocol to
-         commit or rollback distributed transactions. When set to
-         <literal>required</literal> distributed transactions strictly requires
-         that all written servers can use two-phase commit protocol.  That is,
-         the distributed transaction cannot commit if even one server does not
-         support the prepare callback routine
+         values are <literal>required</literal>, <literal>prefer</literal> and
+         <literal>disabled</literal>. The default setting is
+         <literal>disabled</literal>. Setting to <literal>disabled</literal>
+         don't use two-phase commit protocol to commit or rollback distributed
+         transactions. When set to <literal>required</literal> distributed
+         transactions strictly requires that all written servers can use
+         two-phase commit protocol.  That is, the distributed transaction cannot
+         commit if even one server does not support the prepare callback routine
          (described in <xref linkend="fdw-callbacks-transaction-managements"/>).
-         In <literal>required</literal> case, distributed transaction commit will
-         wait for all involving foreign transaction to be committed before the
-         command return a "success" indication to the client.
+         When set to <literal>prefer</literal> the distributed transaction uses
+         two-phase commit protocol only on servers where it is available and
+         commits in one phase on the others. In the <literal>prefer</literal> and
+         <literal>required</literal> cases, distributed transaction commit waits
+         for all involved foreign transactions to be committed before the command
+         returns a "success" indication to the client.
         </para>
 
         <para>
@@ -9123,10 +9126,10 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
 
         <note>
          <para>
-          When <literal>disabled</literal> there can be risk of database
-          consistency among all servers that involved in the distributed
-          transaction when some foreign server crashes during committing the
-          distributed transaction.
+          When <literal>disabled</literal> or <literal>prefer</literal> there
+          is a risk of database inconsistency among the servers involved in
+          the distributed transaction if some foreign server crashes while
+          the distributed transaction is being committed.
          </para>
         </note>
        </listitem>
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
index 139dd7f918..d992a807c3 100644
--- a/doc/src/sgml/distributed-transaction.sgml
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -48,11 +48,11 @@
        prepares all transaction on the foreign servers if two-phase commit is
        required. Two-phase commit is required when the transaction modifies
        data on two or more servers including the local server itself and
-       <xref linkend="guc-foreign-twophase-commit"/> is
-       <literal>required</literal>. If all preparations on foreign servers got
-       successful go to the next step. Any failure happens in this step,
-       the server changes to rollback, then rollback all transactions on both
-       local and foreign servers.
+       <xref linkend="guc-foreign-twophase-commit"/> is either
+       <literal>required</literal> or <literal>prefer</literal>.  If all
+       preparations on foreign servers got successful go to the next step.
+       Any failure happens in this step, the server changes to rollback,
+       then rollback all transactions on both local and foreign servers.
       </para>
      </listitem>
      <listitem>
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 86630972c0..b29b463344 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -441,7 +441,9 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
  * When foreign twophase commit is enabled, the behavior depends on the value
  * of foreign_twophase_commit; when 'required' we strictly require for all
  * foreign servers' FDW to support two-phase commit protocol and ask them to
- * prepare foreign transactions, and when 'disabled' we ask all foreign servers
+ * prepare foreign transactions, when 'prefer' we ask only foreign servers
+ * that are capable of two-phase commit to prepare foreign transactions and ask
+ * for other servers to commit, and when 'disabled' we ask all foreign servers
  * to commit foreign transaction in one-phase. If we failed to commit any of
  * them we change to aborting.
  *
@@ -536,8 +538,9 @@ checkForeignTwophaseCommitRequired(void)
 {
 	ListCell   *lc;
 	bool		need_twophase_commit;
-	bool		have_notwophase = false;
+	bool		have_notwophase;
 	int			nserverswritten = 0;
+	int			nserverstwophase = 0;
 
 	if (!IsForeignTwophaseCommitRequested())
 		return false;
@@ -549,16 +552,29 @@ checkForeignTwophaseCommitRequired(void)
 		if (!fdw_part->modified)
 			continue;
 
-		if (!SeverSupportTwophaseCommit(fdw_part))
-			have_notwophase = true;
+		if (SeverSupportTwophaseCommit(fdw_part))
+			nserverstwophase++;
 
 		nserverswritten++;
 	}
+	Assert(nserverswritten >= nserverstwophase);
+
+	/* check if there are any servers that don't support two-phase commit */
+	have_notwophase = (nserverswritten != nserverstwophase);
 
 	/* Did we modify the local non-temporary data? */
 	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+	{
 		nserverswritten++;
 
+		/*
+		 * We increment nserverstwophase as well for making code simple,
+		 * although we don't actually use two-phase commit for the local
+		 * transaction.
+		 */
+		nserverstwophase++;
+	}
+
 	if (nserverswritten <= 1)
 		return false;
 
@@ -570,6 +586,17 @@ checkForeignTwophaseCommitRequired(void)
 		 */
 		need_twophase_commit = (nserverswritten >= 2);
 	}
+	else
+	{
+		Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER);
+
+		/*
+		 * In 'prefer' case, we use two-phase commit when this transaction
+		 * modified two or more servers that support two-phase commit,
+		 * counting the local server as one of them.
+		 */
+		need_twophase_commit = (nserverstwophase >= 2);
+	}
 
 	/*
 	 * If foreign two phase commit is required then all foreign serves must be
@@ -590,7 +617,8 @@ checkForeignTwophaseCommitRequired(void)
 					 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
 					 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
 
-		if (have_notwophase)
+		if (have_notwophase &&
+			foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 313bf33324..1e46aff23f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -428,11 +428,12 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 };
 
 /*
- * Although only "required" and "disabled" are documented, we accept all
- * the likely variants of "on" and "off".
+ * Although only "required", "prefer", and "disabled" are documented,
+ *  we accept all the likely variants of "on" and "off".
  */
 static const struct config_enum_entry foreign_twophase_commit_options[] = {
 	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
 	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
 	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
 	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 03de9fbc81..47bb402680 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -358,7 +358,7 @@
 							# foreign transactions
 							# after a failed attempt
 #foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
-					# disabled or required
+					# disabled, prefer or required
 
 #------------------------------------------------------------------------------
 # QUERY TUNING
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 5df32b2703..32e041e246 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -37,6 +37,7 @@
 typedef enum
 {
 	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_PREFER, /* use twophase commit where available */
 	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
 										 * twophase commit */
 }			ForeignTwophaseCommitLevel;
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
index c6a91ac9f1..ce8465b52c 100644
--- a/src/test/modules/test_fdwxact/expected/test_fdwxact.out
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -221,3 +221,28 @@ BEGIN;
 INSERT INTO ft_1 VALUES (1);
 PREPARE TRANSACTION 'global_x1';
 ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
+-- Test 'prefer' mode.
+-- The cases where failed in 'required' mode should pass in 'prefer' mode.
+-- We simply commit/rollback a transaction in one-phase on a server
+-- that doesn't support two-phase commit, instead of error.
+SET foreign_twophase_commit TO 'prefer';
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
index 8cf860e295..72a9ee6be4 100644
--- a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -191,3 +191,31 @@ PREPARE TRANSACTION 'global_x1';
 BEGIN;
 INSERT INTO ft_1 VALUES (1);
 PREPARE TRANSACTION 'global_x1';
+
+
+-- Test 'prefer' mode.
+-- The cases where failed in 'required' mode should pass in 'prefer' mode.
+-- We simply commit/rollback a transaction in one-phase on a server
+-- that doesn't support two-phase commit, instead of error.
+SET foreign_twophase_commit TO 'prefer';
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
-- 
2.23.0

v22-0005-Add-regression-tests-for-foreign-twophase-commit.patch (application/octet-stream)
From 6c2e3227c020de105dfa071f478e9534f6cb24eb Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v22 5/6] Add regression tests for foreign twophase commit.

---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 ++
 .../test_fdwxact/expected/test_fdwxact.out    | 223 +++++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 193 +++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 137 +++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 471 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 +++++++
 src/test/regress/pg_regress.c                 |  13 +-
 13 files changed, 1297 insertions(+), 5 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 29de73c060..8a48e6ba19 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -13,6 +13,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS =1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..c6a91ac9f1
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,223 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup two servers that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_2 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_2 (i int) SERVER srv_2;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_1 and ft_2 don't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     0
+(1 row)
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     0
+(1 row)
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
+-- Error. We cannot PREPARE a distributed transaction when
+-- foreign_twophase_commit is disabled.
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..8cf860e295
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,193 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup two servers that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_2 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_2 (i int) SERVER srv_2;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_1 and ft_2 don't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+COMMIT PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ROLLBACK PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+
+-- Error. We cannot PREPARE a distributed transaction when
+-- foreign_twophase_commit is disabled.
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..8d48a74e86
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,137 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the case where preparing the local transaction fails after preparing the
+# foreign transactions. The first attempt should succeed, but the second attempt
+# fails after preparing the foreign transaction (the identifier 'tx1' is already
+# in use) and should roll back the prepared foreign transaction.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'");
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into prepare phase on srv_2pc_1. The transaction fails during
+# preparing the foreign transaction on srv_2pc_1. Then, we try to both 'rollback' and
+# 'rollback prepared' the foreign transaction, and rollback another foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
+
+# Inject a panic into the prepare phase on srv_2pc_2. The server crashes after preparing
+# both foreign transactions. After the restart, those transactions are recovered as in-doubt
+# transactions. We check that the resolver process rolls back those transactions after recovery.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('panic', 'prepare', 'srv_2pc_2');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+$node->restart();
+$node->poll_query_until('postgres',
+						"SELECT count(*) = 0 FROM pg_foreign_xacts")
+  or die "Timeout while waiting for resolver process to resolve in-doubt transactions";
+$log = TestLib::slurp_file($node->logfile);
+like($log, qr/rollback prepared tx_[0-9]+ on srv_2pc_1/, "resolver rolled back in-doubt transaction");
+like($log, qr/rollback prepared tx_[0-9]+ on srv_2pc_2/, "resolver rolled back in-doubt transaction");
+truncate $node->logfile, 0;
+
+# Inject a panic into the commit phase on srv_2pc_1. The server crashes due to the panic
+# raised by the resolver process while committing the prepared foreign transaction on srv_2pc_1.
+# After the restart, those transactions are recovered as in-doubt transactions. We check that
+# the resolver process commits those transactions after recovery.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('panic', 'commit', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+$node->restart();
+$node->poll_query_until('postgres',
+						"SELECT count(*) = 0 FROM pg_foreign_xacts")
+  or die "Timeout while waiting for resolver process to resolve in-doubt transactions";
+$log = TestLib::slurp_file($node->logfile);
+like($log, qr/commit prepared tx_[0-9]+ on srv_2pc_1/, "resolver committed in-doubt transaction");
+like($log, qr/commit prepared tx_[0-9]+ on srv_2pc_2/, "resolver committed in-doubt transaction");
+truncate $node->logfile, 0;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..738690c978
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,471 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only
+ * commit and rollback API and the third supports all transaction API including
+ * prepare.
+ *
+ * Also, this module has an ability to inject an error at prepare callback or
+ * commit callback using test_inject_error() SQL function. The information of
+ * injected error is stored in the shared memory so that backend processes and
+ * resolver processes can see it.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "commands/defrem.h"
+#include "access/reloptions.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactRslvState *state);
+static void testCommitForeignTransaction(FdwXactRslvState *state);
+static void testRollbackForeignTransaction(FdwXactRslvState *state);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 state->fdwxact_id,
+							 state->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+/*
+ * Check if an event is set at the phase on the server. If there is, set
+ * elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Currently support only error and panic */
+	if (pg_strcasecmp(fxss->elevel, "error") == 0)
+		*elevel = ERROR;
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up its shared
+# memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 38b2b1e8e1..f30fe6b492 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2335,9 +2335,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2352,7 +2355,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.23.0
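
The test module's commit and rollback callbacks log either a one-phase message ("commit <xid> on <server>") or a prepared-transaction message ("commit prepared tx_<xid> on <server>"), and the TAP tests match on that text. Below is a minimal standalone sketch of how such a message could be selected. It is not the module's code: demo_prepare_id(), demo_commit_message() and DEMO_FLAG_ONEPHASE (including its value) are hypothetical names made up for illustration, and the real flag is defined in access/fdwxact.h. The sketch assumes the flag is a bit mask, so it is tested with bitwise AND.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag value, for illustration only. */
#define DEMO_FLAG_ONEPHASE 0x01

/* Build a prepared-transaction identifier of the form "tx_<xid>". */
static int
demo_prepare_id(unsigned int xid, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "tx_%u", xid);
}

/*
 * Format the message a commit callback would log.  A set one-phase bit
 * selects the plain commit message; otherwise the prepared-transaction
 * identifier is used.
 */
static void
demo_commit_message(int flags, unsigned int xid, const char *server,
					char *out, size_t outlen)
{
	char		id[32];

	if (flags & DEMO_FLAG_ONEPHASE)
		snprintf(out, outlen, "commit %u on %s", xid, server);
	else
	{
		demo_prepare_id(xid, id, sizeof(id));
		snprintf(out, outlen, "commit prepared %s on %s", id, server);
	}
}
```

Calling demo_commit_message(0, 1234, "srv_2pc_1", ...) yields a string that a pattern such as qr/commit prepared tx_1234 on srv_2pc_1/ would match, while passing DEMO_FLAG_ONEPHASE yields the one-phase form.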

Attachment: v22-0004-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From b4d8e51c4731d388bdb63ebb3efbc6c275c696ca Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:28:58 +0500
Subject: [PATCH v22 4/6] postgres_fdw supports atomic commit APIs.

---
 contrib/postgres_fdw/Makefile                 |   7 +-
 contrib/postgres_fdw/connection.c             | 603 +++++++++++-------
 .../postgres_fdw/expected/postgres_fdw.out    | 280 +++++++-
 contrib/postgres_fdw/fdwxact.conf             |   3 +
 contrib/postgres_fdw/postgres_fdw.c           |  21 +-
 contrib/postgres_fdw/postgres_fdw.h           |   7 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 124 +++-
 doc/src/sgml/postgres-fdw.sgml                |  45 ++
 8 files changed, 831 insertions(+), 259 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..91fa6e39fc 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -29,3 +29,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 52d1fe3563..d55884b49b 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2020, PostgreSQL Global Development Group
  *
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
@@ -56,6 +57,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -69,17 +71,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -92,23 +90,26 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt,
+										  bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
- * Get a PGconn which can be used to execute queries on the remote PostgreSQL
- * server with the user's authorization.  A new connection is established
- * if we don't already have a suitable one, and a transaction is opened at
- * the right subtransaction nesting depth if we didn't do that already.
- *
- * will_prep_stmt must be true if caller intends to create any prepared
- * statements.  Since those don't go away automatically at transaction end
- * (not even on error), we need this flag to cue manual cleanup.
+ * Get connection cache entry. Unlike the GetConnectionState function, this
+ * function doesn't establish a new connection if one doesn't exist yet.
  */
-PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -128,7 +129,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -136,12 +136,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -155,6 +149,21 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * This function gets the connection cache entry and establishes connection
+ * to the foreign server if there is no connection and starts a new transaction
+ * if 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -182,6 +191,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
+		UserMapping	*user = GetUserMappingByOid(umid);
 		ForeignServer *server = GetForeignServer(user->serverid);
 
 		/* Reset all transient state fields, to be sure all are clean */
@@ -190,6 +200,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +211,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,11 +227,39 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		UserMapping		*user = GetUserMappingByOid(umid);
+
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
+	return entry;
+}
+
+/*
+ * Get a PGconn which can be used to execute queries on the remote PostgreSQL
+ * server with the user's authorization.  A new connection is established
+ * if we don't already have a suitable one, and a transaction is opened at
+ * the right subtransaction nesting depth if we didn't do that already.
+ *
+ * will_prep_stmt must be true if caller intends to create any prepared
+ * statements.  Since those don't go away automatically at transaction end
+ * (not even on error), we need this flag to cue manual cleanup.
+ */
+PGconn *
+GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(umid, will_prep_stmt, start_transaction);
+
 	return entry->conn;
 }
 
@@ -473,7 +521,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -700,193 +748,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -903,10 +764,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -917,6 +774,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Quick exit if no connections were touched in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1251,3 +1112,309 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   state->server->servername, state->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 state->server->servername, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on foreign server. If
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can commit the
+ * foreign transaction without preparation, otherwise commit the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In simple commit case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   state->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Rollback a transaction on foreign server. As with the commit case, if
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can rollback the
+ * foreign transaction without preparation, otherwise rollback the prepared
+ * transaction. This function must tolerate being called recursively, as an
+ * error can happen during abort.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	bool			is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before
+	 * establishing a connection or starting a remote transaction.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction or is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 82fc1290ef..dbdd4cc32c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -191,15 +210,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8923,10 +8944,10 @@ RESET ROLE;
 ALTER USER MAPPING FOR regress_nosuper SERVER loopback_nopw OPTIONS (ADD password_required 'false');
 SET ROLE regress_nosuper;
 -- Should finally work now
-SELECT * FROM ft1_nopw LIMIT 1;
-  c1  | c2 | c3 | c4 | c5 | c6 |     c7     | c8 
-------+----+----+----+----+----+------------+----
- 1111 |  2 |    |    |    |    | ft1        | 
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
+ c1 | c2 |        c3         |              c4              |            c5            | c6 |     c7     | c8  
+----+----+-------------------+------------------------------+--------------------------+----+------------+-----
+  1 |  2 | 00001_trig_update | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1  | 1          | foo
 (1 row)
 
 -- unpriv user also cannot set sslcert / sslkey on the user mapping
@@ -8943,16 +8964,16 @@ HINT:  User mappings with the sslcert or sslkey options set may only be created
 DROP USER MAPPING FOR CURRENT_USER SERVER loopback_nopw;
 -- This will fail again as it'll resolve the user mapping for public, which
 -- lacks password_required=false
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 ERROR:  password is required
 DETAIL:  Non-superusers must provide a password in the user mapping.
 RESET ROLE;
 -- The user mapping for public is passwordless and lacks the password_required=false
 -- mapping option, but will work because the current user is a superuser.
 SELECT * FROM ft1_nopw LIMIT 1;
-  c1  | c2 | c3 | c4 | c5 | c6 |     c7     | c8 
-------+----+----+----+----+----+------------+----
- 1111 |  2 |    |    |    |    | ft1        | 
+ c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+  6 |  6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6  | 6          | foo
 (1 row)
 
 -- cleanup
@@ -8961,16 +8982,225 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
 BEGIN;
-SELECT count(*) FROM ft1;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" of relation "T 5" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
  count 
 -------
-   822
+     0
 (1 row)
 
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
-ROLLBACK;
-WARNING:  there is no transaction in progress
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000000..3fdbf93cdb
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..105451d199 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include <limits.h>
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -504,7 +505,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -558,6 +558,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1434,7 +1439,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user->umid, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2372,7 +2377,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user->umid, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2746,7 +2751,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user->umid, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3566,7 +3571,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user->umid, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4441,7 +4446,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4527,7 +4532,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user->umid, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4755,7 +4760,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping->umid, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..43ffd4f73f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -129,7 +130,7 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -137,6 +138,9 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *state);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *state);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *state);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
@@ -203,6 +207,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 									bool is_subquery,
 									List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..1ef66123df 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2598,7 +2621,7 @@ ALTER USER MAPPING FOR regress_nosuper SERVER loopback_nopw OPTIONS (ADD passwor
 SET ROLE regress_nosuper;
 
 -- Should finally work now
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 
 -- unpriv user also cannot set sslcert / sslkey on the user mapping
 -- first set password_required so we see the right error messages
@@ -2612,7 +2635,7 @@ DROP USER MAPPING FOR CURRENT_USER SERVER loopback_nopw;
 
 -- This will fail again as it'll resolve the user mapping for public, which
 -- lacks password_required=false
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 
 RESET ROLE;
 
@@ -2628,9 +2651,98 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
 BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
 ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index eab2cc9378..3ea3ce9335 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -477,6 +477,43 @@ OPTIONS (ADD password_required 'false');
    </para>
 
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers, the
+    transaction on each remote server is committed or aborted independently.
+    Some of these transactions may fail to commit on their remote server while
+    others commit successfully. This behavior may be overridden using the
+    following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> is allowed
+       to use two-phase commit when a transaction commits. This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to more
+       than 1 on the local server and <xref linkend="guc-max-prepared-transactions"/>
+       must be set to more than 1 on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
  </sect2>
 
  <sect2>
@@ -504,6 +541,14 @@ OPTIONS (ADD password_required 'false');
    managed by creating corresponding remote savepoints.
   </para>
 
+  <para>
+   <filename>postgres_fdw</filename> uses the two-phase commit protocol when a
+   transaction commits or aborts if atomic commit of the distributed
+   transaction (see <xref linkend="atomic-commit"/>) is required. The remote
+   server should therefore set <xref linkend="guc-max-prepared-transactions"/> to
+   more than one so that it can prepare the remote transaction.
+  </para>
+
   <para>
    The remote transaction uses <literal>SERIALIZABLE</literal>
    isolation level when the local transaction has <literal>SERIALIZABLE</literal>
-- 
2.23.0
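
For readers following the postgres_fdw part of the patch above: the three
callbacks registered in postgres_fdw_handler() eventually have to send ordinary
transaction-control commands to the remote server. The block below is only a
minimal, self-contained sketch of how such a command string could be chosen. It
is not taken from the patch. The names XactAction and build_xact_command() and
the onephase parameter are hypothetical, and the sketch assumes the transaction
identifier contains no single quotes.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical action selector; not a type from the patch. */
typedef enum
{
	XACT_PREPARE,
	XACT_COMMIT,
	XACT_ROLLBACK
} XactAction;

/*
 * Write the remote SQL command for the given action into buf.
 * When onephase is true, COMMIT/ROLLBACK end the remote transaction
 * directly; otherwise they resolve a previously prepared transaction.
 * Returns 0 on success, -1 on a bad identifier or a too-small buffer.
 */
static int
build_xact_command(char *buf, size_t buflen, XactAction action,
				   bool onephase, const char *gid)
{
	int			n;

	/* An identifier is needed for PREPARE and for the prepared variants. */
	if (action == XACT_PREPARE || !onephase)
	{
		if (gid == NULL || gid[0] == '\0' || strchr(gid, '\'') != NULL)
			return -1;
	}

	switch (action)
	{
		case XACT_PREPARE:
			n = snprintf(buf, buflen, "PREPARE TRANSACTION '%s'", gid);
			break;
		case XACT_COMMIT:
			n = onephase
				? snprintf(buf, buflen, "COMMIT TRANSACTION")
				: snprintf(buf, buflen, "COMMIT PREPARED '%s'", gid);
			break;
		case XACT_ROLLBACK:
			n = onephase
				? snprintf(buf, buflen, "ABORT TRANSACTION")
				: snprintf(buf, buflen, "ROLLBACK PREPARED '%s'", gid);
			break;
		default:
			return -1;
	}

	/* Treat truncation as an error rather than sending a partial command. */
	return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

In a real FDW the resulting string would be sent over the existing connection,
and quoting of the identifier would need to be handled properly rather than
rejected as it is here.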

v22-0002-Support-atomic-commit-among-multiple-foreign-ser.patchapplication/octet-stream; name=v22-0002-Support-atomic-commit-among-multiple-foreign-ser.patchDownload
From 2dca99e2d5fffb629b8b42ffbd4f31c2f63a43da Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:16:02 +0900
Subject: [PATCH v22 2/6] Support atomic commit among multiple foreign servers.

---
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  110 +
 src/backend/access/fdwxact/fdwxact.c          | 2767 +++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  560 ++++
 src/backend/access/fdwxact/resolver.c         |  436 +++
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   56 +
 src/backend/access/transam/xact.c             |   29 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/copy.c                   |    6 +
 src/backend/commands/foreigncmds.c            |   30 +
 src/backend/executor/execPartition.c          |    8 +
 src/backend/executor/nodeForeignscan.c        |   24 +
 src/backend/executor/nodeModifyTable.c        |    6 +
 src/backend/foreign/foreign.c                 |   55 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   18 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   79 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/fdwxactdesc.c              |    1 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  162 +
 src/include/access/fdwxact_launcher.h         |   28 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/resolver_internal.h        |   63 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   22 +
 src/include/foreign/fdwapi.h                  |   12 +
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    6 +
 src/include/storage/proc.h                    |   11 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    2 +
 src/test/regress/expected/rules.out           |    7 +
 55 files changed, 4822 insertions(+), 17 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 120000 src/bin/pg_waldump/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..49480dd039 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..0207a66fb4
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000000..d9f08f4cfa
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,110 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature lets us commit or roll back on either all of the
+foreign servers or none of them. This ensures that the data is always left
+in a consistent state across the federated database.
+
+
+Commit Sequence of Global Transactions
+---------------------------------------
+
+We employ the two-phase commit protocol to commit atomically across all
+foreign servers. The commit sequence of a distributed transaction consists
+of the following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+in the list FdwXactAtomicCommitParticipants, which is maintained by
+PostgreSQL's global transaction manager (GTM), as distributed transaction
+participants. The registered foreign transactions are tracked until the end of
+the transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+Before doing PREPARE on each foreign transaction, we write a WAL record
+indicating that the foreign server is involved with the current transaction.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers including the local node. Otherwise, we can commit them at this
+step by calling the CommitForeignTransaction() API; nothing further is needed.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If any of them fails we switch to
+rollback; at that point some participants might be prepared whereas
+others are not. The former need to be resolved manually using
+pg_resolve_foreign_xact(), and the latter are ended
+in one phase by calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction, but
+this resolution step (commit or rollback) is done by the foreign transaction
+resolver process. The backend process inserts itself into the wait queue and
+then wakes up the resolver process (or requests the launch of a new one if
+necessary). The resolver process dequeues the waiter and fetches the distributed
+transaction information that the backend is waiting for. Once all foreign
+transactions are committed or rolled back, the resolver process wakes up the waiter.
+
+
+Foreign Data Wrapper Callbacks for Transaction Management
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+the transaction management callback functions according to that status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for doing PREPARE, COMMIT and
+ROLLBACK of the transaction on the foreign server, respectively.
+FdwXactRslvState->flags could contain FDWXACT_FLAG_ONEPHASE, meaning the FDW can
+commit or roll back the foreign transaction in one phase. On failure while
+processing a foreign transaction, the FDW needs to raise an error. However, the
+FDW must tolerate an ERRCODE_UNDEFINED_OBJECT error while committing or rolling
+back a foreign transaction, because there is a race condition: the coordinator
+could crash between completing the resolution and writing the WAL record
+that removes the FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts as FDWXACT_STATUS_PREPARING
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and it changes to
+FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING before the foreign
+transaction is committed or aborted, respectively, by the FDW callback functions.
+The FdwXact entry is removed, with WAL logging, once the foreign transaction is
+resolved.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. Their initial
+status is FDWXACT_STATUS_PREPARED(*1). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                     PREPARING                      |----+
+ +----------------------------------------------------+    |
+                          |                                |
+                          v                                |
+ +----------------------------------------------------+    |
+ |                    PREPARED(*1)                    |    | (*2)
+ +----------------------------------------------------+    |
+           |                               |               |
+           v                               v               |
+ +--------------------+          +--------------------+    |
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |<---+
+ +--------------------+          +--------------------+
+
+(*1) Recovered FdwXact entries start with PREPARED
+(*2) Path taken when an error occurs during preparing
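
The status graph in the README above can also be read as a small transition
table. The following is a minimal sketch, not code from the patch: the enum
SketchStatus and the function sketch_transition_allowed() are hypothetical
names, and the allowed edges are taken only from the diagram (PREPARING to
PREPARED, PREPARING to ABORTING via path (*2), PREPARED to COMMITTING or
ABORTING).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the FDWXACT_STATUS_* values in the README. */
typedef enum
{
	SK_PREPARING,
	SK_PREPARED,
	SK_COMMITTING,
	SK_ABORTING
} SketchStatus;

/*
 * Return true if the README's diagram has an edge from 'from' to 'to'.
 * COMMITTING and ABORTING are terminal here: the entry is removed once
 * the foreign transaction is resolved, so no further status follows.
 */
static bool
sketch_transition_allowed(SketchStatus from, SketchStatus to)
{
	switch (from)
	{
		case SK_PREPARING:
			/* normal path, or path (*2) when preparing fails */
			return to == SK_PREPARED || to == SK_ABORTING;
		case SK_PREPARED:
			return to == SK_COMMITTING || to == SK_ABORTING;
		case SK_COMMITTING:
		case SK_ABORTING:
			return false;
	}
	return false;
}
```

A check like this could serve as an assertion wherever a status is updated,
though whether the patch does anything of the kind is not shown in this part
of the diff.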
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..86630972c0
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2767 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * Two-phase commit protocol is used when the transaction modified two or
+ * more servers including the local node.  If two-phase commit protocol
+ * is not required all foreign transactions are committed at pre-commit
+ * phase.
+ *
+ * During executor node initialization, a foreign server can be registered
+ * by calling either RegisterFdwXactByRelId() or RegisterFdwXactByServerId()
+ * so that it participates in a group for global commit.  The foreign servers are
+ * registered if the FDW has both the CommitForeignTransaction API and the
+ * RollbackForeignTransaction API.  Registered participant servers are
+ * identified by the OIDs of the foreign server and user.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * all foreign servers.  And after committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also ask it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so that
+ * we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request.  We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed.  Before waiting for foreign transactions to be resolved, the
+ * backend enqueues itself with the timestamp at which it expects to be processed.
+ * On failure, it enqueues itself again with a new timestamp (last timestamp +
+ * foreign_xact_resolution_interval).
+ *
+ * If a network failure or server crash occurs, or the user stops waiting,
+ * prepared foreign transactions are left in in-doubt state (aka. in-doubt
+ * transactions).  Foreign transactions in in-doubt state are not resolved
+ * automatically, so they must be processed manually using the
+ * pg_resolve_foreign_xact() function.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of a foreign
+ * transaction follows a locking model based on four linked concepts:
+ *
+ * * All FdwXact fields except for indoubt, inprocessing and status are protected
+ *   by FdwXactLock.  These three fields are protected by the entry's mutex.
+ * * Setting held_by of an FdwXact entry means owning the FdwXact entry, which
+ *   prevents it from being updated or removed by concurrent processes.
+ * * An FdwXact whose inprocessing is true is likewise not processed or removed
+ *   by concurrent processes.
+ * * A process that is going to process a foreign transaction needs to hold its
+ *   FdwXact entry in advance.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk.  FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon.  We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
+/* Check the FdwXactParticipant is capable of two-phase commit  */
+#define ServerSupportTransactionCallack(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define SeverSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.  This struct
+ * is created at the beginning of execution for each foreign server and
+ * is used until the end of the transaction, where we cannot look at syscaches.
+ * Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
+	/* true if modified the data on the server */
+	bool		modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A member of
+ * participants may not support transaction callbacks: commit, rollback and
+ * prepare.  If a member of participants doesn't support any transaction
+ * callbacks, i.e. ServerSupportTransactionCallack() returns false,
+ * we don't end its transaction.
+ *
+ * FdwXactParticipants_tmp is used to update FdwXactParticipants atomically
+ * when executing a COMMIT/ROLLBACK PREPARED command.  In the COMMIT PREPARED case,
+ * we don't want to roll back foreign transactions even if an error occurs,
+ * because the local prepared transaction never turns to rollback in that
+ * case.  However, preparing FdwXactParticipants might lead to an error
+ * because of calling palloc() inside.  So we prepare FdwXactParticipants in
+ * two phases.  In the first phase, PrepareFdwXactParticipants(), we collect
+ * all foreign transactions associated with the local prepared transaction
+ * and keep them in FdwXactParticipants_tmp.  Even if an error occurs during
+ * that, we don't roll them back.  In the second phase, SetFdwXactParticipants(),
+ * we replace FdwXactParticipants with FdwXactParticipants_tmp and hold them.
+ *
+ * FdwXactLocalXid is the local transaction id associated with FdwXactParticipants.
+ */
+static List *FdwXactParticipants = NIL;
+static List *FdwXactParticipants_tmp = NIL;
+static TransactionId FdwXactLocalXid = InvalidTransactionId;
+
+/*
+ * True if the current transaction needs to be committed together with
+ * foreign servers.
+ */
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * OID, xid, foreign server OID and user OID, each printed as 8 hex digits
+ * and separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* Guc parameters */
+int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Keep track of whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+static void register_fdwxact(Oid serverid, Oid userid, bool modified);
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool onephase,
+											 bool for_commit);
+static bool checkForeignTwophaseCommitRequired(void);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static void FdwXactPrepareForeignTransactions(bool prepare_all);
+static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(bool mark_indoubt);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static TransactionId FdwXactDetermineTransactionFate(TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+						bool hold);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Remember accessed foreign transaction. Both RegisterFdwXactByRelId and
+ * RegisterFdwXactByServerId are called by executor during initialization.
+ */
+void
+RegisterFdwXactByRelId(Oid relid, bool modified)
+{
+	Relation	rel;
+	Oid			serverid;
+	Oid			userid;
+
+	rel = relation_open(relid, NoLock);
+	serverid = GetForeignServerIdByRelId(relid);
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	relation_close(rel, NoLock);
+
+	register_fdwxact(serverid, userid, modified);
+}
+
+void
+RegisterFdwXactByServerId(Oid serverid, bool modified)
+{
+	register_fdwxact(serverid, GetUserId(), modified);
+}
+
+/*
+ * Register the foreign transaction identified by the given server id and
+ * user id as a participant of the current transaction.
+ */
+static void
+register_fdwxact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Participant's information is also needed at the end of a transaction,
+	 * where the system caches are not available. Save it in
+	 * TopTransactionContext so that it lives until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	pfree(routine);
+
+	/* Restore the previous memory context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	return fdw_part;
+}
+
+/*
+ * Prepare all foreign transactions if foreign twophase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit: when it is
+ * 'required', we strictly require the FDWs of all modified foreign servers to
+ * support the two-phase commit protocol and ask them to prepare their foreign
+ * transactions; when it is 'disabled', we ask all foreign servers to commit
+ * their foreign transactions in one phase. If we fail to commit any of them,
+ * we change to aborting.
+ *
+ * Note that non-modified foreign servers always can be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXact(void)
+{
+	ListCell   *lc;
+	bool		need_twophase_commit;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Set the local transaction id */
+	FdwXactLocalXid = GetTopTransactionId();
+
+	/*
+	 * We don't support foreign twophase commit in single user mode. Commit
+	 * each transaction in one-phase and reset the participant list.
+	 */
+	if (!IsUnderPostmaster)
+	{
+		foreach(lc, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+			/* Commit the foreign transaction in one-phase */
+			if (ServerSupportTransactionCallack(fdw_part))
+				FdwXactParticipantEndTransaction(fdw_part, true, true);
+		}
+
+		list_free(FdwXactParticipants);
+		FdwXactParticipants = NIL;
+		return;
+	}
+
+	/*
+	 * Check if we need to use foreign twophase commit. It's always false if
+	 * foreign twophase commit is disabled.
+	 */
+	need_twophase_commit = checkForeignTwophaseCommitRequired();
+
+	/*
+	 * Prepare foreign transactions on foreign servers that support two-phase
+	 * commit.
+	 */
+	if (need_twophase_commit)
+	{
+		FdwXactPrepareForeignTransactions(false);
+		ForeignTwophaseCommitIsRequired = true;
+	}
+
+	/*
+	 * Commit other foreign transactions and delete the participant entry from
+	 * the list.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		/*
+		 * Skip already prepared foreign transactions. Note that we keep those
+		 * FdwXactParticipants until the end of the transaction.
+		 */
+		if (fdw_part->fdwxact)
+			continue;
+
+		/* Delete non-transaction-support participants */
+		if (!ServerSupportTransactionCallack(fdw_part))
+		{
+			FdwXactParticipants = foreach_delete_current(FdwXactParticipants, lc);
+			continue;
+		}
+
+		/* Commit the foreign transaction in one-phase */
+		FdwXactParticipantEndTransaction(fdw_part, true, true);
+
+		/* Transaction successfully committed; delete it from the participant list */
+		FdwXactParticipants = foreach_delete_current(FdwXactParticipants, lc);
+	}
+}
+
+/*
+ * Return true if the current transaction modifies data on two or more
+ * servers, counting both the servers in FdwXactParticipants and the local
+ * server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(void)
+{
+	ListCell   *lc;
+	bool		need_twophase_commit = false;
+	bool		have_notwophase = false;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			have_notwophase = true;
+
+		nserverswritten++;
+	}
+
+	/* Did we modify the local non-temporary data? */
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		nserverswritten++;
+
+	if (nserverswritten <= 1)
+		return false;
+
+	if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED)
+	{
+		/*
+		 * In the 'required' case, we require all modified servers to support
+		 * two-phase commit.
+		 */
+		need_twophase_commit = (nserverswritten >= 2);
+	}
+
+	/*
+	 * If foreign two-phase commit is required then all foreign servers must
+	 * be capable of doing two-phase commit.
+	 */
+	if (need_twophase_commit)
+	{
+		/* Parameter check */
+		if (max_prepared_foreign_xacts == 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+					 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+		if (max_foreign_xact_resolvers == 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+					 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+		if (have_notwophase)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+					 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+	}
+
+	return need_twophase_commit;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool onephase,
+								 bool for_commit)
+{
+	FdwXactRslvState state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state.xid = FdwXactLocalXid;
+	state.server = fdw_part->server;
+	state.usermapping = fdw_part->usermapping;
+	state.fdwxact_id = onephase ? NULL : fdw_part->fdwxact_id;
+	state.flags = onephase ? FDWXACT_FLAG_ONEPHASE : 0;
+
+	if (for_commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * an FdwXact entry we call the GetPrepareId callback, if the FDW provides
+ * one, to get a transaction identifier from the FDW.
+ *
+ * We can still change to rollback here on failure. If any error occurs, we
+ * roll back the non-prepared foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(bool prepare_all)
+{
+	ListCell   *lc;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState state;
+		FdwXact		fdwxact;
+
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			continue;
+
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, FdwXactLocalXid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to disk and to WAL (thereby relaying it to standbys).
+		 * Thus, in case we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed in between these
+		 * two steps, we would lose track of the prepared transaction on the
+		 * foreign server and would not be able to resolve it after crash
+		 * recovery. Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(FdwXactLocalXid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the moment this
+		 * backend receives the acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 */
+		state.xid = FdwXactLocalXid;
+		state.server = fdw_part->server;
+		state.usermapping = fdw_part->usermapping;
+		state.fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+		state.flags = 0;
+		fdw_part->prepare_foreign_xact_fn(&state);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * This function is used to create a new foreign transaction entry before an
+ * FDW prepares and commits or rolls back the foreign transaction. The
+ * function writes the entry to WAL, and the entry is persisted to disk under
+ * the pg_fdwxact directory at the next checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->held_by = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions (currently %d).",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->inprocessing = false;
+	fdwxact->indoubt = false;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->indoubt = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides the GetPrepareId callback, we return the
+ * identifier returned from it. Otherwise we generate a unique identifier in
+ * the form of "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * The returned string is used to identify the foreign transaction. The
+ * identifier must not be the same as any other concurrent prepared
+ * transaction identifier.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like a UUID, which gives unique ids with high probability, but that may be
+ * expensive here and the UUID extension which provides the function to
+ * generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	   *id;
+	int			id_len = 0;
+
+	/*
+	 * If the FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * Note that it's possible that the transaction aborts after we have prepared
+ * some of the participants. In that case we change to rollback and roll back
+ * all foreign transactions.
+ */
+void
+AtPrepare_FdwXact(void)
+{
+	ListCell   *lc;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Set the local transaction id */
+	FdwXactLocalXid = GetTopTransactionId();
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit, as we prepare on
+	 * them regardless of whether they were modified.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
+	}
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions(true);
+
+	/*
+	 * We keep the prepared foreign transaction participants so that we can
+	 * roll them back in case of failure.
+	 */
+}
+
+void
+PostPrepare_FdwXact(void)
+{
+	/* After preparing the local transaction, we can forget all participants */
+	ForgetAllFdwXactParticipants(false);
+}
+
+/*
+ * Collect all foreign transactions associated with the given xid if it's a prepared
+ * transaction.  Return true if COMMIT PREPARED or ROLLBACK PREPARED needs to wait for
+ * all foreign transactions to be resolved.  The collected foreign transactions are kept
+ * in FdwXactParticipants_tmp. The caller must call SetFdwXactParticipants() later
+ * if this function returns true.
+ */
+bool
+PrepareFdwXactParticipants(TransactionId xid)
+{
+	MemoryContext old_ctx;
+
+	Assert(FdwXactParticipants_tmp == NIL);
+
+	if (!TwoPhaseExists(xid))
+		return false;
+
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXactParticipant *fdw_part;
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwRoutine *routine;
+
+		if (!fdwxact->valid || fdwxact->local_xid != xid)
+			continue;
+
+		routine = GetFdwRoutineByServerId(fdwxact->serverid);
+		fdw_part = create_fdwxact_participant(fdwxact->serverid, fdwxact->userid,
+											  routine);
+		fdw_part->modified = true;
+		fdw_part->fdwxact = fdwxact;
+
+		/* Add to the participants list */
+		FdwXactParticipants_tmp = lappend(FdwXactParticipants_tmp, fdw_part);
+	}
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_ctx);
+
+	/*
+	 * We cannot proceed to commit this prepared transaction when
+	 * foreign_twophase_commit is disabled.
+	 */
+	if (FdwXactParticipants_tmp != NIL &&
+		!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a prepared foreign transaction commit when foreign_twophase_commit is \'disabled\'")));
+
+	return (FdwXactParticipants_tmp != NIL);
+}
+
+/*
+ * Set the collected foreign transactions to the participants of this transaction,
+ * and hold them.  This function must be called after PrepareFdwXactParticipants().
+ */
+void
+SetFdwXactParticipants(TransactionId xid, bool commit)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants_tmp != NIL);
+	Assert(FdwXactParticipants == NIL);
+
+	FdwXactLocalXid = xid;
+	FdwXactParticipants = FdwXactParticipants_tmp;
+	FdwXactParticipants_tmp = NIL;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(SeverSupportTwophaseCommit(fdw_part));
+
+		/* Hold the fdwxact entry and set the status */
+		SpinLockAcquire(&fdw_part->fdwxact->mutex);
+		Assert(fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED);
+		fdw_part->fdwxact->held_by = MyBackendId;
+		fdw_part->fdwxact->status = commit
+			? FDWXACT_STATUS_COMMITTING
+			: FDWXACT_STATUS_ABORTING;
+		SpinLockRelease(&fdw_part->fdwxact->mutex);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants(true);
+}
+
+/*
+ * Wait for all foreign transactions of the given transaction to be resolved.
+ *
+ * Initially backends start in state FDWXACT_NOT_WAITING and then change
+ * that state to FDWXACT_WAITING before adding ourselves to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * This backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitForResolution(TransactionId wait_xid)
+{
+	char	   *new_status = NULL;
+	const char *old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/*
+	 * Quick exit if either atomic commit is not requested or we don't have
+	 * any participants.
+	 */
+	if (!IsForeignTwophaseCommitRequested() || FdwXactParticipants == NIL)
+		return;
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if one is not running yet, or wake it up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction resolution.
+	 */
+	if (update_process_title)
+	{
+		int			len;
+
+		old_status = get_ps_display(&len);
+		/* 24 bytes of text plus up to 10 digits for the xid */
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status);
+		new_status[len] = '\0'; /* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDWXACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The
+		 * latter would lead the client to believe that the distributed
+		 * transaction aborted, which is not true: it's already committed
+		 * locally. The former is no good either: the client has requested
+		 * committing a distributed transaction, and is entitled to assume
+		 * that an acknowledged commit is also committed on all foreign
+		 * servers, which might not be true. So in this case we issue a WARNING
+		 * (which some clients may be able to interpret) and shut off further
+		 * output. We do NOT reset ProcDiePending, so that the process will die
+		 * after the commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign transaction resolver can pick them up and try to
+		 * resolve them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status);
+		pfree(new_status);
+	}
+
+	list_free(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Return one backend that connects to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+				 TransactionId *waitXid_p)
+{
+	PGPROC	   *proc;
+	bool		found = false;
+
+	Assert(LWLockHeldByMe(FdwXactResolutionLock));
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	/* Initialize variables */
+	*nextResolutionTs_p = -1;
+	*waitXid_p = InvalidTransactionId;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+		{
+			if (proc->fdwXactNextResolutionTs <= now)
+			{
+				/* Found a waiting process */
+				found = true;
+				*waitXid_p = proc->fdwXactWaitXid;
+			}
+			else
+				/* Found a waiting process supposed to be processed later */
+				*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+
+			break;
+		}
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return found ? proc : NULL;
+}
+
+/*
+ * Return true if there is at least one backend of the given database in the
+ * wait queue. The caller must hold FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC	   *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * In the abort case, this function ends foreign transaction participants and
+ * possibly rolls back their prepared foreign transactions.
+ */
+extern void
+AtEOXact_FdwXact(bool is_commit)
+{
+	ListCell   *lc;
+
+	if (!is_commit)
+	{
+		foreach(lc, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = lfirst(lc);
+			FdwXact		fdwxact = fdw_part->fdwxact;
+			int			status;
+
+			if (!fdwxact)
+			{
+				/* Rollback foreign transaction in one-phase if supported */
+				if (ServerSupportTransactionCallack(fdw_part))
+					FdwXactParticipantEndTransaction(fdw_part, true, false);
+				continue;
+			}
+
+			/*
+			 * Abort the foreign transaction.  For participants whose status
+			 * is FDWXACT_STATUS_PREPARING, we close the transaction in
+			 * one-phase. In addition, since we are not sure that the
+			 * preparation has been completed on the foreign server, we also
+			 * attempt to roll back the prepared foreign transaction.  Note
+			 * that it is the FDW's responsibility to tolerate an
+			 * OBJECT_NOT_FOUND error in the abort case.
+			 */
+			SpinLockAcquire(&fdwxact->mutex);
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&fdwxact->mutex);
+
+			switch (status)
+			{
+				case FDWXACT_STATUS_PREPARING:
+					/* One-phase rollback foreign transaction */
+					FdwXactParticipantEndTransaction(fdw_part, true, false);
+					/* FALLTHROUGH */
+				case FDWXACT_STATUS_PREPARED:
+				case FDWXACT_STATUS_ABORTING:
+					/* Roll back the prepared foreign transaction */
+					FdwXactParticipantEndTransaction(fdw_part, false, false);
+					break;
+				case FDWXACT_STATUS_COMMITTING:
+					Assert(false);
+					break;
+			}
+
+			/* Resolution was a success, remove the entry */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			if (fdwxact->ondisk)
+				RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								  fdwxact->serverid, fdwxact->userid,
+								  true);
+			remove_fdwxact(fdwxact);
+			LWLockRelease(FdwXactLock);
+		}
+
+		/* All foreign transactions should be aborted by now */
+		list_free(FdwXactParticipants);
+		FdwXactParticipants = NIL;
+	}
+
+	ForgetAllFdwXactParticipants(true);
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Mark my foreign transaction participants as in-doubt and clear
+ * the FdwXactParticipants list.
+ *
+ * If we leave any foreign transaction, update the oldest xmin of unresolved
+ * transaction so that local transaction id of in-doubt transaction is not
+ * truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(bool mark_indoubt)
+{
+	ListCell   *cell;
+	int			nlefts = 0;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Every participant must have registered an FdwXact entry by now */
+		Assert(fdwxact);
+
+		/*
+		 * Unlock and mark a foreign transaction as in-doubt.  Note that there
+		 * is a race condition; the FdwXact entries in FdwXactParticipants
+		 * could be used by another backend before we forget them, in the
+		 * case where the resolver process removes the FdwXact entry and
+		 * another backend reuses it. So we need to check if the entries are
+		 * still associated with the transaction.  Also, we do these checks by
+		 * transaction id because these foreign transactions may already be
+		 * held by the resolver.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->valid && fdwxact->held_by == MyBackendId)
+		{
+			fdwxact->held_by = InvalidBackendId;
+
+			if (mark_indoubt)
+			{
+				fdwxact->indoubt = true;	/* let the resolver process it */
+				nlefts++;
+			}
+		}
+		LWLockRelease(FdwXactLock);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction of
+	 * unresolved distributed transactions and hand the entries over to the
+	 * foreign transaction resolver.
+	 */
+	if (nlefts > 0)
+	{
+		Assert(mark_indoubt);
+		elog(DEBUG1, "left %d foreign transactions in in-doubt status", nlefts);
+		FdwXactComputeRequiredXmin();
+		FdwXactLaunchOrWakeupResolver();
+	}
+
+	list_free(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+	FdwXactParticipants_tmp = NIL;
+	FdwXactLocalXid = InvalidTransactionId;
+}
+
+/*
+ * Resolve foreign transactions at the given indexes. If 'waiter' is not NULL,
+ * we release the waiter after resolving all of the given foreign transactions.
+ * On failure we re-enqueue the waiting backend after advancing its next
+ * resolution time.
+ *
+ * The caller must hold the given foreign transactions in advance to prevent
+ * concurrent update.
+ */
+void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		PG_TRY();
+		{
+			FdwXactResolveOneFdwXact(fdwxact);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. Re-insert the waiter to the tail of retry
+			 * queue if the waiter is still waiting.
+			 */
+			if (waiter)
+			{
+				LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+				if (waiter->fdwXactState == FDWXACT_WAITING)
+				{
+					SHMQueueDelete(&(waiter->fdwXactLinks));
+					pg_write_barrier();
+					waiter->fdwXactNextResolutionTs =
+						TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+													foreign_xact_resolution_retry_interval);
+					FdwXactQueueInsert(waiter);
+				}
+				LWLockRelease(FdwXactResolutionLock);
+			}
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+
+	if (!waiter)
+		return;
+
+	/*
+	 * Remove waiter from shmem queue, if not detached yet. The waiter could
+	 * already be detached if the user cancelled the wait before resolution.
+	 */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/*
+		 * Wake up the waiter only when we have set state and removed from
+		 * queue
+		 */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had been already detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid,
+					  false);
+	LWLockRelease(FdwXactLock);
+
+	return (idx != -1);
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ *
+ * XXX: we can exclude FdwXact entries whose status is already committing
+ * or aborting.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Determine whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactDetermineTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on
+	 * the foreign server. So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and foreign transaction is about
+	 * to be committed or aborted. This should not happen except for one case
+	 * where the local transaction is prepared and this foreign transaction is
+	 * being resolved manually using pg_resolve_foreign_xact(). Raise an
+	 * error anyway since we cannot determine the fate of this foreign
+	 * transaction according to the local transaction whose fate is also not
+	 * determined.
+	 */
+	else
+		elog(ERROR,
+			 "cannot resolve the foreign transaction associated with in-process transaction");
+
+	pg_unreachable();
+}
+
+/*
+ * Commit or rollback one prepared foreign transaction.  On success, the
+ * caller is responsible for removing the FdwXact entry from shared memory
+ * and the corresponding on-disk file.
+ */
+static void
+FdwXactResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactRslvState state;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->held_by != InvalidBackendId || fdwxact->inprocessing);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactDetermineTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare resolution state to pass to API */
+	state.xid = fdwxact->local_xid;
+	state.server = server;
+	state.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	state.fdwxact_id = fdwxact->fdwxact_id;
+	state.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&state);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&state);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/*
+ * Return the index of the first FdwXact entry matching the given arguments,
+ * or -1 if there is none. The search condition is defined by arguments with
+ * valid values for respective datatypes. If 'hold' is true, entries that are
+ * already held or being processed are skipped, and the matching entry is
+ * marked as held by this backend.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid, bool hold)
+{
+	bool		found = false;
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		bool		inprocessing;
+
+		if (!fdwxact->valid)
+			continue;
+
+		SpinLockAcquire(&fdwxact->mutex);
+		inprocessing = fdwxact->inprocessing;
+		SpinLockRelease(&fdwxact->mutex);
+
+		/*
+		 * If we're attempting to hold this entry, skip if it is already held
+		 * or being processed.
+		 */
+		if (hold &&
+			(inprocessing || fdwxact->held_by != InvalidBackendId))
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+
+		if (hold)
+			fdwxact->held_by = MyBackendId;
+
+		found = true;
+		break;
+	}
+
+	return found ? i : -1;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *record = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(record->dbid, record->xid, record->serverid,
+						  record->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The status of
+	 * the transaction is set as preparing, since we do not know the exact
+	 * status right now. Resolver will set it later based on the status of
+	 * local transaction which prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED and as in-doubt, since we do not know the xact
+	 * status right now. Resolver will set it later based on the status of
+	 * local transaction that prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->indoubt = true;
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove
+ * FdwXact file if a foreign transaction was saved via an earlier checkpoint.
+ * We might not find the FdwXact entry if crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimization for the future, in case somebody
+	 * spots that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with a
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid magic number and CRC), return the palloc'd
+ * contents of the file, issuing an error when finding corrupted data.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\": %zu bytes",
+						path,
+						(Size) stat.st_size)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the file contents match the requested identifiers */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextFullXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state, fail immediately.  Keeping broken
+ * entries around and letting replay continue would harm the system, so a
+ * new backup should be rolled in.
+ *
+ * Our other responsibility is to return the oldest valid XID among the
+ * distributed transactions.  This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextFullXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+
+		/*
+		 * If the foreign transaction is part of a prepared local
+		 * transaction, it is not in doubt.  A future COMMIT/ROLLBACK
+		 * PREPARED will determine the fate of this foreign transaction.
+		 */
+		if (TwoPhaseExists(fdwxact->local_xid))
+		{
+			ereport(DEBUG2,
+					(errmsg("clearing in-doubt flag of foreign transaction %u, server %u, user %u because the corresponding local prepared transaction was found",
+							fdwxact->local_xid, fdwxact->serverid,
+							fdwxact->userid)));
+			fdwxact->indoubt = false;
+		}
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		bool		indoubt;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		indoubt = fdwxact->indoubt;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = BoolGetDatum(indoubt);
+		values[5] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/* Find and hold the FdwXact entry */
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid, true);
+
+	LWLockRelease(FdwXactLock);
+
+	if (idx < 0)
+	{
+		/* No entry */
+		PG_RETURN_BOOL(false);
+	}
+
+	PG_TRY();
+	{
+		FdwXactResolveFdwXacts(&idx, 1, NULL);
+	}
+	PG_CATCH();
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[idx];
+
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->held_by = InvalidBackendId;
+		SpinLockRelease(&fdwxact->mutex);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it.  This gives a way to forget about such a prepared
+ * transaction when the foreign server where it was prepared is no longer
+ * available, or when the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	FdwXact		fdwxact;
+	int			i;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid && fdwxact->dbid == MyDatabaseId &&
+			fdwxact->local_xid == xid && fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+	{
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction on server %u does not exist",
+						serverid)));
+	}
+
+	if (fdwxact->inprocessing || fdwxact->held_by != InvalidBackendId)
+	{
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot remove foreign transaction entry which is being processed")));
+	}
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdwxact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
+}
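
Side note on restoreFdwXactData() above, not part of the patch: a minimal standalone sketch of the state-file name check. It assumes the name consists of four 8-digit uppercase hex fields (dbid, xid, serverid, userid) joined by underscores, 35 characters in total; FDWXACT_FILE_NAME_LEN is defined elsewhere in the patch, so that length is an assumption here. Unlike the patch, the sketch also checks the sscanf() result, because the strspn() test alone accepts underscores in any position. All sketch_/Sketch names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed layout: four 8-digit hex fields joined by '_' (35 characters). */
#define SKETCH_NAME_LEN (4 * 8 + 3)

/* Hypothetical container for the identifiers encoded in the file name. */
typedef struct
{
	uint32_t	dbid;
	uint32_t	xid;
	uint32_t	serverid;
	uint32_t	userid;
} SketchFdwXactKey;

/*
 * Return true and fill *key if "name" has the assumed layout, false
 * otherwise.  *key is left untouched on failure.
 */
static bool
sketch_parse_fdwxact_name(const char *name, SketchFdwXactKey *key)
{
	unsigned int f[4];
	int			consumed = 0;

	assert(name != NULL && key != NULL);

	/* Same pre-filter as the patch: exact length and allowed characters. */
	if (strlen(name) != SKETCH_NAME_LEN ||
		strspn(name, "0123456789ABCDEF_") != SKETCH_NAME_LEN)
		return false;

	/* Additionally require all four fields and full consumption. */
	if (sscanf(name, "%8x_%8x_%8x_%8x%n",
			   &f[0], &f[1], &f[2], &f[3], &consumed) != 4 ||
		consumed != SKETCH_NAME_LEN)
		return false;

	key->dbid = (uint32_t) f[0];
	key->xid = (uint32_t) f[1];
	key->serverid = (uint32_t) f[2];
	key->userid = (uint32_t) f[3];
	return true;
}
```

With this stricter check, a name whose underscores are misplaced is skipped instead of being parsed into partially initialized identifiers.
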
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..fed2fbcd08
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,560 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a backend process requests one.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+		FdwXactRslvCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit launch attempts to once per
+		 * foreign_xact_resolution_retry_interval, but always launch when
+		 * a backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * started.  This usually means a resolver crashed, so retry
+			 * after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+	int			i;
+
+	/*
+	 * Looking for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting
+		 * up and hasn't attached to its slot yet. Since it will soon find the
+		 * FdwXact entry we inserted, we don't need to do anything then.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, release the slot (MyFdwXactResolver is NULL here) */
+		LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+		resolver->in_use = false;
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on database that has
+ * at least one FdwXact entry but no resolvers are running on it.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *resolver_dbs;	/* DBs resolver's running on */
+	HTAB	   *fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+	int			i;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect OIDs of databases that have an FdwXact entry awaiting resolution */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		bool		indoubt;
+		BackendId	held_by;
+
+		if (!fdwxact->valid)
+			continue;
+
+		SpinLockAcquire(&fdwxact->mutex);
+		indoubt = fdwxact->indoubt;
+		held_by = fdwxact->held_by;
+		SpinLockRelease(&fdwxact->mutex);
+
+		if ((indoubt && held_by == InvalidBackendId) ||
+			(!indoubt && held_by != InvalidBackendId))
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* No FdwXact entry needs resolution, so there is nothing to launch */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolvers are running and launch new one on them */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be super user */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
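
Side note on fdwxact_relaunch_resolvers() above, not part of the patch: the function effectively computes a set difference, namely databases that have entries awaiting resolution minus databases that already have a resolver. The sketch below shows that step in isolation with plain arrays and linear scans instead of the dynahash tables the patch uses; the sketch_ names and the uint32_t stand-in for Oid are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Linear membership test; the patch uses hash tables for this. */
static bool
sketch_contains(const uint32_t *set, size_t n, uint32_t dbid)
{
	for (size_t i = 0; i < n; i++)
		if (set[i] == dbid)
			return true;
	return false;
}

/*
 * Write to "out" each database OID that appears in "pending" but not in
 * "running", without duplicates, and return how many were written.
 * "out" must have room for npending entries.
 */
static size_t
sketch_dbs_needing_resolver(const uint32_t *pending, size_t npending,
							const uint32_t *running, size_t nrunning,
							uint32_t *out)
{
	size_t		nout = 0;

	assert(out != NULL || npending == 0);

	for (size_t i = 0; i < npending; i++)
	{
		/* A resolver already serves this database. */
		if (sketch_contains(running, nrunning, pending[i]))
			continue;
		/* Already scheduled earlier in this pass. */
		if (sketch_contains(out, nout, pending[i]))
			continue;
		out[nout++] = pending[i];
	}
	return nout;
}
```

Each OID written to the output array corresponds to one fdwxact_launch_resolver() call in the patch; a return value of zero corresponds to the function reporting that nothing was launched.
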
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..b91a2e1e88
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,436 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database for which backend processes are waiting.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, as for any backend. The resolver process also terminates on
+ * timeout, but only if no foreign transactions on the database are waiting
+ * to be resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int			foreign_xact_resolution_retry_interval;
+int			foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+static void hold_fdwxacts(PGPROC *waiter);
+static void hold_indoubt_fdwxacts(void);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts has indexes of FdwXact which the resolver marked
+ * as in-processing. We clear that flag from those entries on failure.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Cleanup up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	/* clear inprocessing flags */
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->inprocessing = false;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TransactionId waitXid = InvalidTransactionId;
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiters until the queue is empty or contains only waiters
+		 * that have a future resolution timestamp.
+		 */
+		for (;;)
+		{
+			PGPROC	   *waiter;
+
+			CHECK_FOR_INTERRUPTS();
+
+			LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+
+			waiter = FdwXactGetWaiter(now, &resolutionTs, &waitXid);
+
+			if (!waiter)
+			{
+				/* Not found, break */
+				LWLockRelease(FdwXactResolutionLock);
+				break;
+			}
+
+			/* Hold the waiting foreign transactions */
+			hold_fdwxacts(waiter);
+			Assert(nheld > 0);
+			LWLockRelease(FdwXactResolutionLock);
+
+			/* Resolve the waiting distributed transaction */
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld, waiter);
+			CommitTransactionCommand();
+
+			last_resolution_time = now;
+		}
+
+		/* Hold in-doubt transactions */
+		hold_indoubt_fdwxacts();
+
+		if (nheld > 0)
+		{
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld, NULL);
+			CommitTransactionCommand();
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether there have been foreign transactions by the backend within
+ * foreign_xact_resolver_timeout and shut down if not.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		/* There is no waiting backend */
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until we have detached from the
+		 * slot. This prevents a race condition in which a waiter is enqueued
+		 * after the FdwXactWaiterExists check.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but does not exit as the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until the
+ * timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Mark in-doubt transactions as in-processing.
+ */
+static void
+hold_indoubt_fdwxacts(void)
+{
+	nheld = 0;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid && fdwxact->dbid == MyDatabaseId &&
+			fdwxact->held_by == InvalidBackendId && fdwxact->indoubt)
+		{
+			held_fdwxacts[nheld++] = i;
+
+			/* hold lock */
+			SpinLockAcquire(&fdwxact->mutex);
+			fdwxact->inprocessing = true;
+			SpinLockRelease(&fdwxact->mutex);
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Mark foreign transactions associated with the given waiter's transaction
+ * as in-processing.
+ */
+static void
+hold_fdwxacts(PGPROC *waiter)
+{
+	nheld = 0;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid && fdwxact->dbid == waiter->databaseId &&
+			fdwxact->local_xid == waiter->fdwXactWaitXid)
+		{
+			held_fdwxacts[nheld++] = i;
+
+			/* hold lock */
+			SpinLockAcquire(&fdwxact->mutex);
+			Assert(!fdwxact->indoubt);
+			Assert(fdwxact->held_by == waiter->backendId);
+			fdwxact->inprocessing = true;
+			SpinLockRelease(&fdwxact->mutex);
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 1cd97852e8..ea045174e0 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..200cf9d067 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e1904877fa..2b9e039580 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -850,6 +851,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
@@ -2196,6 +2226,9 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	XLogRecPtr	recptr;
 	TimestampTz committs = GetCurrentTimestamp();
 	bool		replorigin;
+	bool		need_fdwxact_commit;
+
+	need_fdwxact_commit = PrepareFdwXactParticipants(xid);
 
 	/*
 	 * Are we using the replication origins feature?  Or, in other words, are
@@ -2266,6 +2299,16 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	if (need_fdwxact_commit)
+	{
+		SetFdwXactParticipants(xid, true);
+		FdwXactWaitForResolution(xid);
+	}
 }
 
 /*
@@ -2285,6 +2328,9 @@ RecordTransactionAbortPrepared(TransactionId xid,
 							   const char *gid)
 {
 	XLogRecPtr	recptr;
+	bool		need_fdwxact_commit;
+
+	need_fdwxact_commit = PrepareFdwXactParticipants(xid);
 
 	/*
 	 * Catch the scenario where we aborted partway through
@@ -2325,6 +2371,16 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	if (need_fdwxact_commit)
+	{
+		SetFdwXactParticipants(xid, false);
+		FdwXactWaitForResolution(xid);
+	}
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index cd30b62d36..c611fd8b45 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1219,6 +1220,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1227,6 +1229,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1265,12 +1268,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1428,6 +1432,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transactions to be resolved, if required.
+	 * We only want to wait if we prepared foreign transactions in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitForResolution(xid);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2087,6 +2099,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXact();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2254,6 +2269,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXact(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2341,6 +2357,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2532,6 +2550,9 @@ PrepareTransaction(void)
 	 */
 	PostPrepare_Twophase();
 
+	/* Release held FdwXact entries */
+	PostPrepare_FdwXact();
+
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
@@ -2542,6 +2563,7 @@ PrepareTransaction(void)
 	AtEOXact_Files(true);
 	AtEOXact_ComboCid();
 	AtEOXact_HashTables(true);
+	/* AtEOXact_FdwXact(true); */
 	/* don't call AtEOXact_PgStat here; we fixed pgstat state above */
 	AtEOXact_Snapshot(true, true);
 	pgstat_report_xact_timestamp(0);
@@ -2751,6 +2773,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXact(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ca09d81b08..eae8c60db3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4599,6 +4600,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6286,6 +6288,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_wal_senders",
 									 max_wal_senders,
 									 ControlFile->max_wal_senders);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
@@ -6836,14 +6841,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7045,7 +7051,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7558,6 +7567,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7888,6 +7898,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9183,6 +9196,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9712,8 +9726,10 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9731,6 +9747,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9747,6 +9764,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9952,6 +9970,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10151,6 +10170,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 56420bbc9d..56af9e6408 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+       SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc463c..a1dea253c2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2807,8 +2807,14 @@ CopyFrom(CopyState cstate)
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc),
+							   true);
+
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index a399ab4de9..8f2f7041e8 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1101,6 +1103,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1422,6 +1436,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
@@ -1575,6 +1598,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
 				 errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA",
 						fdw->fdwname)));
 
+	/*
+	 * Remember that the transaction accesses a foreign server. Normally
+	 * ImportForeignSchema does not modify data on foreign servers, so register
+	 * the server as not modified.
+	 */
+	RegisterFdwXactByServerId(server->serverid, false);
+
 	/* Call FDW to get a list of commands */
 	cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index fb6ce49056..3fa8bfe09f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/table.h"
 #include "access/tableam.h"
 #include "catalog/partition.h"
@@ -939,7 +940,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		Relation		child = partRelInfo->ri_RelationDesc;
+
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(child), true);
+
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 513471ab9b..29f376e48c 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,9 +226,31 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		RangeTblEntry	*rte;
+
+		rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex,
+							estate);
+
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(rte->relid, true);
+
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
+	{
+		RangeTblEntry	*rte;
+		int rtindex = (scanrelid > 0) ?
+			scanrelid :
+			bms_next_member(node->fs_relids, -1);
+
+		rte = exec_rt_fetch(rtindex, estate);
+
+		/* Remember that the transaction accesses a foreign server */
+		RegisterFdwXactByRelId(rte->relid, false);
+
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1ec07bad07..e5dee94764 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -37,6 +37,7 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
@@ -47,6 +48,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "rewrite/rewriteHandler.h"
@@ -2418,6 +2420,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+			/* Remember the transaction modifies data on a foreign server */
+			RegisterFdwXactByRelId(relid, true);
 
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 															 resultRelInfo,
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 61e48ca3f8..8f411c0559 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,18 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction && !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction && routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		!routine->CommitForeignTransaction &&
+		!routine->RollbackForeignTransaction)
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index beb5e85434..2258424e81 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -12,6 +12,8 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d7f99d9944..84bb1913f3 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3667,6 +3667,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3777,6 +3783,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 		case WAIT_EVENT_HASH_BATCH_ALLOCATE:
 			event_name = "HashBatchAllocate";
 			break;
@@ -4103,6 +4112,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 160afe9f39..6a83f19e24 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -909,6 +911,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -973,12 +979,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c2e5e3abf8..9d34817f39 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -151,6 +151,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..55609eed81 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 3c2b369615..56c43cf741 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -94,6 +94,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -249,6 +251,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1311,6 +1314,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1376,6 +1380,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1425,6 +1430,15 @@ GetOldestXmin(Relation rel, int flags)
 		NormalTransactionIdPrecedes(replication_slot_xmin, result))
 		result = replication_slot_xmin;
 
+	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDWXACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
 	/*
 	 * After locks have been released and vacuum_defer_cleanup_age has been
 	 * applied, check whether we need to back up further to make logical
@@ -3125,6 +3139,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions that are still needed
+ * to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6985e8eed..241b099238 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,6 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 XactTruncationLock					44
+FdwXactLock							45
+FdwXactResolverLock					46
+FdwXactResolutionLock				47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index f5eef6fa4e..9bd1e1791a 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -421,6 +422,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* Initialize fields for fdw xact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -822,6 +827,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8958ec8103..5ed6c05b18 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3054,6 +3056,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2f3e0a70e0..313bf33324 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -426,6 +427,24 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
@@ -763,6 +782,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2471,6 +2494,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4599,6 +4668,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 81055edde7..03de9fbc81 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -125,6 +125,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -344,6 +346,20 @@
 #max_sync_workers_per_subscription = 2	# taken from max_logical_replication_workers
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
+
 #------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index a0b0458108..8701c5f005 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 4ff0c6c700..b3f4f8fe18 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -208,6 +208,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index e73639df74..3041c39bc0 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 233441837f..b040202043 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 120000
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..5df32b2703
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,162 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* fdwXactState */
+#define	FDWXACT_NOT_WAITING		0
+#define	FDWXACT_WAITING			1
+#define	FDWXACT_WAIT_COMPLETE	2
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being
+								 * committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being
+								 * aborted */
+} FdwXactStatus;
+
+typedef struct FdwXactData *FdwXact;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	FdwXactStatus status;
+	bool		indoubt;		/* Is an in-doubt transaction? */
+	bool		inprocessing;	/* resolver is processing? */
+	slock_t		mutex;			/* protect above three fields */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset of inserting this entry
+									 * start */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been completed and written to
+								 * file? */
+	BackendId	held_by;		/* backend holding this entry */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing FdwXact entry needs to hold FdwXactLock in exclusive mode,
+ * and iterating fdwXacts needs that in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	TransactionId xid;
+
+	/* Foreign transaction information */
+	char	   *fdwxact_id;
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+/* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void RegisterFdwXactByRelId(Oid relid, bool modified);
+extern void RegisterFdwXactByServerId(Oid serverid, bool modified);
+extern void FdwXactReleaseWaiter(PGPROC *waiter);
+extern void FdwXactWaitForResolution(TransactionId wait_xid);
+extern void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter);
+extern PGPROC *FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+								TransactionId *waitXid_p);
+extern bool FdwXactWaiterExists(Oid dbid);
+extern bool PrepareFdwXactParticipants(TransactionId xid);
+extern void SetFdwXactParticipants(TransactionId xid, bool commit);
+extern void PreCommit_FdwXact(void);
+extern void AtEOXact_FdwXact(bool is_commit);
+extern void AtPrepare_FdwXact(void);
+extern void PostPrepare_FdwXact(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern bool FdwXactExists(Oid dboid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+
+#endif							/* FDWXACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..688b43b8d0
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..c935471936
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 6c15df7e70..986bc73566 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index a04fc70326..6f1f336e31 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -108,6 +108,13 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
+/*
+ * XACT_FLAGS_FDWNOPREPARE - set when we wrote data to a foreign table
+ * whose foreign server isn't capable of participating in two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 3)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index c8869d5226..da0d442f1b 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -232,6 +232,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index de5670e538..9884f5f8e7 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 61f2c2f5b4..df5189dd2d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5981,6 +5981,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,status,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6099,6 +6117,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..8d046cc4e4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5e0cf533fb..5596ee591c 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -69,6 +69,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 											   bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 														 bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c55dc1481c..2186c1c5d0 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -806,6 +806,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -853,6 +855,7 @@ typedef enum
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
 	WAIT_EVENT_EXECUTE_GATHER,
+	WAIT_EVENT_FDWXACT_RESOLUTION,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
 	WAIT_EVENT_HASH_BATCH_LOAD,
@@ -969,6 +972,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 1ee9000b2b..4150d8a3e4 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/xlogdefs.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
@@ -154,6 +155,16 @@ struct PGPROC
 	int			syncRepState;	/* wait state for sync rep */
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
+	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
 	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index a5c7d0c064..0f73b64937 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDWXACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -125,4 +127,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 454c2df487..f977ca43d4 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b813e32215..628eaf531e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1342,6 +1342,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.status,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(xid, serverid, userid, status, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.23.0

v22-0003-Documentation-update.patch (application/octet-stream)
From d2156de537f7b2cb1d01a6a1f5a58c73017b00d0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v22 3/6] Documentation update.

---
 doc/src/sgml/catalogs.sgml                | 136 +++++++++++++
 doc/src/sgml/config.sgml                  | 144 +++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 143 +++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 236 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  89 ++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 798 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 700271fd40..2af2eda512 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9223,6 +9223,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -10934,6 +10939,137 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Transaction identifier of the local transaction with which this
+       foreign transaction is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in doubt.
+       A foreign transaction becomes in-doubt when the user cancels the
+       query during transaction commit or when the server crashes during
+       transaction commit.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 31b4660160..918fac967c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9085,6 +9085,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether committing a distributed transaction ensures that
+         all changes made on the involved foreign servers are either committed
+         or rolled back together. Valid values are <literal>required</literal>
+         and <literal>disabled</literal>. The default setting is
+         <literal>disabled</literal>, in which case the two-phase commit
+         protocol is not used to commit or roll back distributed transactions.
+         When set to <literal>required</literal>, a distributed transaction
+         strictly requires that all servers written to can use the two-phase
+         commit protocol.  That is, the distributed transaction cannot commit
+         if even one server does not support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-managements"/>).
+         In the <literal>required</literal> case, commit of a distributed
+         transaction waits for all involved foreign transactions to be
+         committed before the command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When set to <literal>disabled</literal>, the data on the servers
+          involved in a distributed transaction can become inconsistent
+          if a foreign server crashes while the distributed transaction
+          is being committed.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign
+         transaction resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning a resolver stays connected
+         to its database until it is stopped manually. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..139dd7f918
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,143 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that the transactions on only some of the foreign
+   servers were committed while the others were rolled back. This could leave
+   the data in an inconsistent state in terms of the federated database.
+   Atomic commit of a distributed transaction is an operation that applies a
+   set of changes as a single operation globally. This guarantees
+   all-or-nothing results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers end in either commit or rollback using the
+   transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-managements"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To commit atomically on all foreign servers,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit sequence of a distributed transaction consists
+    of the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       The <productname>PostgreSQL</productname> distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers including the local server itself and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If all preparations on foreign servers
+       succeed, go to the next step. If any failure happens in this step,
+       the server switches to rollback and rolls back all transactions on both
+       the local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally. The server commits the transaction locally.  If any failure
+       happens in this step, the server switches to rollback and rolls back all
+       transactions on both the local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is normally performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    Each commit of a distributed transaction will wait until confirmation is
+    received that all prepared transactions are committed or rolled back.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back using the two-phase commit protocol. However, foreign
+    transactions become <firstterm>in-doubt</firstterm> in three cases: when the
+    foreign server crashes or the connection to it is lost while preparing the
+    foreign transaction, when the local node crashes while either preparing or
+    resolving the foreign transaction, and when the user cancels the query. You
+    can check in-doubt transactions in the <xref linkend="view-pg-foreign-xacts"/>
+    view. These foreign transactions are resolved by a foreign transaction resolver
+    process or by executing the <function>pg_resolve_foreign_xact</function> function
+    manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving both foreign transactions that are prepared by
+    online transactions and in-doubt transactions. They commit or roll back
+    prepared transactions on all foreign servers involved in the distributed
+    transaction if the local node received agreement messages from all
+    foreign servers during the first step of the two-phase commit protocol.
+   </para>
+
+   <para>
+    Each foreign transaction resolver is responsible for resolving transactions
+    on the one database it is connected to. On failure during resolution, it retries
+    the resolution at intervals of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     the database cannot be dropped without an immediate shutdown. You can call the
+     <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values.
+    Additionally, <varname>max_worker_processes</varname> may need to be adjusted to
+    accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6587678af2..dd0358ef22 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1415,6 +1415,127 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines for Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back and
+     prepare the foreign transaction. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+void
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one-phase or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flag</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal> the transaction
+    can be committed in one-phase, otherwise this function must commit the
+    prepared transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except a call through the
+     <function>pg_resolve_fdwxact</function> SQL function, this callback function
+     is executed by foreign transaction resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction after it has been rolled back locally. The
+    foreign transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from recursing
+    infinitely. If <literal>frstate-&gt;flag</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     the <productname>PostgreSQL</productname> server crashed while preparing or
+     committing the foreign transaction. Therefore, this function needs to
+     tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except a call through the
+     <function>pg_resolve_fdwxact</function> SQL function, this callback function
+     is executed by foreign transaction resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction
+    identifier, with its length stored in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal,
+    less than <symbol>NAMEDATALEN</symbol> bytes long, and should not be the same
+    as any other concurrent prepared transaction id. If this callback routine
+    is not supported, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Note therefore that
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1894,4 +2015,119 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of the
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name
+    and the OIDs of the server, user and user mapping. The <literal>flags</literal>
+    field contains flag bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit And Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <literal>CommitForeignTransaction</literal>
+     function in the pre-commit phase; during rollback it calls the
+     <literal>RollbackForeignTransaction</literal> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit And Rollback Distributed Transaction</title>
+    <para>
+     In addition to simply committing and rolling back foreign transactions as
+     described in <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit and roll back atomically among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to have the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-managements"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be
+    using different Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose
+     FDW supports the transaction management callback routines is registered as a
+     participant. During registration, <function>GetPrepareId</function> is called,
+     if provided, to generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction
+     manager persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, both the <literal>CommitForeignTransaction</literal> function and the
+     <literal>RollbackForeignTransaction</literal> function should commit and
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve, the resolver process exits with an error message. The foreign
+     transaction launcher will launch the resolver process again at
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 68179f71cd..1ab8e80fdc 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c06afd3ea..e281bd33d8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26126,6 +26126,95 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction that is currently
+        being processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolution.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before dropping
+    the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 89662cc0a3..82c94d5e5d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1052,6 +1052,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1273,6 +1285,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1550,6 +1574,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1861,6 +1890,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index c41ce9499b..5ef1f4a329 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -170,6 +170,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index ea08d0b614..58f1e4fd15 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.23.0

Attachment: v22-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From 4c3b45ea7b027bf734bbf113d41de0a100dda49a Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 14:12:17 +0500
Subject: [PATCH v22 1/6] Keep track of writing on non-temporary relation

---
 src/backend/executor/nodeModifyTable.c | 16 ++++++++++++++++
 src/include/access/xact.h              |  6 ++++++
 2 files changed, 22 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 20a4c474cc..1ec07bad07 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -581,6 +581,10 @@ ExecInsert(ModifyTableState *mtstate,
 										   NULL,
 										   specToken);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
 												   &specConflict,
@@ -619,6 +623,10 @@ ExecInsert(ModifyTableState *mtstate,
 							   estate->es_output_cid,
 							   0, NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
@@ -970,6 +978,10 @@ ldelete:;
 	if (tupleDeleted)
 		*tupleDeleted = true;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/*
 	 * If this delete is the result of a partition key update that moved the
 	 * tuple to a new partition, put this row into the transition OLD TABLE,
@@ -1482,6 +1494,10 @@ lreplace:;
 	if (canSetTag)
 		(estate->es_processed)++;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/* AFTER ROW UPDATE Triggers */
 	ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple, slot,
 						 recheckIndexes,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7ee04babc2..a04fc70326 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -102,6 +102,12 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
-- 
2.23.0
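
The documentation patch above says two-phase commit is required when the transaction modified data on more than one server, counting the local server, and the user requests foreign two-phase commit. The 0001 patch supplies the local half of that test through XACT_FLAGS_WROTENONTEMPREL. A minimal sketch of how the two could combine is below. The function name, the two-value setting enum, and the idea of passing in a count of modified foreign servers are assumptions for illustration only and do not come from the patch set.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit as the flag added to xact.h by the 0001 patch. */
#define XACT_FLAGS_WROTENONTEMPREL (1U << 2)

/* Hypothetical stand-in for the foreign_twophase_commit setting. */
typedef enum
{
    SKETCH_TWOPHASE_DISABLED,
    SKETCH_TWOPHASE_REQUIRED
} SketchTwophaseSetting;

/*
 * Hypothetical helper: decide whether two-phase commit is needed.
 * It is needed only when the user asked for it and more than one
 * server (local server included) was modified in the transaction.
 */
bool
sketch_needs_twophase(int xact_flags, int nmodified_foreign_servers,
                      SketchTwophaseSetting setting)
{
    int nwriters = nmodified_foreign_servers;

    assert(nmodified_foreign_servers >= 0);

    if (setting == SKETCH_TWOPHASE_DISABLED)
        return false;

    /* A write to a non-temporary local relation counts as one more writer. */
    if (((unsigned int) xact_flags & XACT_FLAGS_WROTENONTEMPREL) != 0)
        nwriters++;

    return nwriters > 1;
}
```

With this shape, a transaction that writes locally and on one foreign server needs two-phase commit, while a transaction that only writes on a single foreign server does not.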

#51Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#50)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jun 5, 2020 at 3:16 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Thu, 4 Jun 2020 at 12:46, Amit Kapila <amit.kapila16@gmail.com> wrote:

+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>

This is written w.r.t foreign_twophase_commit. If one changes this
between prepare and commit, will it have any impact?

Since the distributed transaction commit automatically uses 2pc when
executing COMMIT, it's not possible to change foreign_twophase_commit
between prepare and commit. So I'd like to explain the case where a
user executes PREPARE and then COMMIT PREPARED while changing
foreign_twophase_commit.

PREPARE can run only when foreign_twophase_commit is 'required' (or
'prefer') and all foreign servers involved with the transaction
support 2pc. We prepare all foreign transactions regardless of the number
of servers and whether or not they were modified. If either
foreign_twophase_commit is 'disabled' or the transaction modifies data
on a foreign server that doesn't support 2pc, it raises an error. At
COMMIT (or ROLLBACK) PREPARED, similarly foreign_twophase_commit needs
to be set to 'required'. It raises an error if the distributed
transaction has a foreign transaction and foreign_twophase_commit is
'disabled'.

So, IIUC, it will raise an error if foreign_twophase_commit is
'disabled' (or one of the foreign servers involved doesn't support 2PC)
and the error can be raised both when the user issues PREPARE and COMMIT
(or ROLLBACK) PREPARED. If so, isn't it strange that we raise such an
error after PREPARE? What kind of use-case requires this?

4.
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal> this foreign transaction is in-doubt status and
+       needs to be resolved by calling <function>pg_resolve_fdwxact</function>
+       function.
+      </entry>

It would be better if you can add an additional sentence to say when
and how foreign transactions can reach the in-doubt state.

+       If <literal>true</literal> this foreign transaction is in-doubt status.
+       A foreign transaction becomes in-doubt status when user canceled the
+       query during transaction commit or the server crashed during transaction
+       commit.

Can we reword the second sentence as: "A foreign transaction can have
this status when the user has cancelled the statement or the server
crashes during transaction commit."? I have another question about
this field, why can't it be one of the status ('preparing',
'prepared', 'committing', 'aborting', 'in-doubt') rather than having a
separate field? Also, isn't it more suitable to name 'status' field
as 'state' because these appear to be more like different states of
transaction?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#52Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Amit Kapila (#51)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, 11 Jun 2020 at 22:21, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 5, 2020 at 3:16 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Thu, 4 Jun 2020 at 12:46, Amit Kapila <amit.kapila16@gmail.com> wrote:

+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>

This is written w.r.t foreign_twophase_commit. If one changes this
between prepare and commit, will it have any impact?

Since the distributed transaction commit automatically uses 2pc when
executing COMMIT, it's not possible to change foreign_twophase_commit
between prepare and commit. So I'd like to explain the case where a
user executes PREPARE and then COMMIT PREPARED while changing
foreign_twophase_commit.

PREPARE can run only when foreign_twophase_commit is 'required' (or
'prefer') and all foreign servers involved with the transaction
support 2pc. We prepare all foreign transactions regardless of the number
of servers and whether or not they were modified. If either
foreign_twophase_commit is 'disabled' or the transaction modifies data
on a foreign server that doesn't support 2pc, it raises an error. At
COMMIT (or ROLLBACK) PREPARED, similarly foreign_twophase_commit needs
to be set to 'required'. It raises an error if the distributed
transaction has a foreign transaction and foreign_twophase_commit is
'disabled'.

So, IIUC, it will raise an error if foreign_twophase_commit is
'disabled' (or one of the foreign servers involved doesn't support 2PC)
and the error can be raised both when the user issues PREPARE and COMMIT
(or ROLLBACK) PREPARED. If so, isn't it strange that we raise such an
error after PREPARE? What kind of use-case requires this?

I don't have a concrete use-case, but the reason it raises an error when a
user who has set foreign_twophase_commit to 'disabled' executes COMMIT (or
ROLLBACK) PREPARED within a transaction involving at least one
foreign server is that I wanted to make it behave in a similar way to the
COMMIT case. I mean, if a user executes just COMMIT, the distributed
transaction is committed in two phases, but the value of
foreign_twophase_commit is not changed during these two phases. So I
wanted to require the user to set foreign_twophase_commit to 'required'
both when executing PREPARE and when executing COMMIT (or ROLLBACK)
PREPARED. The implementation can also become simpler because we can assume
that foreign_twophase_commit is always enabled when a transaction
requires foreign transaction preparation and resolution.
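
To make the rule being discussed concrete, here is a minimal sketch of the two checks, one at PREPARE and one at COMMIT (or ROLLBACK) PREPARED. All names are hypothetical and not taken from the patch. It assumes that 'prefer' is accepted like 'required', and that every involved foreign server must support 2PC because all foreign transactions get prepared; both points are readings of the description above, not confirmed behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the foreign_twophase_commit setting. */
typedef enum
{
    SKETCH_FTC_DISABLED,
    SKETCH_FTC_PREFER,
    SKETCH_FTC_REQUIRED
} SketchFtcSetting;

/* Hypothetical description of one foreign server in the transaction. */
typedef struct
{
    bool supports_twophase;
} SketchParticipant;

/*
 * Check performed at PREPARE: returns false where an error would be raised.
 * A transaction without foreign participants is not affected.
 */
bool
sketch_can_prepare(SketchFtcSetting setting,
                   const SketchParticipant *parts, size_t nparts)
{
    size_t i;

    assert(parts != NULL || nparts == 0);

    if (nparts == 0)
        return true;
    if (setting == SKETCH_FTC_DISABLED)
        return false;

    /* Every involved server is prepared, so every one must support 2PC. */
    for (i = 0; i < nparts; i++)
    {
        if (!parts[i].supports_twophase)
            return false;
    }
    return true;
}

/*
 * Check performed at COMMIT PREPARED or ROLLBACK PREPARED: the setting
 * must still be enabled if the transaction has any foreign transaction.
 */
bool
sketch_can_finish_prepared(SketchFtcSetting setting, size_t nparts)
{
    return nparts == 0 || setting != SKETCH_FTC_DISABLED;
}
```

The second function is the case questioned in this thread: a session that prepared successfully can still get an error later if it switches the setting to 'disabled' before COMMIT PREPARED.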

4.
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal> this foreign transaction is in-doubt status and
+       needs to be resolved by calling <function>pg_resolve_fdwxact</function>
+       function.
+      </entry>

It would be better if you can add an additional sentence to say when
and how foreign transactions can reach the in-doubt state.

+       If <literal>true</literal> this foreign transaction is in-doubt status.
+       A foreign transaction becomes in-doubt status when user canceled the
+       query during transaction commit or the server crashed during transaction
+       commit.

Can we reword the second sentence as: "A foreign transaction can have
this status when the user has cancelled the statement or the server
crashes during transaction commit."?

Agreed. Updated in my local branch.

I have another question about
this field, why can't it be one of the status ('preparing',
'prepared', 'committing', 'aborting', 'in-doubt') rather than having a
separate field?

Because I'm using the in-doubt field also for checking whether the foreign
transaction entry can be resolved manually, e.g. with
pg_resolve_foreign_xact(). For instance, a foreign transaction with
status = 'prepared' and in-doubt = 'true' can be resolved by either a
foreign transaction resolver or pg_resolve_foreign_xact(). When a user
executes pg_resolve_foreign_xact() against the foreign transaction, it
sets status = 'committing' (or 'rollbacking') by checking the transaction
status in clog. The user might cancel pg_resolve_foreign_xact() during
resolution. In this case, the foreign transaction still has status =
'committing' and in-doubt = 'true'. Then if a foreign transaction
resolver process processes the foreign transaction, it can commit it
without looking up the clog.
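
A minimal sketch of the state handling described in this paragraph follows. The type and function names are hypothetical, the status enum is only a subset of the states mentioned in the thread, and the clog result is passed in as a plain boolean rather than looked up. The point it illustrates is that once the status has been advanced to 'committing' or 'aborting', a later attempt can decide the outcome without consulting the clog again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the foreign transaction states under discussion. */
typedef enum
{
    SKETCH_FX_PREPARED,
    SKETCH_FX_COMMITTING,
    SKETCH_FX_ABORTING
} SketchFxStatus;

/* Hypothetical foreign transaction entry. */
typedef struct
{
    SketchFxStatus status;
    bool           in_doubt;
} SketchFdwXact;

/*
 * Decide whether to commit (true) or roll back (false) the foreign
 * transaction. local_committed stands for the result a clog lookup would
 * give; *consulted_clog reports whether that result was actually needed.
 */
bool
sketch_decide_commit(SketchFdwXact *fx, bool local_committed,
                     bool *consulted_clog)
{
    assert(fx != NULL && consulted_clog != NULL);

    *consulted_clog = false;

    switch (fx->status)
    {
        case SKETCH_FX_COMMITTING:
            return true;        /* outcome already recorded */
        case SKETCH_FX_ABORTING:
            return false;       /* outcome already recorded */
        case SKETCH_FX_PREPARED:
            break;
    }

    /* First attempt: record the outcome so a retry can skip the lookup. */
    *consulted_clog = true;
    fx->status = local_committed ? SKETCH_FX_COMMITTING : SKETCH_FX_ABORTING;
    return local_committed;
}
```

If the first attempt is cancelled after the status change, the entry stays at 'committing' with in_doubt still set, and the next caller takes the early return.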

Also, isn't it more suitable to name 'status' field
as 'state' because these appear to be more like different states of
transaction?

Agreed.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#53Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#52)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jun 12, 2020 at 7:59 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Thu, 11 Jun 2020 at 22:21, Amit Kapila <amit.kapila16@gmail.com> wrote:

I have another question about
this field, why can't it be one of the status ('preparing',
'prepared', 'committing', 'aborting', 'in-doubt') rather than having a
separate field?

Because I'm using the in-doubt field also for checking whether the foreign
transaction entry can be resolved manually, e.g. with
pg_resolve_foreign_xact(). For instance, a foreign transaction with
status = 'prepared' and in-doubt = 'true' can be resolved by either a
foreign transaction resolver or pg_resolve_foreign_xact(). When a user
executes pg_resolve_foreign_xact() against the foreign transaction, it
sets status = 'committing' (or 'rollbacking') by checking the transaction
status in clog. The user might cancel pg_resolve_foreign_xact() during
resolution. In this case, the foreign transaction still has status =
'committing' and in-doubt = 'true'. Then if a foreign transaction
resolver process processes the foreign transaction, it can commit it
without looking up the clog.

I think this is a corner case and it is better to simplify the state
recording of foreign transactions than to save a CLOG lookup.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#54Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Amit Kapila (#53)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 12 Jun 2020 at 12:40, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 12, 2020 at 7:59 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Thu, 11 Jun 2020 at 22:21, Amit Kapila <amit.kapila16@gmail.com> wrote:

I have another question about
this field, why can't it be one of the status ('preparing',
'prepared', 'committing', 'aborting', 'in-doubt') rather than having a
separate field?

Because I'm using the in-doubt field also for checking whether the foreign
transaction entry can be resolved manually, e.g. with
pg_resolve_foreign_xact(). For instance, a foreign transaction with
status = 'prepared' and in-doubt = 'true' can be resolved by either a
foreign transaction resolver or pg_resolve_foreign_xact(). When a user
executes pg_resolve_foreign_xact() against the foreign transaction, it
sets status = 'committing' (or 'rollbacking') by checking the transaction
status in clog. The user might cancel pg_resolve_foreign_xact() during
resolution. In this case, the foreign transaction still has status =
'committing' and in-doubt = 'true'. Then if a foreign transaction
resolver process processes the foreign transaction, it can commit it
without looking up the clog.

I think this is a corner case and it is better to simplify the state
recording of foreign transactions than to save a CLOG lookup.

The main use of the in-doubt flag is to distinguish in-doubt
transactions from other transactions that have a waiter (which I call
on-line transactions). If a foreign server stays down for a long time
after a server crash during distributed transaction commit, the foreign
transaction resolver tries to resolve the foreign transaction but
fails because the foreign server doesn't respond. We'd like to avoid
the situation where a resolver process always picks up that foreign
transaction while other on-line transactions waiting to be resolved
cannot move forward. Therefore, a resolver process prioritizes on-line
transactions. Once the shmem queue holding on-line transactions becomes
empty, a resolver process looks at the array of foreign transaction
states to find in-doubt transactions to resolve. I think we should not
process in-doubt transactions and on-line transactions in the
same way.
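
To illustrate the prioritization described above, here is a minimal
Python sketch. It is not code from the patch: pick_next, online_queue
and fdwxact_entries are hypothetical names, and the entries are plain
dicts rather than the real shared-memory structures.

```python
from collections import deque


def pick_next(online_queue, fdwxact_entries):
    """Return the next foreign transaction to resolve, or None."""
    # On-line transactions have a backend waiting on them, so they go first.
    if online_queue:
        return online_queue.popleft()
    # Only when the queue is empty, scan the array for in-doubt entries.
    for entry in fdwxact_entries:
        if entry["in_doubt"] and not entry["resolved"]:
            return entry
    return None
```

With this ordering, an in-doubt transaction on an unreachable server
can be retried indefinitely without blocking on-line transactions.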

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#55Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#54)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jun 12, 2020 at 9:54 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 12 Jun 2020 at 12:40, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 12, 2020 at 7:59 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Thu, 11 Jun 2020 at 22:21, Amit Kapila <amit.kapila16@gmail.com> wrote:

I have another question about
this field, why can't it be one of the status ('preparing',
'prepared', 'committing', 'aborting', 'in-doubt') rather than having a
separate field?

Because I'm using in-doubt field also for checking if the foreign
transaction entry can also be resolved manually, i.g.
pg_resolve_foreign_xact(). For instance, a foreign transaction which
status = 'prepared' and in-doubt = 'true' can be resolved either
foreign transaction resolver or pg_resolve_foreign_xact(). When a user
execute pg_resolve_foreign_xact() against the foreign transaction, it
sets status = 'committing' (or 'rollbacking') by checking transaction
status in clog. The user might cancel pg_resolve_foreign_xact() during
resolution. In this case, the foreign transaction is still status =
'committing' and in-doubt = 'true'. Then if a foreign transaction
resolver process processes the foreign transaction, it can commit it
without clog looking.

I think this is a corner case and it is better to simplify the state
recording of foreign transactions then to save a CLOG lookup.

The main usage of in-doubt flag is to distinguish between in-doubt
transactions and other transactions that have their waiter (I call
on-line transactions).

Which are these other online transactions? I had assumed that foreign
transaction resolver process is to resolve in-doubt transactions but
it seems it is also used for some other purpose which anyway was the
next question I had while reviewing other sections of docs but let's
clarify as it came up now.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#56Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Amit Kapila (#55)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 12 Jun 2020 at 15:37, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 12, 2020 at 9:54 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 12 Jun 2020 at 12:40, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 12, 2020 at 7:59 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Thu, 11 Jun 2020 at 22:21, Amit Kapila <amit.kapila16@gmail.com> wrote:

I have another question about
this field, why can't it be one of the status ('preparing',
'prepared', 'committing', 'aborting', 'in-doubt') rather than having a
separate field?

Because I'm using in-doubt field also for checking if the foreign
transaction entry can also be resolved manually, i.g.
pg_resolve_foreign_xact(). For instance, a foreign transaction which
status = 'prepared' and in-doubt = 'true' can be resolved either
foreign transaction resolver or pg_resolve_foreign_xact(). When a user
execute pg_resolve_foreign_xact() against the foreign transaction, it
sets status = 'committing' (or 'rollbacking') by checking transaction
status in clog. The user might cancel pg_resolve_foreign_xact() during
resolution. In this case, the foreign transaction is still status =
'committing' and in-doubt = 'true'. Then if a foreign transaction
resolver process processes the foreign transaction, it can commit it
without clog looking.

I think this is a corner case and it is better to simplify the state
recording of foreign transactions then to save a CLOG lookup.

The main usage of in-doubt flag is to distinguish between in-doubt
transactions and other transactions that have their waiter (I call
on-line transactions).

Which are these other online transactions? I had assumed that foreign
transaction resolver process is to resolve in-doubt transactions but
it seems it is also used for some other purpose which anyway was the
next question I had while reviewing other sections of docs but let's
clarify as it came up now.

When a distributed transaction is committed by the COMMIT command, the
postgres backend process prepares all foreign transactions and commits
the local transaction. Then the backend enqueues itself to the shmem
queue, asks a resolver process to commit the prepared foreign
transactions, and waits. That is, these prepared foreign transactions
are committed by the resolver process, not by the backend process. Once
the resolver process has committed all prepared foreign transactions, it
wakes the waiting backend process. These are the transactions I call
on-line transactions. This procedure is similar to what synchronous
replication does.
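
The sequence above can be sketched as a single-threaded Python
simulation. This is only an illustration under simplifying assumptions:
ForeignServer, backend_commit and resolver_run are made-up names, the
"queue" is a plain list, and waiting/waking is recorded in a log instead
of using real latches.

```python
class ForeignServer:
    def __init__(self, name):
        self.name = name
        self.state = "active"  # active -> prepared -> committed

    def prepare(self):
        self.state = "prepared"

    def commit_prepared(self):
        self.state = "committed"


def backend_commit(servers, resolver_queue, log):
    # Step 1: the backend prepares every foreign transaction.
    for server in servers:
        server.prepare()
    log.append("prepared all foreign transactions")
    # Step 2: the backend commits the local transaction.
    log.append("local commit")
    # Step 3: the backend enqueues itself and waits for the resolver.
    resolver_queue.append(servers)
    log.append("backend waiting")


def resolver_run(resolver_queue, log):
    # The resolver, not the backend, commits the prepared transactions.
    while resolver_queue:
        servers = resolver_queue.pop(0)
        for server in servers:
            server.commit_prepared()
        # Wake the backend only after all of its foreign commits are done.
        log.append("backend woken")
```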

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#57Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#56)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jun 12, 2020 at 2:10 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 12 Jun 2020 at 15:37, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think this is a corner case and it is better to simplify the state
recording of foreign transactions then to save a CLOG lookup.

The main usage of in-doubt flag is to distinguish between in-doubt
transactions and other transactions that have their waiter (I call
on-line transactions).

Which are these other online transactions? I had assumed that foreign
transaction resolver process is to resolve in-doubt transactions but
it seems it is also used for some other purpose which anyway was the
next question I had while reviewing other sections of docs but let's
clarify as it came up now.

When a distributed transaction is committed by COMMIT command, the
postgres backend process prepare all foreign transaction and commit
the local transaction.

Does this mean that we will mark the xid as committed in CLOG of the
local server? If so, why is this okay till we commit transactions in
all the foreign servers, what if we fail to commit on one of the
servers?

Few more comments on v22-0003-Documentation-update
--------------------------------------------------------------------------------------
1.
+          When <literal>disabled</literal> there can be risk of database
+          consistency among all servers that involved in the distributed
+          transaction when some foreign server crashes during committing the
+          distributed transaction.

Will it read better if we rephrase the above to something like: "When
<literal>disabled</literal> there can be a risk of database
consistency if one or more foreign servers crashes while committing
the distributed transaction."?

2.
+      <varlistentry
id="guc-foreign-transaction-resolution-rety-interval"
xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname>
(<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_interval</varname>
configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specify how long the foreign transaction resolver should
wait when the last resolution
+         fails before retrying to resolve foreign transaction. This
parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server
command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>

Typo. <varlistentry
id="guc-foreign-transaction-resolution-rety-interval", spelling of
retry is wrong. Do we really need such a guc parameter? I think we
can come up with some simple algorithm to retry after a few seconds
and then increase that interval of retry if we fail again or something
like that. I don't know how users can come up with some non-default
value for this variable.

3
+      <varlistentry id="guc-foreign-transaction-resolver-timeout"
xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname>
(<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname>
configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminate foreign transaction resolver processes that don't
have any foreign
+         transactions to resolve longer than the specified number of
milliseconds.
+         A value of zero disables the timeout mechanism, meaning it
connects to one
+         database until stopping manually.

Can we mention the function name using which one can stop the resolver process?

4.
+   Using the <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers end in either commit or rollback using the
+   transaction callback routines

Can we slightly rephrase this to "Using the PostgreSQL's atomic commit
ensures that all the changes on foreign servers are either committed
or rolled back using the transaction callback routines"?

5.
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname> distributed transaction manager
+       prepares all transaction on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers including the local server itself and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>.

/PostgreSQL/PostgreSQL's.

If all preparations on foreign servers got
+ successful go to the next step.

How about "If the prepare on all foreign servers is successful then go
to the next step"?

 Any failure happens in this step,
+       the server changes to rollback, then rollback all transactions on both
+       local and foreign servers.

Can we rephrase this line to something like: "If there is any failure
in the prepare phase, the server will rollback all the transactions on
both local and foreign servers."?

What if the issued Rollback also failed, say due to network breakdown
between local and one of foreign servers? Shouldn't such a
transaction be 'in-doubt' state?

6.
+      <para>
+       Commit locally. The server commits transaction locally.  Any
failure happens
+       in this step the server changes to rollback, then rollback all
transactions
+       on both local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transaction on foreign servers. Pprepared
transactions
+       are committed or rolled back according to the result of the
local transaction.
+       This step is normally performed by a foreign transaction
resolver process.
+      </para>

When (in which step) do we commit on foreign servers? Do resolver
processes commit on foreign servers? If so, how can we commit locally
without committing on foreign servers, and what if the commit on one of
the servers fails? This is not very clear to me from the steps mentioned
here. Also, typo: /Pprepared/Prepared

7.
However, foreign transactions
+    become <firstterm>in-doubt</firstterm> in three cases: where the foreign
+    server crashed or lost the connectibility to it during preparing foreign
+    transaction, where the local node crashed during either preparing or
+    resolving foreign transaction and where user canceled the query.

Here the three cases are not very clear. You might want to use (a)
..., (b) .. ,(c).. Also, I think the state will be in-doubt even when
we lost connection to server during commit or rollback.

8.
+    One foreign transaction resolver is responsible for transaction resolutions
+    on which one database connecting.

Can we rephrase it to: "One foreign transaction resolver is
responsible for transaction resolutions on the database to which it is
connected."?

9.
+    Note that other <productname>PostgreSQL</productname> feature
such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.

/feature/features

10.
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref
linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be
non-zero value.
+    Additionally the <varname>max_worker_processes</varname> may need
to be adjusted to
+    accommodate for foreign transaction resolver workers, at least
+    (<varname>max_foreign_transaction_resolvers</varname> +
<literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> feature
such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>

Don't we need to mention foreign_twophase_commit GUC here?

11.
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines For Transaction Managements</title>

Managements/Management?

12.
+     Transaction management callbacks are used for doing commit, rollback and
+     prepare the foreign transaction.

Let's write the above sentence as: "Transaction management callbacks
are used to commit, rollback and prepare the foreign transaction."

13.
+    <para>
+     Transaction management callbacks are used for doing commit, rollback and
+     prepare the foreign transaction. If an FDW wishes that its foreign
+     transaction is managed by <productname>PostgreSQL</productname>'s global
+     transaction manager it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can provide
+     <function>GetPrepareId</function> callback optionally.
+    </para>

What exact functionality a FDW can accomplish if it just supports
CommitForeignTransaction and RollbackForeignTransaction? It seems it
doesn't care for 2PC, if so, is there any special functionality we can
achieve with this which we can't do without these APIs?

14.
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is
called at the
+    pre-commit phase of the local transactions if foreign twophase commit is
+    required. This function is used only for distribute transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>

/distribute/distributed

15.
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit And Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback function <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and rollback the foreign transaction. During transaction commit, the core
+     transaction manager calls
<literal>CommitForeignTransaction</literal> function
+     in the pre-commit phase and calls
+     <literal>RollbackForeignTransaction</literal> function in the
post-rollback
+     phase.
+    </para>

There is no reasoning mentioned as to why CommitForeignTransaction has
to be called in pre-commit phase and RollbackForeignTransaction in
post-rollback phase? Basically why one in pre phase and other in post
phase?

16.
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter>
<type>xid</type>, <parameter>serverid</parameter> <type>oid</type>,
<parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as
<function>pg_resolve_foreign_xact</function>
+        except that this removes the foreign transcation entry
without resolution.
+       </entry>

Can we write why and when such a function can be used? Typo,
/transcation/transaction

17.
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign trasnaction
+       resolution.</entry>
+     </row>

/trasnaction/transaction
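
Regarding comment 2 above, the idea of retrying after a few seconds and
increasing the interval on repeated failure could look like the
following Python sketch. The function name and the base/cap values are
assumptions chosen for illustration, not something taken from the patch.

```python
def next_retry_interval(failures, base=1.0, cap=60.0):
    """Seconds to wait before the next retry after `failures` consecutive failures."""
    if failures < 1:
        return 0.0  # nothing has failed yet, no need to wait
    # Double the wait on each consecutive failure, up to a fixed cap.
    return min(cap, base * 2 ** (failures - 1))
```

A scheme like this would need no user-visible setting, at the cost of
hard-coding the base and cap.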

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#58Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Amit Kapila (#57)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 12 Jun 2020 at 19:24, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 12, 2020 at 2:10 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 12 Jun 2020 at 15:37, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think this is a corner case and it is better to simplify the state
recording of foreign transactions then to save a CLOG lookup.

The main usage of in-doubt flag is to distinguish between in-doubt
transactions and other transactions that have their waiter (I call
on-line transactions).

Which are these other online transactions? I had assumed that foreign
transaction resolver process is to resolve in-doubt transactions but
it seems it is also used for some other purpose which anyway was the
next question I had while reviewing other sections of docs but let's
clarify as it came up now.

When a distributed transaction is committed by COMMIT command, the
postgres backend process prepare all foreign transaction and commit
the local transaction.

Thank you for your review comments! Let me answer your question first.
I'll look at the review comments after that.

Does this mean that we will mark the xid as committed in CLOG of the
local server?

Well what I meant is that when the client executes COMMIT command, the
backend executes PREPARE TRANSACTION command on all involved foreign
servers and then marks the xid as committed in clog in the local
server.

If so, why is this okay till we commit transactions in
all the foreign servers, what if we fail to commit on one of the
servers?

Once the local transaction is committed, the involved foreign
transactions are never rolled back. The backend has already prepared all
foreign transactions before the local commit, so committing a prepared
foreign transaction basically doesn't fail. But even if it fails for
whatever reason, we never roll back the prepared foreign transactions. A
resolver retries committing the foreign transactions at certain intervals.
Does that answer your question?
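
As a rough illustration of this rule, here is a Python sketch of one
resolution attempt. It is not the patch's code: resolve_once, the entry
dict and the send callable are hypothetical, and send is assumed to
raise ConnectionError when the foreign server cannot be reached.

```python
def resolve_once(entry, send):
    """Try to resolve one prepared foreign transaction; return True on success."""
    # The outcome of the local transaction alone decides the command.
    if entry["local_status"] == "committed":
        command = "COMMIT PREPARED"
    elif entry["local_status"] == "aborted":
        command = "ROLLBACK PREPARED"
    else:
        return False  # local outcome not known yet, nothing to do
    try:
        send(command, entry["gid"])
    except ConnectionError:
        # A failed attempt is retried later; a commit is never
        # turned into a rollback.
        return False
    entry["resolved"] = True
    return True
```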

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#59Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#58)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jun 12, 2020 at 6:24 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 12 Jun 2020 at 19:24, Amit Kapila <amit.kapila16@gmail.com> wrote:

Which are these other online transactions? I had assumed that foreign
transaction resolver process is to resolve in-doubt transactions but
it seems it is also used for some other purpose which anyway was the
next question I had while reviewing other sections of docs but let's
clarify as it came up now.

When a distributed transaction is committed by COMMIT command, the
postgres backend process prepare all foreign transaction and commit
the local transaction.

Thank you for your review comments! Let me answer your question first.
I'll see the review comments.

Does this mean that we will mark the xid as committed in CLOG of the
local server?

Well what I meant is that when the client executes COMMIT command, the
backend executes PREPARE TRANSACTION command on all involved foreign
servers and then marks the xid as committed in clog in the local
server.

Won't it create an inconsistency in viewing the data from the
different servers? Say, such a transaction inserts one row into a
local server and another into the foreign server. Now, if we follow
the above protocol, the user will be able to see the row from the
local server but not from the foreign server.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#60Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Amit Kapila (#59)
Re: Transactions involving multiple postgres foreign servers, take 2

On Sat, 13 Jun 2020 at 14:02, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 12, 2020 at 6:24 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 12 Jun 2020 at 19:24, Amit Kapila <amit.kapila16@gmail.com> wrote:

Which are these other online transactions? I had assumed that foreign
transaction resolver process is to resolve in-doubt transactions but
it seems it is also used for some other purpose which anyway was the
next question I had while reviewing other sections of docs but let's
clarify as it came up now.

When a distributed transaction is committed by COMMIT command, the
postgres backend process prepare all foreign transaction and commit
the local transaction.

Thank you for your review comments! Let me answer your question first.
I'll see the review comments.

Does this mean that we will mark the xid as committed in CLOG of the
local server?

Well what I meant is that when the client executes COMMIT command, the
backend executes PREPARE TRANSACTION command on all involved foreign
servers and then marks the xid as committed in clog in the local
server.

Won't it create an inconsistency in viewing the data from the
different servers? Say, such a transaction inserts one row into a
local server and another into the foreign server. Now, if we follow
the above protocol, the user will be able to see the row from the
local server but not from the foreign server.

Yes, you're right. This atomic commit feature doesn't guarantee such
consistent visibility, so-called atomic visibility. Even if the local
server is not modified, since a resolver process commits prepared
foreign transactions one by one, another user could see an inconsistent
result. Providing globally consistent snapshots to transactions
involving foreign servers is one of the solutions.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#61Tatsuo Ishii
ishii@sraoss.co.jp
In reply to: Masahiko Sawada (#60)
Re: Transactions involving multiple postgres foreign servers, take 2

Won't it create an inconsistency in viewing the data from the
different servers? Say, such a transaction inserts one row into a
local server and another into the foreign server. Now, if we follow
the above protocol, the user will be able to see the row from the
local server but not from the foreign server.

Yes, you're right. This atomic commit feature doesn't guarantee such
consistent visibility so-called atomic visibility. Even the local
server is not modified, since a resolver process commits prepared
foreign transactions one by one another user could see an inconsistent
result. Providing globally consistent snapshots to transactions
involving foreign servers is one of the solutions.

Another approach to the atomic visibility problem is to control
snapshot acquisition timing and commit timing (plus using REPEATABLE
READ). In the REPEATABLE READ transaction isolation level, PostgreSQL
assigns a snapshot at the time when the first command is executed in a
transaction. If we could prevent any commit while any transaction is
acquiring snapshot, and we could prevent any snapshot acquisition while
committing, visibility inconsistency which Amit explained can be
avoided.
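
A toy Python sketch of the timing-control idea follows. It is only an
in-memory model under strong assumptions: TimingController and its
methods are made-up names, each node's committed set stands in for a
real server, and a single mutex is used, which is stricter than needed
because it also serializes snapshot acquisitions against each other.

```python
import threading


class TimingController:
    """Keeps snapshot acquisition and commits from overlapping across nodes."""

    def __init__(self, node_names):
        self._lock = threading.Lock()
        self._committed = {name: set() for name in node_names}

    def commit_everywhere(self, txid):
        # No snapshot can be taken while the commit spreads to every node.
        with self._lock:
            for committed in self._committed.values():
                committed.add(txid)

    def take_snapshots(self):
        # No commit can happen while snapshots are taken on every node.
        with self._lock:
            return {name: frozenset(c) for name, c in self._committed.items()}
```

Under this model a reader never gets a set of snapshots in which a
transaction is committed on one node but not on another.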

This approach was proposed in an academic paper [1].

The good point of this approach is that we don't need to modify
PostgreSQL at all.

The downside of the approach is that we need someone who controls the
timings (in [1], a middleware called "Pangea" was proposed). Also we
need to limit the transaction isolation level to REPEATABLE READ.

[1]: http://www.vldb.org/pvldb/vol2/vldb09-694.pdf

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

#62Amit Kapila
amit.kapila16@gmail.com
In reply to: Tatsuo Ishii (#61)
Re: Transactions involving multiple postgres foreign servers, take 2

On Sun, Jun 14, 2020 at 2:21 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

Won't it create an inconsistency in viewing the data from the
different servers? Say, such a transaction inserts one row into a
local server and another into the foreign server. Now, if we follow
the above protocol, the user will be able to see the row from the
local server but not from the foreign server.

Yes, you're right. This atomic commit feature doesn't guarantee such
consistent visibility so-called atomic visibility.

Okay, I understand that the purpose of this feature is to provide
atomic commit which means the transaction on all servers involved will
either commit or rollback. However, I think we should at least see at
a high level how the visibility will work because it might influence
the implementation of this feature.

Even the local
server is not modified, since a resolver process commits prepared
foreign transactions one by one another user could see an inconsistent
result. Providing globally consistent snapshots to transactions
involving foreign servers is one of the solutions.

How would it be able to do that? Say, when it decides to take a
snapshot the transaction on the foreign server appears to be committed
but the transaction on the local server won't appear to be committed,
so the consistent data visibility problem as mentioned above could
still arise.

Another approach to the atomic visibility problem is to control
snapshot acquisition timing and commit timing (plus using REPEATABLE
READ). In the REPEATABLE READ transaction isolation level, PostgreSQL
assigns a snapshot at the time when the first command is executed in a
transaction. If we could prevent any commit while any transaction is
acquiring snapshot, and we could prevent any snapshot acquisition while
committing, visibility inconsistency which Amit explained can be
avoided.

I think the problem mentioned above can occur with this as well or if
I am missing something then can you explain in further detail how it
won't create problem in the scenario I have used above?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#63Tatsuo Ishii
ishii@sraoss.co.jp
In reply to: Amit Kapila (#62)
Re: Transactions involving multiple postgres foreign servers, take 2

Another approach to the atomic visibility problem is to control
snapshot acquisition timing and commit timing (plus using REPEATABLE
READ). In the REPEATABLE READ transaction isolation level, PostgreSQL
assigns a snapshot at the time when the first command is executed in a
transaction. If we could prevent any commit while any transaction is
acquiring snapshot, and we could prevent any snapshot acquisition while
committing, visibility inconsistency which Amit explained can be
avoided.

I think the problem mentioned above can occur with this as well or if
I am missing something then can you explain in further detail how it
won't create problem in the scenario I have used above?

So the problem you mentioned above is like this? (S1/S2 denote
transactions (sessions), N1/N2 are the PostgreSQL servers.) Since S1
has already committed on N1, S2 sees the row on N1. However, S2 does not
see the row on N2 since S1 has not committed on N2 yet.

S1/N1: DROP TABLE t1;
DROP TABLE
S1/N1: CREATE TABLE t1(i int);
CREATE TABLE
S1/N2: DROP TABLE t1;
DROP TABLE
S1/N2: CREATE TABLE t1(i int);
CREATE TABLE
S1/N1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
S1/N2: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
S2/N1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
S1/N1: INSERT INTO t1 VALUES (1);
INSERT 0 1
S1/N2: INSERT INTO t1 VALUES (1);
INSERT 0 1
S1/N1: PREPARE TRANSACTION 's1n1';
PREPARE TRANSACTION
S1/N2: PREPARE TRANSACTION 's1n2';
PREPARE TRANSACTION
S2/N1: PREPARE TRANSACTION 's2n1';
PREPARE TRANSACTION
S1/N1: COMMIT PREPARED 's1n1';
COMMIT PREPARED
S2/N1: SELECT * FROM t1; -- see the row
i
---
1
(1 row)

S2/N2: SELECT * FROM t1; -- doesn't see the row
i
---
(0 rows)

S1/N2: COMMIT PREPARED 's1n2';
COMMIT PREPARED
S2/N1: COMMIT PREPARED 's2n1';
COMMIT PREPARED

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

#64Amit Kapila
amit.kapila16@gmail.com
In reply to: Tatsuo Ishii (#63)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, Jun 15, 2020 at 12:30 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

Another approach to the atomic visibility problem is to control
snapshot acquisition timing and commit timing (plus using REPEATABLE
READ). In the REPEATABLE READ transaction isolation level, PostgreSQL
assigns a snapshot at the time when the first command is executed in a
transaction. If we could prevent any commit while any transaction is
acquiring snapshot, and we could prevent any snapshot acquisition while
committing, visibility inconsistency which Amit explained can be
avoided.

I think the problem mentioned above can occur with this as well or if
I am missing something then can you explain in further detail how it
won't create problem in the scenario I have used above?

So the problem you mentioned above is like this? (S1/S2 denotes
transactions (sessions), N1/N2 is the postgreSQL servers). Since S1
already committed on N1, S2 sees the row on N1. However S2 does not
see the row on N2 since S1 has not committed on N2 yet.

Yeah, something on these lines but S2 can execute the query on N1
directly which should fetch the data from both N1 and N2. Even if
there is a solution using REPEATABLE READ isolation level we might not
prefer to use that as the only level for distributed transactions, it
might be too costly but let us first see how does it solve the
problem?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#65Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Amit Kapila (#62)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, 15 Jun 2020 at 15:20, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Jun 14, 2020 at 2:21 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

Won't it create an inconsistency in viewing the data from the
different servers? Say, such a transaction inserts one row into a
local server and another into the foreign server. Now, if we follow
the above protocol, the user will be able to see the row from the
local server but not from the foreign server.

Yes, you're right. This atomic commit feature doesn't guarantee such
consistent visibility so-called atomic visibility.

Okay, I understand that the purpose of this feature is to provide
atomic commit which means the transaction on all servers involved will
either commit or rollback. However, I think we should at least see at
a high level how the visibility will work because it might influence
the implementation of this feature.

Even the local
server is not modified, since a resolver process commits prepared
foreign transactions one by one another user could see an inconsistent
result. Providing globally consistent snapshots to transactions
involving foreign servers is one of the solutions.

How would it be able to do that? Say, when it decides to take a
snapshot the transaction on the foreign server appears to be committed
but the transaction on the local server won't appear to be committed,
so the consistent data visibility problem as mentioned above could
still arise.

There are many solutions. For instance, in Postgres-XC/X2 (and maybe
XL), there is a GTM node that is responsible for providing global
transaction IDs (GXID) and globally consistent snapshots. All
transactions need to access GTM when checking the distributed
transaction status as well as starting transactions and ending
transactions. IIUC if a global transaction accesses a tuple whose GXID
is included in its global snapshot it waits for that transaction to be
committed or rolled back.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#66Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#65)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, Jun 15, 2020 at 7:06 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Mon, 15 Jun 2020 at 15:20, Amit Kapila <amit.kapila16@gmail.com> wrote:

Even if the local
server is not modified, since a resolver process commits prepared
foreign transactions one by one, another user could see an inconsistent
result. Providing globally consistent snapshots to transactions
involving foreign servers is one of the solutions.

How would it be able to do that? Say, when it decides to take a
snapshot the transaction on the foreign server appears to be committed
but the transaction on the local server won't appear to be committed,
so the consistent data visibility problem as mentioned above could
still arise.

There are many solutions. For instance, in Postgres-XC/X2 (and maybe
XL), there is a GTM node that is responsible for providing global
transaction IDs (GXID) and globally consistent snapshots. All
transactions need to access GTM when checking the distributed
transaction status as well as starting transactions and ending
transactions. IIUC if a global transaction accesses a tuple whose GXID
is included in its global snapshot it waits for that transaction to be
committed or rolled back.

Is there some mapping between GXID and XIDs allocated for each node or
will each node use the GXID as XID to modify the data? Are we fine
with parking the work for global snapshots and atomic visibility to a
separate patch and just proceed with the design proposed by this
patch? I am asking because I thought there might be some impact on
the design of this patch based on what we decide for that work.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#67Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Amit Kapila (#66)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Jun 16, 2020 at 3:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 15, 2020 at 7:06 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Mon, 15 Jun 2020 at 15:20, Amit Kapila <amit.kapila16@gmail.com> wrote:

Even if the local
server is not modified, since a resolver process commits prepared
foreign transactions one by one, another user could see an inconsistent
result. Providing globally consistent snapshots to transactions
involving foreign servers is one of the solutions.

How would it be able to do that? Say, when it decides to take a
snapshot the transaction on the foreign server appears to be committed
but the transaction on the local server won't appear to be committed,
so the consistent data visibility problem as mentioned above could
still arise.

There are many solutions. For instance, in Postgres-XC/X2 (and maybe
XL), there is a GTM node that is responsible for providing global
transaction IDs (GXID) and globally consistent snapshots. All
transactions need to access GTM when checking the distributed
transaction status as well as starting transactions and ending
transactions. IIUC if a global transaction accesses a tuple whose GXID
is included in its global snapshot it waits for that transaction to be
committed or rolled back.

Is there some mapping between GXID and XIDs allocated for each node or
will each node use the GXID as XID to modify the data? Are we fine
with parking the work for global snapshots and atomic visibility to a
separate patch and just proceed with the design proposed by this
patch?

A distributed transaction involves atomic commit, atomic visibility,
and global consistency. 2PC is the only practical solution for atomic
commit. There are some improvements over 2PC, but those are add-ons to
the basic 2PC, which is what this patch provides. Atomic visibility
and global consistency, however, have alternative solutions, but all of
those solutions require 2PC to be supported. Each of those is a large
piece of work, and trying to get everything in may not work. Once we
have basic 2PC in place, there will be ground to experiment with
solutions for global consistency and atomic visibility. If we manage
to do it right, we could make it pluggable as well. So, I think we
should concentrate on supporting basic 2PC work now.

I am asking because I thought there might be some impact on
the design of this patch based on what we decide for that work.

Since 2PC is at the heart of any distributed transaction system, the
impact will be low. Figuring all of that out without having basic 2PC
will be very hard.

--
Best Wishes,
Ashutosh Bapat

#68Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Amit Kapila (#57)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 12 Jun 2020 at 19:24, Amit Kapila <amit.kapila16@gmail.com> wrote:

Thank you for your reviews on 0003 patch. I've incorporated your
comments. I'll submit the latest version patch later as the design or
scope might change as a result of the discussion.

Few more comments on v22-0003-Documentation-update
--------------------------------------------------------------------------------------
1.
+          When <literal>disabled</literal> there can be risk of database
+          consistency among all servers that involved in the distributed
+          transaction when some foreign server crashes during committing the
+          distributed transaction.

Will it read better if rephrase above to something like: "When
<literal>disabled</literal> there can be a risk of database
consistency if one or more foreign servers crashes while committing
the distributed transaction."?

Fixed.

2.
+      <varlistentry
id="guc-foreign-transaction-resolution-rety-interval"
xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname>
(<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_interval</varname>
configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specify how long the foreign transaction resolver should
wait when the last resolution
+         fails before retrying to resolve foreign transaction. This
parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server
command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>

Typo. <varlistentry
id="guc-foreign-transaction-resolution-rety-interval", spelling of
retry is wrong. Do we really need such a guc parameter? I think we
can come up with some simple algorithm to retry after a few seconds
and then increase that interval of retry if we fail again or something
like that. I don't know how users can come up with some non-default
value for this variable.

For example, in an unreliable network environment, setting a lower
value would help to minimize the backend wait time when the
connection is lost. But I also agree with your point. In terms of
implementation, having backends wait for a fixed time is simpler,
but we can implement such an incremental interval by remembering the
retry count for each foreign transaction.
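
As a rough illustration of the incremental interval idea (a Python sketch,
not the patch's C code): the wait doubles after each failed attempt and is
capped. The names next_retry_interval, record_failure, base_ms and cap_ms
are hypothetical. The 10 second base mirrors the documented default above;
the 5 minute cap is an arbitrary assumption.

```python
def next_retry_interval(retry_count, base_ms=10_000, cap_ms=300_000):
    """Return the wait before the next resolution attempt, in milliseconds.

    retry_count is the number of failed attempts so far for one foreign
    transaction. The interval doubles after each failure and is capped.
    """
    if retry_count < 0:
        raise ValueError("retry_count must be non-negative")
    # Bound the shift so the intermediate value cannot grow without limit.
    return min(cap_ms, base_ms << min(retry_count, 32))


# Hypothetical per-transaction bookkeeping: (xid, server oid) -> failures.
retry_counts = {}


def record_failure(key):
    """Record one failed attempt and return the interval to wait next."""
    retry_counts[key] = retry_counts.get(key, 0) + 1
    return next_retry_interval(retry_counts[key])
```

With this shape no GUC is needed for the common case, at the cost of
keeping one counter per unresolved foreign transaction.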

An open question regarding retrying foreign transaction resolution is
how we handle the case where an involved foreign server is down for a
very long time. If an online transaction is waiting to be resolved,
there is no way to exit from the wait loop unless either the user sends
a cancel request or the crashed server is restored. But if the foreign
server has to be down for a long time, I think it’s not practical to
send a cancel request because the client would need something like a
timeout mechanism. So I think it might be better to provide a way to
cancel the waiting without the user sending a cancel. For example,
having a timeout or a limit on the retry count. If an
in-doubt transaction is waiting to be resolved, we keep trying to
resolve the foreign transaction at an interval. But I wonder if the
user might want to disable the automatic in-doubt foreign transaction
resolution in some cases, for example, where the user knows the crashed
server will not be restored for a long time. I’m thinking that we can
provide a way to disable automatic foreign transaction resolution or
disable it for the particular foreign transaction.

3
+      <varlistentry id="guc-foreign-transaction-resolver-timeout"
xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname>
(<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname>
configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminate foreign transaction resolver processes that don't
have any foreign
+         transactions to resolve longer than the specified number of
milliseconds.
+         A value of zero disables the timeout mechanism, meaning it
connects to one
+         database until stopping manually.

Can we mention the function name using which one can stop the resolver process?

Fixed.

4.
+   Using the <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers end in either commit or rollback using the
+   transaction callback routines

Can we slightly rephase this "Using the PostgreSQL's atomic commit
ensures that all the changes on foreign servers are either committed
or rolled back using the transaction callback routines"?

Fixed.

5.
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname> distributed transaction manager
+       prepares all transaction on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers including the local server itself and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>.

/PostgreSQL/PostgreSQL's.

Fixed.

If all preparations on foreign servers got
+ successful go to the next step.

How about "If the prepare on all foreign servers is successful then go
to the next step"?

Fixed.

Any failure happens in this step,
+       the server changes to rollback, then rollback all transactions on both
+       local and foreign servers.

Can we rephrase this line to something like: "If there is any failure
in the prepare phase, the server will rollback all the transactions on
both local and foreign servers."?

Fixed.

What if the issued Rollback also failed, say due to network breakdown
between local and one of foreign servers? Shouldn't such a
transaction be 'in-doubt' state?

The rollback API that rolls back a transaction in one phase can be
called recursively, so FDWs have to tolerate recursive calls.

In the current patch, all transaction operations are performed
synchronously. That is, a foreign transaction never enters the in-doubt
state without an explicit cancel by the user or a local node crash.
That way, subsequent transactions can assume that preceding
distributed transactions are already resolved unless the user
canceled.

Let me explain the details:

If the transaction turns to rollback due to a failure before the local
commit, we attempt to do both ROLLBACK and ROLLBACK PREPARED against
foreign transactions whose status is PREPARING. That is, we end the
foreign transactions by doing ROLLBACK. And since we're not sure whether
preparation has been completed on the foreign server, the backend asks
the resolver process to do ROLLBACK PREPARED on the foreign
servers. Therefore FDWs have to tolerate an OBJECT_NOT_FOUND error in
the abort case. Since the backend process returns an acknowledgment to
the client only after rolling back all foreign transactions, these
foreign transactions don't remain in the in-doubt state.
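
To make the "tolerate OBJECT_NOT_FOUND" requirement concrete, here is a
minimal Python sketch under stated assumptions. UndefinedObject and
rollback_prepared_tolerant are hypothetical names, and execute stands in
for whatever sends SQL to the foreign server. It is not the FDW API from
the patch.

```python
class UndefinedObject(Exception):
    """Stand-in for the remote error meaning 'no such prepared transaction'."""


def rollback_prepared_tolerant(execute, gid):
    """Issue ROLLBACK PREPARED and treat 'not found' as already resolved.

    execute is a caller-supplied function that sends one SQL string to the
    foreign server. gid is assumed to be a trusted, generated identifier.
    Returns True if a prepared transaction was rolled back, False if the
    server reported that none existed (PREPARE never completed there).
    """
    try:
        execute(f"ROLLBACK PREPARED '{gid}'")
    except UndefinedObject:
        # Preparation did not finish on this server, so the plain ROLLBACK
        # already ended the transaction. Nothing is left to resolve.
        return False
    return True
```

Any other error still propagates, so the caller can retry later.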

If rolling back failed after the local commit (e.g., the client does
ROLLBACK and the resolver failed to do ROLLBACK PREPARED), a resolver
process will relaunch and retry ROLLBACK PREPARED. The backend
process waits until ROLLBACK PREPARED is successfully done or the user
cancels. So the foreign transactions don't become in-doubt
transactions.

Synchronousness is also an open question. If we want to support atomic
commit in an asynchronous manner, it might be better to implement it
first in terms of complexity. The backend returns an acknowledgment to
the client immediately after asking the resolver process. It’s known
as the early acknowledgment technique. The downside is that a user
who wants to see the result of a preceding transaction needs to make
sure the preceding transaction is committed on all foreign servers. We
will also need to think about how to control it by GUC parameter when
we have synchronous distributed transaction commit. Perhaps it’s
better to control it independently of synchronous replication.

6.
+      <para>
+       Commit locally. The server commits transaction locally.  Any
failure happens
+       in this step the server changes to rollback, then rollback all
transactions
+       on both local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transaction on foreign servers. Pprepared
transactions
+       are committed or rolled back according to the result of the
local transaction.
+       This step is normally performed by a foreign transaction
resolver process.
+      </para>

When (in which step) do we commit on foreign servers? Do Resolver
processes commit on foreign servers, if so, how can we commit locally
without committing on foreign servers, what if the commit on one of
the servers fails? It is not very clear to me from the steps mentioned
here?

In case 2pc is required, we commit transactions on foreign servers at
the final step by the resolver process. If committing a prepared
transaction on one of the servers fails, a resolver process relaunches
after an interval and retries the commit.

In case 2pc is not required, we commit transactions on foreign servers
at pre-commit phase by the backend.
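
The ordering in the 2pc-required case can be sketched as a toy Python
model. ToyServer and commit_distributed are hypothetical names, everything
is in memory, and the retry loop of the resolver is left out. It only shows
that the local commit decides the outcome and foreign COMMIT PREPARED
happens after it.

```python
class ToyServer:
    """In-memory stand-in for a foreign server (not a real connection)."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.state = "active"  # active -> prepared -> committed / aborted

    def prepare(self):
        if self.fail_prepare:
            raise RuntimeError(f"prepare failed on {self.name}")
        self.state = "prepared"

    def commit_prepared(self):
        assert self.state == "prepared"
        self.state = "committed"

    def rollback(self):
        # Covers both ROLLBACK and ROLLBACK PREPARED in this toy model.
        self.state = "aborted"


def commit_distributed(local, foreign_servers):
    """Run the three steps; return True on commit, False on rollback."""
    # Step 1: prepare every foreign transaction.
    try:
        for server in foreign_servers:
            server.prepare()
    except RuntimeError:
        # Any failure before the local commit rolls everything back.
        for server in foreign_servers:
            server.rollback()
        local["state"] = "aborted"
        return False
    # Step 2: commit locally. This decides the outcome.
    local["state"] = "committed"
    # Step 3: resolve prepared transactions (the resolver's job in the patch).
    for server in foreign_servers:
        server.commit_prepared()
    return True
```

Between step 2 and the end of step 3 a concurrent reader can see the local
row but not yet the foreign ones, which is the atomic visibility gap
discussed earlier in the thread.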

Typo, /Pprepared/Prepared

Fixed.

7.
However, foreign transactions
+    become <firstterm>in-doubt</firstterm> in three cases: where the foreign
+    server crashed or lost the connectibility to it during preparing foreign
+    transaction, where the local node crashed during either preparing or
+    resolving foreign transaction and where user canceled the query.

Here the three cases are not very clear. You might want to use (a)
..., (b) .. ,(c)..

Fixed. I changed it to an itemizedlist.

Also, I think the state will be in-doubt even when
we lost connection to server during commit or rollback.

Let me correct the cases in which foreign transactions remain in the
in-doubt state. There are two cases:

* The local node crashed
* The user canceled the transaction commit or rollback.

Even when we lose the connection to the server while committing or
rolling back a prepared transaction, a backend doesn’t return an
acknowledgment to the client until either the transaction is
successfully resolved, the user cancels the transaction, or the local
node crashes.

8.
+    One foreign transaction resolver is responsible for transaction resolutions
+    on which one database connecting.

Can we rephrase it to: "One foreign transaction resolver is
responsible for transaction resolutions on the database to which it is
connected."?

Fixed.

9.
+    Note that other <productname>PostgreSQL</productname> feature
such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.

/feature/features

Fixed.

10.
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref
linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be
non-zero value.
+    Additionally the <varname>max_worker_processes</varname> may need
to be adjusted to
+    accommodate for foreign transaction resolver workers, at least
+    (<varname>max_foreign_transaction_resolvers</varname> +
<literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> feature
such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>

Don't we need to mention foreign_twophase_commit GUC here?

Fixed.

11.
+   <sect2 id="fdw-callbacks-transaction-managements">
+    <title>FDW Routines For Transaction Managements</title>

Managements/Management?

Fixed.

12.
+     Transaction management callbacks are used for doing commit, rollback and
+     prepare the foreign transaction.

Lets write the above sentence as: "Transaction management callbacks
are used to commit, rollback and prepare the foreign transaction."

Fixed.

13.
+    <para>
+     Transaction management callbacks are used for doing commit, rollback and
+     prepare the foreign transaction. If an FDW wishes that its foreign
+     transaction is managed by <productname>PostgreSQL</productname>'s global
+     transaction manager it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can provide
+     <function>GetPrepareId</function> callback optionally.
+    </para>

What exact functionality a FDW can accomplish if it just supports
CommitForeignTransaction and RollbackForeignTransaction? It seems it
doesn't care for 2PC, if so, is there any special functionality we can
achieve with this which we can't do without these APIs?

There is no special functionality even if an FDW implements
CommitForeignTransaction and RollbackForeignTransaction. Currently,
since there is no transaction API in the FDW APIs, FDW developers have
to use XactCallback to control transactions, but there is no
documentation. The idea of allowing an FDW to support only
CommitForeignTransaction and RollbackForeignTransaction is that FDW
developers can implement transaction management easily. But in the
first patch, we can also disallow it to keep the implementation
simple.

14.
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is
called at the
+    pre-commit phase of the local transactions if foreign twophase commit is
+    required. This function is used only for distribute transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>

/distribute/distributed

Fixed.

15.
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit And Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback function <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and rollback the foreign transaction. During transaction commit, the core
+     transaction manager calls
<literal>CommitForeignTransaction</literal> function
+     in the pre-commit phase and calls
+     <literal>RollbackForeignTransaction</literal> function in the
post-rollback
+     phase.
+    </para>

There is no reasoning mentioned as to why CommitForeignTransaction has
to be called in pre-commit phase and RollbackForeignTransaction in
post-rollback phase? Basically why one in pre phase and other in post
phase?

Good point. This behavior just follows what postgres_fdw does. I'm not
sure of the exact reason why postgres_fdw commits the transaction in
the pre-commit phase, but I guess that since committing a foreign
transaction is more likely to fail than the local commit, it might be
better to do it first.

16.
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter>
<type>xid</type>, <parameter>serverid</parameter> <type>oid</type>,
<parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as
<function>pg_resolve_foreign_xact</function>
+        except that this removes the foreign transcation entry
without resolution.
+       </entry>

Can we write why and when such a function can be used? Typo,
/trasnaction/transaction

Fixed.

17.
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign trasnaction
+       resolution.</entry>
+     </row>

/trasnaction/transaction

Fixed.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#69Tatsuo Ishii
ishii@sraoss.co.jp
In reply to: Amit Kapila (#64)
Re: Transactions involving multiple postgres foreign servers, take 2

I think the problem mentioned above can occur with this as well or if
I am missing something then can you explain in further detail how it
won't create problem in the scenario I have used above?

So the problem you mentioned above is like this? (S1/S2 denotes
transactions (sessions), N1/N2 are the PostgreSQL servers). Since S1
already committed on N1, S2 sees the row on N1. However S2 does not
see the row on N2 since S1 has not committed on N2 yet.

Yeah, something on these lines but S2 can execute the query on N1
directly which should fetch the data from both N1 and N2.

The algorithm assumes that any client should access the database
through a middleware. Such direct access is prohibited.

Even if
there is a solution using REPEATABLE READ isolation level we might not
prefer to use that as the only level for distributed transactions, it
might be too costly but let us first see how does it solve the
problem?

The paper extends Snapshot Isolation (SI, which is the same as our
REPEATABLE READ isolation level) to "Global Snapshot Isolation" (GSI).
I think GSI will solve the problem (atomic visibility) we are
discussing.

Unlike READ COMMITTED, REPEATABLE READ acquires a snapshot at the time
when the first command is executed in a transaction (READ COMMITTED
acquires a snapshot at each command in a transaction). Pangea controls
the timing of the snapshot acquisition on each pair of transactions
(S1/N1,N2 or S2/N1,N2) so that each pair acquires the same
snapshot. To achieve this, while some transactions are trying to
acquire a snapshot, any commit operation should be postponed. Likewise,
any snapshot acquisition should wait until any in-progress commit
operations are finished (see Algorithm I to III in the paper for more
details). With this rule, the previous example now looks like this:
you can see SELECT on S2/N1 and S2/N2 give the same result.

S1/N1: DROP TABLE t1;
DROP TABLE
S1/N1: CREATE TABLE t1(i int);
CREATE TABLE
S1/N2: DROP TABLE t1;
DROP TABLE
S1/N2: CREATE TABLE t1(i int);
CREATE TABLE
S1/N1: BEGIN;
BEGIN
S1/N2: BEGIN;
BEGIN
S2/N1: BEGIN;
BEGIN
S1/N1: SET transaction_isolation TO 'repeatable read';
SET
S1/N2: SET transaction_isolation TO 'repeatable read';
SET
S2/N1: SET transaction_isolation TO 'repeatable read';
SET
S1/N1: INSERT INTO t1 VALUES (1);
INSERT 0 1
S1/N2: INSERT INTO t1 VALUES (1);
INSERT 0 1
S2/N1: SELECT * FROM t1;
i
---
(0 rows)

S2/N2: SELECT * FROM t1;
i
---
(0 rows)

S1/N1: PREPARE TRANSACTION 's1n1';
PREPARE TRANSACTION
S1/N2: PREPARE TRANSACTION 's1n2';
PREPARE TRANSACTION
S2/N1: PREPARE TRANSACTION 's2n1';
PREPARE TRANSACTION
S1/N1: COMMIT PREPARED 's1n1';
COMMIT PREPARED
S1/N2: COMMIT PREPARED 's1n2';
COMMIT PREPARED
S2/N1: COMMIT PREPARED 's2n1';
COMMIT PREPARED
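
The ordering rule described above (commits wait while snapshots are being
acquired, and snapshot acquisition waits while commits are in progress)
can be sketched as a small gate. This Python class is only an illustration
of that rule as stated in this message. It is not the paper's Algorithm I
to III, and SnapshotCommitGate and its method names are hypothetical.

```python
class SnapshotCommitGate:
    """Toy coordinator: snapshot acquisition and commit never overlap."""

    def __init__(self):
        self.snapshots_in_progress = 0
        self.commits_in_progress = 0

    def try_begin_snapshot(self):
        """Start acquiring snapshots on all nodes; False means 'wait'."""
        if self.commits_in_progress > 0:
            return False
        self.snapshots_in_progress += 1
        return True

    def end_snapshot(self):
        """All nodes have handed out their snapshot for this transaction."""
        self.snapshots_in_progress -= 1

    def try_begin_commit(self):
        """Start committing on all nodes; False means 'postpone'."""
        if self.snapshots_in_progress > 0:
            return False
        self.commits_in_progress += 1
        return True

    def end_commit(self):
        """The commit has finished on every node."""
        self.commits_in_progress -= 1
```

A real middleware would block instead of returning False, and would need
to handle fairness so that neither side starves the other.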

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

#70Bruce Momjian
bruce@momjian.us
In reply to: Ashutosh Bapat (#67)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Jun 16, 2020 at 06:42:52PM +0530, Ashutosh Bapat wrote:

Is there some mapping between GXID and XIDs allocated for each node or
will each node use the GXID as XID to modify the data? Are we fine
with parking the work for global snapshots and atomic visibility to a
separate patch and just proceed with the design proposed by this
patch?

A distributed transaction involves atomic commit, atomic visibility,
and global consistency. 2PC is the only practical solution for atomic
commit. There are some improvements over 2PC, but those are add-ons to
the basic 2PC, which is what this patch provides. Atomic visibility
and global consistency, however, have alternative solutions, but all of
those solutions require 2PC to be supported. Each of those is a large
piece of work, and trying to get everything in may not work. Once we
have basic 2PC in place, there will be ground to experiment with
solutions for global consistency and atomic visibility. If we manage
to do it right, we could make it pluggable as well. So, I think we
should concentrate on supporting basic 2PC work now.

Very good summary, thank you.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#71Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Bruce Momjian (#70)
Re: Transactions involving multiple postgres foreign servers, take 2

I've attached the new version patch set. 0006 is a separate patch
which introduces 'prefer' mode to foreign_twophase_commit.

I hope we can use this feature. Thank you for making the patches and
for the discussions.
I'm currently trying to understand the logic and found some minor
points to be fixed.

I'm sorry if my understanding is wrong.

* The v22 patches need rebase as they can't apply to the current master.

* FdwXactAtomicCommitParticipants, mentioned in
src/backend/access/fdwxact/README, is not implemented.
Is FdwXactParticipants right?

* The following comment says that this code is for "One-phase",
but the second argument of FdwXactParticipantEndTransaction()
indicates that this code is not "onephase".

AtEOXact_FdwXact() in fdwxact.c:

    /* One-phase rollback foreign transaction */
    FdwXactParticipantEndTransaction(fdw_part, false, false);

static void
FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
                                 bool onephase, bool for_commit)

* "two_phase_commit" option is mentioned in postgres-fdw.sgml,
but I can't find related code.

* A resolver.c comment has a sentence
containing two blanks. (Emergency Termination)

* There are some inconsistencies with the PostgreSQL wiki.
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

I understand it's difficult to keep it consistent. I think it's ok to
fix it later, when these patches are almost ready to be committed.

- I can't find the "two_phase_commit" option in the source code.
But 2PC works if the remote server's "max_prepared_transactions"
is set to a non-zero value. That is the correct behavior, isn't it?

- Some parameters are renamed or added in the latest patches:
max_prepared_foreign_transaction, max_prepared_transactions and so
on.

- typo: froeign_transaction_resolver_timeout

Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION

#72Amit Kapila
amit.kapila16@gmail.com
In reply to: Tatsuo Ishii (#69)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Jun 16, 2020 at 8:06 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

I think the problem mentioned above can occur with this as well or if
I am missing something then can you explain in further detail how it
won't create problem in the scenario I have used above?

So the problem you mentioned above is like this? (S1/S2 denotes
transactions (sessions), N1/N2 are the PostgreSQL servers). Since S1
already committed on N1, S2 sees the row on N1. However S2 does not
see the row on N2 since S1 has not committed on N2 yet.

Yeah, something on these lines but S2 can execute the query on N1
directly which should fetch the data from both N1 and N2.

The algorithm assumes that any client should access the database
through a middleware. Such direct access is prohibited.

Okay, so it seems we need a few things that the middleware (Pangea)
expects if we have to follow the design of the paper.

Even if
there is a solution using REPEATABLE READ isolation level we might not
prefer to use that as the only level for distributed transactions, it
might be too costly but let us first see how does it solve the
problem?

The paper extends Snapshot Isolation (SI, which is the same as our
REPEATABLE READ isolation level) to "Global Snapshot Isolation" (GSI).
I think GSI will solve the problem (atomic visibility) we are
discussing.

Unlike READ COMMITTED, REPEATABLE READ acquires snapshot at the time
when the first command is executed in a transaction (READ COMMITTED
acquires a snapshot at each command in a transaction). Pangea controls
the timing of the snapshot acquisition on pair of transactions
(S1/N1,N2 or S2/N1,N2) so that each pair acquires the same
snapshot. To achieve this, while some transactions are trying to
acquire snapshot, any commit operation should be postponed. Likewise
any snapshot acquisition should wait until any in progress commit
operations are finished (see Algorithm I to III in the paper for more
details).

I haven't read the paper completely, but it sounds quite restrictive
(both commits and snapshots need to wait). Another point is whether
we want some middleware involved in the solution. The main thing
I was looking into at this stage is whether the current
implementation proposed by the patch for 2PC is generic enough that we
would later be able to integrate a solution for atomic visibility.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#73Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Masahiro Ikeda (#71)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, 17 Jun 2020 at 09:01, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

I've attached the new version patch set. 0006 is a separate patch
which introduces 'prefer' mode to foreign_twophase_commit.

I hope we can use this feature. Thank you for making patches and
discussions.
I'm currently understanding the logic and found some minor points to be
fixed.

I'm sorry if my understanding is wrong.

* The v22 patches need rebase as they can't apply to the current master.

* FdwXactAtomicCommitParticipants said in
src/backend/access/fdwxact/README
is not implemented. Is FdwXactParticipants right?

Right.

* A following comment says that this code is for "One-phase",
but second argument of FdwXactParticipantEndTransaction() describes
this code is not "onephase".

AtEOXact_FdwXact() in fdwxact.c
/* One-phase rollback foreign transaction */
FdwXactParticipantEndTransaction(fdw_part, false, false);

static void
FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool
onephase,
bool for_commit)

* "two_phase_commit" option is mentioned in postgres-fdw.sgml,
but I can't find related code.

* A comment in resolver.c contains a sentence
with two consecutive blanks (Emergency Termination).

* There are some inconsistencies with the PostgreSQL wiki.
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

I understand it's difficult to keep them consistent; I think it's ok to
fix this later,
when these patches are almost ready to be committed.

- I can't find the "two_phase_commit" option in the source code.
But 2PC works if the remote server's "max_prepared_transactions"
is set
to a nonzero value. That is the correct behavior, isn't it?

Yes. I had removed the two_phase_commit option from postgres_fdw.
Currently, postgres_fdw uses 2PC when 2PC is required. Therefore,
max_prepared_transactions needs to be set to a nonzero value, as you
mentioned.

- Some parameters were renamed or added in the latest patches:
max_prepared_foreign_transaction, max_prepared_transactions and so
on.

- typo: froeign_transaction_resolver_timeout

Thank you for your review! I've incorporated your comments in my
local branch. I'll share the latest version of the patch set.

Also, I've updated the wiki page. I'll try to keep the wiki page up-to-date.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#74Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#67)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Jun 16, 2020 at 6:43 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Tue, Jun 16, 2020 at 3:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Is there some mapping between GXID and XIDs allocated for each node or
will each node use the GXID as XID to modify the data? Are we fine
with parking the work for global snapshots and atomic visibility to a
separate patch and just proceed with the design proposed by this
patch?

A distributed transaction involves atomic commit, atomic visibility,
and global consistency. 2PC is the only practical solution for atomic
commit. There are some improvements over 2PC, but those are add-ons to
the basic 2PC, which is what this patch provides. Atomic visibility
and global consistency, however, have alternative solutions, but all of
those solutions require 2PC to be supported. Each of those is a large
piece of work, and trying to get everything in may not work. Once we
have basic 2PC in place, there will be ground to experiment with
solutions for global consistency and atomic visibility. If we manage
to do it right, we could make it pluggable as well.

I think it is easier said than done. If you want to make it pluggable
or want alternative solutions to adapt the 2PC support provided by us,
we should have some idea of what those alternative solutions look like. I
am not saying we have to figure out each and every detail of those
solutions, but without paying any attention to the high-level picture
we might end up doing something for 2PC here which either needs a lot
of modifications or might need a design change, which would be bad.
Basically, if we later decide to use something like Global Xid to
achieve other features then what we are doing here might not work.

I think it is a good idea to complete the work in pieces where each
piece is useful on its own but without having clarity on the overall
solution that could be a recipe for disaster. It is possible that you
have some idea in your mind where you can see clearly how this piece
of work can fit in the bigger picture but it is not very apparent to
others or doesn't seem to be documented anywhere.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#75Tatsuo Ishii
ishii@sraoss.co.jp
In reply to: Amit Kapila (#72)
Re: Transactions involving multiple postgres foreign servers, take 2

Okay, so it seems we need a few things which the middleware (Pangea) expects
if we have to follow the design of the paper.

Yes.

I haven't read the paper completely but it sounds quite restrictive
(like both commits and snapshots need to wait).

Maybe. There is a performance evaluation in the paper. You might want
to take a look at it.

Another point is:
do we want some middleware involved in the solution? The main thing
I was looking into at this stage is whether the current
implementation proposed by the patch for 2PC is generic enough that we
would later be able to integrate a solution for atomic visibility.

My concern is, FDW+2PC without atomic visibility could lead to data
inconsistency among servers in some cases. If my understanding is
correct, FDW+2PC (without atomic visibility) cannot prevent data
inconsistency in the case below. Initially table t1 has only one row
with i = 0 on both N1 and N2. By executing S1 and S2 concurrently, t1
now has different value of i, 0 and 1.

S1/N1: DROP TABLE t1;
DROP TABLE
S1/N1: CREATE TABLE t1(i int);
CREATE TABLE
S1/N1: INSERT INTO t1 VALUES(0);
INSERT 0 1
S1/N2: DROP TABLE t1;
DROP TABLE
S1/N2: CREATE TABLE t1(i int);
CREATE TABLE
S1/N2: INSERT INTO t1 VALUES(0);
INSERT 0 1
S1/N1: BEGIN;
BEGIN
S1/N2: BEGIN;
BEGIN
S1/N1: UPDATE t1 SET i = i + 1; -- i = 1
UPDATE 1
S1/N2: UPDATE t1 SET i = i + 1; -- i = 1
UPDATE 1
S1/N1: PREPARE TRANSACTION 's1n1';
PREPARE TRANSACTION
S1/N1: COMMIT PREPARED 's1n1';
COMMIT PREPARED
S2/N1: BEGIN;
BEGIN
S2/N2: BEGIN;
BEGIN
S2/N2: DELETE FROM t1 WHERE i = 1;
DELETE 0
S2/N1: DELETE FROM t1 WHERE i = 1;
DELETE 1
S1/N2: PREPARE TRANSACTION 's1n2';
PREPARE TRANSACTION
S2/N1: PREPARE TRANSACTION 's2n1';
PREPARE TRANSACTION
S2/N2: PREPARE TRANSACTION 's2n2';
PREPARE TRANSACTION
S1/N2: COMMIT PREPARED 's1n2';
COMMIT PREPARED
S2/N1: COMMIT PREPARED 's2n1';
COMMIT PREPARED
S2/N2: COMMIT PREPARED 's2n2';
COMMIT PREPARED
S2/N1: SELECT * FROM t1;
i
---
(0 rows)

S2/N2: SELECT * FROM t1;
i
---
1
(1 row)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

#76Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Tatsuo Ishii (#75)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, 18 Jun 2020 at 08:31, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

Okay, so it seems we need a few things which the middleware (Pangea) expects
if we have to follow the design of the paper.

Yes.

I haven't read the paper completely but it sounds quite restrictive
(like both commits and snapshots need to wait).

Maybe. There is a performance evaluation in the paper. You might want
to take a look at it.

Another point is:
do we want some middleware involved in the solution? The main thing
I was looking into at this stage is whether the current
implementation proposed by the patch for 2PC is generic enough that we
would later be able to integrate a solution for atomic visibility.

My concern is, FDW+2PC without atomic visibility could lead to data
inconsistency among servers in some cases. If my understanding is
correct, FDW+2PC (without atomic visibility) cannot prevent data
inconsistency in the case below. Initially table t1 has only one row
with i = 0 on both N1 and N2. By executing S1 and S2 concurrently, t1
now has different value of i, 0 and 1.

IIUC the following sequence won't happen because COMMIT PREPARED
's1n1' cannot be executed before PREPARE TRANSACTION 's1n2'. But as
you mentioned, we cannot prevent data inconsistency even with FDW+2PC
e.g., when S2 starts a transaction between COMMIT PREPARED on N1 and
COMMIT PREPARED on N2 by S1. The point is that this data inconsistency is
caused by an inconsistent read, not by inconsistent commit
results. I think there are several ways data inconsistency can
arise, and atomic commit and atomic visibility eliminate
different ones. We can eliminate all possibilities of data
inconsistency only after we support both 2PC and global MVCC.

S1/N1: DROP TABLE t1;
DROP TABLE
S1/N1: CREATE TABLE t1(i int);
CREATE TABLE
S1/N1: INSERT INTO t1 VALUES(0);
INSERT 0 1
S1/N2: DROP TABLE t1;
DROP TABLE
S1/N2: CREATE TABLE t1(i int);
CREATE TABLE
S1/N2: INSERT INTO t1 VALUES(0);
INSERT 0 1
S1/N1: BEGIN;
BEGIN
S1/N2: BEGIN;
BEGIN
S1/N1: UPDATE t1 SET i = i + 1; -- i = 1
UPDATE 1
S1/N2: UPDATE t1 SET i = i + 1; -- i = 1
UPDATE 1
S1/N1: PREPARE TRANSACTION 's1n1';
PREPARE TRANSACTION
S1/N1: COMMIT PREPARED 's1n1';
COMMIT PREPARED
S2/N1: BEGIN;
BEGIN
S2/N2: BEGIN;
BEGIN
S2/N2: DELETE FROM t1 WHERE i = 1;
DELETE 0
S2/N1: DELETE FROM t1 WHERE i = 1;
DELETE 1
S1/N2: PREPARE TRANSACTION 's1n2';
PREPARE TRANSACTION
S2/N1: PREPARE TRANSACTION 's2n1';
PREPARE TRANSACTION
S2/N2: PREPARE TRANSACTION 's2n2';
PREPARE TRANSACTION
S1/N2: COMMIT PREPARED 's1n2';
COMMIT PREPARED
S2/N1: COMMIT PREPARED 's2n1';
COMMIT PREPARED
S2/N2: COMMIT PREPARED 's2n2';
COMMIT PREPARED
S2/N1: SELECT * FROM t1;
i
---
(0 rows)

S2/N2: SELECT * FROM t1;
i
---
1
(1 row)

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#77Tatsuo Ishii
ishii@sraoss.co.jp
In reply to: Masahiko Sawada (#76)
Re: Transactions involving multiple postgres foreign servers, take 2

My concern is, FDW+2PC without atomic visibility could lead to data
inconsistency among servers in some cases. If my understanding is
correct, FDW+2PC (without atomic visibility) cannot prevent data
inconsistency in the case below. Initially table t1 has only one row
with i = 0 on both N1 and N2. By executing S1 and S2 concurrently, t1
now has different value of i, 0 and 1.

IIUC the following sequence won't happen because COMMIT PREPARED
's1n1' cannot be executed before PREPARE TRANSACTION 's1n2'.

You are right.

But as
you mentioned, we cannot prevent data inconsistency even with FDW+2PC
e.g., when S2 starts a transaction between COMMIT PREPARED on N1 and
COMMIT PREPARED on N2 by S1.

Ok, example updated.

S1/N1: DROP TABLE t1;
DROP TABLE
S1/N1: CREATE TABLE t1(i int);
CREATE TABLE
S1/N1: INSERT INTO t1 VALUES(0);
INSERT 0 1
S1/N2: DROP TABLE t1;
DROP TABLE
S1/N2: CREATE TABLE t1(i int);
CREATE TABLE
S1/N2: INSERT INTO t1 VALUES(0);
INSERT 0 1
S1/N1: BEGIN;
BEGIN
S1/N2: BEGIN;
BEGIN
S1/N1: UPDATE t1 SET i = i + 1; -- i = 1
UPDATE 1
S1/N2: UPDATE t1 SET i = i + 1; -- i = 1
UPDATE 1
S2/N1: BEGIN;
BEGIN
S2/N2: BEGIN;
BEGIN
S1/N1: PREPARE TRANSACTION 's1n1';
PREPARE TRANSACTION
S1/N2: PREPARE TRANSACTION 's1n2';
PREPARE TRANSACTION
S2/N1: PREPARE TRANSACTION 's2n1';
PREPARE TRANSACTION
S2/N2: PREPARE TRANSACTION 's2n2';
PREPARE TRANSACTION
S1/N1: COMMIT PREPARED 's1n1';
COMMIT PREPARED
S2/N1: DELETE FROM t1 WHERE i = 1;
DELETE 1
S2/N2: DELETE FROM t1 WHERE i = 1;
DELETE 0
S1/N2: COMMIT PREPARED 's1n2';
COMMIT PREPARED
S2/N1: COMMIT PREPARED 's2n1';
COMMIT PREPARED
S2/N2: COMMIT PREPARED 's2n2';
COMMIT PREPARED
S2/N1: SELECT * FROM t1;
i
---
(0 rows)

S2/N2: SELECT * FROM t1;
i
---
1
(1 row)

The point is that this data inconsistency is
caused by an inconsistent read, not by inconsistent commit
results. I think there are several ways data inconsistency can
arise, and atomic commit and atomic visibility eliminate
different ones. We can eliminate all possibilities of data
inconsistency only after we support both 2PC and global MVCC.

IMO any permanent data inconsistency is a serious problem for users no
matter what the technical reasons are.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

#78Amit Kapila
amit.kapila16@gmail.com
In reply to: Tatsuo Ishii (#75)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, Jun 18, 2020 at 5:01 AM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

Another point is:
do we want some middleware involved in the solution? The main thing
I was looking into at this stage is whether the current
implementation proposed by the patch for 2PC is generic enough that we
would later be able to integrate a solution for atomic visibility.

My concern is, FDW+2PC without atomic visibility could lead to data
inconsistency among servers in some cases. If my understanding is
correct, FDW+2PC (without atomic visibility) cannot prevent data
inconsistency in the case below.

You are right and we are not going to claim that after this feature is
committed. This feature has independent use cases like it can allow
parallel copy when foreign tables are involved once we have parallel
copy and surely there will be more. I think it is clear that we need
atomic visibility (some way to ensure global consistency) to avoid the
data inconsistency problems you and I are worried about and we can do
that as a separate patch but at this stage, it would be good if we can
have some high-level design of that as well so that if we need some
adjustments in the design/implementation of this patch then we can do
it now. I think there is some discussion on the other threads (like
[1]) about the kind of stuff we are worried about which I need to
follow up on to study the impact.

Having said that, I don't think that is a reason to stop reviewing or
working on this patch.

[1]: /messages/by-id/21BC916B-80A1-43BF-8650-3363CCDAE09C@postgrespro.ru

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#79Bruce Momjian
bruce@momjian.us
In reply to: Amit Kapila (#78)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, Jun 18, 2020 at 04:09:56PM +0530, Amit Kapila wrote:

You are right and we are not going to claim that after this feature is
committed. This feature has independent use cases like it can allow
parallel copy when foreign tables are involved once we have parallel
copy and surely there will be more. I think it is clear that we need
atomic visibility (some way to ensure global consistency) to avoid the
data inconsistency problems you and I are worried about and we can do
that as a separate patch but at this stage, it would be good if we can
have some high-level design of that as well so that if we need some
adjustments in the design/implementation of this patch then we can do
it now. I think there is some discussion on the other threads (like
[1]) about the kind of stuff we are worried about which I need to
follow up on to study the impact.

Having said that, I don't think that is a reason to stop reviewing or
working on this patch.

I think our first step is to allow sharding to work on read-only
databases, e.g. data warehousing. Read/write will require global
snapshots. It is true that 2PC is of limited usefulness without global
snapshots, because, by definition, systems using 2PC are read-write
systems. However, I can see cases where you are loading data into a
data warehouse but want 2PC so the systems remain consistent even if
there is a crash during loading.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#80Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Bruce Momjian (#79)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, Jun 18, 2020 at 6:49 PM Bruce Momjian <bruce@momjian.us> wrote:

On Thu, Jun 18, 2020 at 04:09:56PM +0530, Amit Kapila wrote:

You are right and we are not going to claim that after this feature is
committed. This feature has independent use cases like it can allow
parallel copy when foreign tables are involved once we have parallel
copy and surely there will be more. I think it is clear that we need
atomic visibility (some way to ensure global consistency) to avoid the
data inconsistency problems you and I are worried about and we can do
that as a separate patch but at this stage, it would be good if we can
have some high-level design of that as well so that if we need some
adjustments in the design/implementation of this patch then we can do
it now. I think there is some discussion on the other threads (like
[1]) about the kind of stuff we are worried about which I need to
follow up on to study the impact.

Having said that, I don't think that is a reason to stop reviewing or
working on this patch.

I think our first step is to allow sharding to work on read-only
databases, e.g. data warehousing. Read/write will require global
snapshots. It is true that 2PC is of limited usefulness without global
snapshots, because, by definition, systems using 2PC are read-write
systems. However, I can see cases where you are loading data into a
data warehouse but want 2PC so the systems remain consistent even if
there is a crash during loading.

For sharding, just implementing 2PC without global consistency
provides limited functionality. But for general-purpose federated
databases 2PC serves an important function: atomic commit.
When PostgreSQL is used as one of the coordinators in a heterogeneous
federated database system, it's not expected to have global
consistency or even atomic visibility. But it needs a guarantee that
once a transaction commits, all its legs are committed. 2PC provides
that guarantee as long as the other databases keep their promise that
prepared transactions will always get committed when so requested.
Implicit in this is an HA requirement on these databases as well. So the
functionality provided by this patch is important outside the sharding
case as well.

As you said, even for a data warehousing application, there is some
write in the form of loading/merging data. If that write happens
across multiple servers, we need atomic commit to be guaranteed. Some
of these applications can work even if global consistency and atomic
visibility are guaranteed only eventually.

--
Best Wishes,
Ashutosh Bapat

#81Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Masahiko Sawada (#73)
7 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, 17 Jun 2020 at 14:07, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Wed, 17 Jun 2020 at 09:01, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

I've attached the new version patch set. 0006 is a separate patch
which introduces 'prefer' mode to foreign_twophase_commit.

I hope we can use this feature. Thank you for making patches and
discussions.
I'm still working through the logic and have found some minor points to be
fixed.

I'm sorry if my understanding is wrong.

* The v22 patches need a rebase, as they no longer apply to the current master.

* FdwXactAtomicCommitParticipants, mentioned in
src/backend/access/fdwxact/README,
is not implemented. Is FdwXactParticipants the right name?

Right.

* The following comment says that this code is for "One-phase",
but the second argument of FdwXactParticipantEndTransaction() indicates
that this call is not "onephase".

AtEOXact_FdwXact() in fdwxact.c
/* One-phase rollback foreign transaction */
FdwXactParticipantEndTransaction(fdw_part, false, false);

static void
FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool
onephase,
bool for_commit)

* "two_phase_commit" option is mentioned in postgres-fdw.sgml,
but I can't find related code.

* A comment in resolver.c contains a sentence
with two consecutive blanks (Emergency Termination).

* There are some inconsistencies with the PostgreSQL wiki.
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

I understand it's difficult to keep them consistent; I think it's ok to
fix this later,
when these patches are almost ready to be committed.

- I can't find the "two_phase_commit" option in the source code.
But 2PC works if the remote server's "max_prepared_transactions"
is set
to a nonzero value. That is the correct behavior, isn't it?

Yes. I had removed the two_phase_commit option from postgres_fdw.
Currently, postgres_fdw uses 2PC when 2PC is required. Therefore,
max_prepared_transactions needs to be set to a nonzero value, as you
mentioned.

- Some parameters were renamed or added in the latest patches:
max_prepared_foreign_transaction, max_prepared_transactions and so
on.

- typo: froeign_transaction_resolver_timeout

Thank you for your review! I've incorporated your comments in my
local branch. I'll share the latest version of the patch set.

Also, I've updated the wiki page. I'll try to keep the wiki page up-to-date.

I've attached the latest version of the patches. I've incorporated the review
comments I got so far and improved the locking strategy.

Please review it.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v23-0007-Add-prefer-mode-to-foreign_twophase_commit.patch (application/octet-stream)
From 9e182256a4c4e7485453ba83fd9276ac688a5f4d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 3 Jun 2020 16:38:13 +0900
Subject: [PATCH v23 7/7] Add prefer mode to foreign_twophase_commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/config.sgml                      | 29 +++++------
 doc/src/sgml/distributed-transaction.sgml     | 10 ++--
 src/backend/access/fdwxact/fdwxact.c          | 49 ++++++++++++++++---
 src/backend/utils/misc/guc.c                  |  5 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/include/access/fdwxact.h                  |  1 +
 .../test_fdwxact/expected/test_fdwxact.out    | 25 ++++++++++
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 28 +++++++++++
 8 files changed, 119 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5d81ad08c3..29db09ca46 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9087,18 +9087,19 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
         <para>
          Specifies whether distributed transaction commits ensures that all
          involved changes on foreign servers are committed or not. Valid
-         values are <literal>required</literal> and <literal>disabled</literal>.
-         The default setting is <literal>disabled</literal>. Setting to
-         <literal>disabled</literal> don't use two-phase commit protocol to
-         commit or rollback distributed transactions. When set to
-         <literal>required</literal> distributed transactions strictly requires
-         that all written servers can use two-phase commit protocol.  That is,
-         the distributed transaction cannot commit if even one server does not
-         support the prepare callback routine
+         values are <literal>required</literal>, <literal>prefer</literal> and
+         <literal>disabled</literal>. The default setting is
+         <literal>disabled</literal>. Setting to <literal>disabled</literal>
+         don't use two-phase commit protocol to commit or rollback distributed
+         transactions. When set to <literal>required</literal> distributed
+         transactions strictly requires that all written servers can use
+         two-phase commit protocol.  That is, the distributed transaction cannot
+         commit if even one server does not support the prepare callback routine
          (described in <xref linkend="fdw-callbacks-transaction-management"/>).
-         In <literal>required</literal> case, distributed transaction commit will
-         wait for all involving foreign transaction to be committed before the
-         command return a "success" indication to the client.
+         In <literal>prefer</literal> and <literal>required</literal> case,
+         distributed transaction commit will wait for all involving foreign
+         transaction to be committed before the command return a "success"
+         indication to the client.
         </para>
 
         <para>
@@ -9108,9 +9109,9 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
 
         <note>
          <para>
-          When <literal>disabled</literal> there can be risk of database
-          consistency if one or more foreign servers crashes while committing
-          the distributed transactions.
+          When <literal>disabled</literal> or <literal>prefer</literal>  there
+          can be risk of database consistency if one or more foreign servers
+          crashes while committing the distributed transactions.
          </para>
         </note>
        </listitem>
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
index b4b1e26a55..845b9508be 100644
--- a/doc/src/sgml/distributed-transaction.sgml
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -48,11 +48,11 @@
        prepares all transaction on the foreign servers if two-phase commit is
        required. Two-phase commit is required when the transaction modifies
        data on two or more servers including the local server itself and
-       <xref linkend="guc-foreign-twophase-commit"/> is
-       <literal>required</literal>. If the prepare on all foreign servers is
-       successful then go to the next step.  If there is any failure in the
-       prepare phase, the server will rollback all the transactions on both
-       local and foreign servers.
+       <xref linkend="guc-foreign-twophase-commit"/> is either
+       <literal>required</literal> or <literal>prefer</literal>. If the prepare
+       on all foreign servers is successful then go to the next step.  If
+       there is any failure in the prepare phase, the server will rollback
+       all the transactions on both local and foreign servers.
       </para>
      </listitem>
      <listitem>
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 76b973b473..9e5858bb12 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -438,7 +438,9 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
  * When foreign twophase commit is enabled, the behavior depends on the value
  * of foreign_twophase_commit; when 'required' we strictly require for all
  * foreign servers' FDW to support two-phase commit protocol and ask them to
- * prepare foreign transactions, and when 'disabled' we ask all foreign servers
+ * prepare foreign transactions, when 'prefer' we ask only foreign servers
+ * that are capable of two-phase commit to prepare foreign transactions and ask
+ * for other servers to commit, and when 'disabled' we ask all foreign servers
  * to commit foreign transaction in one-phase. If we failed to commit any of
  * them we change to aborting.
  *
@@ -506,8 +508,9 @@ checkForeignTwophaseCommitRequired(void)
 {
 	ListCell   *lc;
 	bool		need_twophase_commit;
-	bool		have_notwophase = false;
+	bool		have_notwophase;
 	int			nserverswritten = 0;
+	int			nserverstwophase = 0;
 
 	if (!IsForeignTwophaseCommitRequested())
 		return false;
@@ -519,22 +522,51 @@ checkForeignTwophaseCommitRequired(void)
 		if (!fdw_part->modified)
 			continue;
 
-		if (!SeverSupportTwophaseCommit(fdw_part))
-			have_notwophase = true;
+		if (SeverSupportTwophaseCommit(fdw_part))
+			nserverstwophase++;
 
 		nserverswritten++;
 	}
+	Assert(nserverswritten >= nserverstwophase);
+
+	/* check if there is any servers that don't support two-phase commit */
+	have_notwophase = (nserverswritten != nserverstwophase);
 
 	/* Did we modify the local non-temporary data? */
 	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+	{
 		nserverswritten++;
 
+		/*
+		 * We increment nserverstwophase as well for making code simple,
+		 * although we don't actually use two-phase commit for the local
+		 * transaction.
+		 */
+		nserverstwophase++;
+	}
+
 	if (nserverswritten <= 1)
 		return false;
 
-	/* We require for all modified server to support two-phase commit */
-	need_twophase_commit = (nserverswritten >= 2);
-	Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED);
+	if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED)
+	{
+		/*
+		 * In 'required' case, we require for all modified server to support
+		 * two-phase commit.
+		 */
+		need_twophase_commit = (nserverswritten >= 2);
+	}
+	else
+	{
+		Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER);
+
+		/*
+		 * In 'prefer' case, we use two-phase commit when this transaction modified
+		 * two or more servers including the local server or servers that support
+		 * two-phase commit.
+		 */
+		need_twophase_commit = (nserverstwophase >= 2);
+	}
 
 	/*
 	 * If foreign two phase commit is required then all foreign serves must be
@@ -555,7 +587,8 @@ checkForeignTwophaseCommitRequired(void)
 					 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
 					 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
 
-		if (have_notwophase)
+		if (have_notwophase &&
+			foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 72fe0a7167..fddb172a96 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -428,11 +428,12 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 };
 
 /*
- * Although only "required" and "disabled" are documented, we accept all
- * the likely variants of "on" and "off".
+ * Although only "required", "prefer", and "disabled" are documented,
+ *  we accept all the likely variants of "on" and "off".
  */
 static const struct config_enum_entry foreign_twophase_commit_options[] = {
 	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false},
 	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
 	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
 	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 5ed8617787..7f76e2dfcc 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -358,7 +358,7 @@
 							# foreign transactions
 							# after a failed attempt
 #foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
-					# disabled or required
+					# disabled, prefer or required
 
 #------------------------------------------------------------------------------
 # QUERY TUNING
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index d550ee9b87..965dbfc57f 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -37,6 +37,7 @@
 typedef enum
 {
 	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_PREFER, /* use twophase commit where available */
 	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
 										 * twophase commit */
 }			ForeignTwophaseCommitLevel;
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
index c6a91ac9f1..ce8465b52c 100644
--- a/src/test/modules/test_fdwxact/expected/test_fdwxact.out
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -221,3 +221,28 @@ BEGIN;
 INSERT INTO ft_1 VALUES (1);
 PREPARE TRANSACTION 'global_x1';
 ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
+-- Test 'prefer' mode.
+-- The cases where failed in 'required' mode should pass in 'prefer' mode.
+-- We simply commit/rollback a transaction in one-phase on a server
+-- that doesn't support two-phase commit, instead of error.
+SET foreign_twophase_commit TO 'prefer';
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
index 8cf860e295..72a9ee6be4 100644
--- a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -191,3 +191,31 @@ PREPARE TRANSACTION 'global_x1';
 BEGIN;
 INSERT INTO ft_1 VALUES (1);
 PREPARE TRANSACTION 'global_x1';
+
+
+-- Test 'prefer' mode.
+-- The cases where failed in 'required' mode should pass in 'prefer' mode.
+-- We simply commit/rollback a transaction in one-phase on a server
+-- that doesn't support two-phase commit, instead of error.
+SET foreign_twophase_commit TO 'prefer';
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
-- 
2.23.0

v23-0006-Add-regression-tests-for-foreign-twophase-commit.patch (application/octet-stream)
From 3218679574539d53e51c0c983507a57eb0c66898 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v23 6/7] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 ++
 .../test_fdwxact/expected/test_fdwxact.out    | 223 +++++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 193 +++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 137 +++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 471 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 +++++++
 src/test/regress/pg_regress.c                 |  13 +-
 13 files changed, 1297 insertions(+), 5 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 29de73c060..8a48e6ba19 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -13,6 +13,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..c6a91ac9f1
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,223 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup two servers that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_2 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_2 (i int) SERVER srv_2;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_1 and ft_2 don't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     0
+(1 row)
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     0
+(1 row)
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
+-- Error. We cannot PREPARE a distributed transaction when
+-- foreign_twophase_commit is disabled.
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..8cf860e295
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,193 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup two servers that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_2 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_2 (i int) SERVER srv_2;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_1 and ft_2 don't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+COMMIT PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ROLLBACK PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+
+-- Error. We cannot PREPARE a distributed transaction when
+-- foreign_twophase_commit is disabled.
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..8d48a74e86
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,137 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the case where a transaction fails to prepare the local transaction after
+# preparing the foreign transaction. The first attempt should succeed, but the second
+# attempt fails after preparing the foreign transaction (the identifier 'tx1' is already
+# in use), and should roll back the prepared foreign transaction.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'");
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into prepare phase on srv_2pc_1. The transaction fails during
+# preparing the foreign transaction on srv_2pc_1. Then, we try to both 'rollback' and
+# 'rollback prepared' the foreign transaction, and rollback another foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
+
+# Inject a panic into the prepare phase on srv_2pc_2. The server crashes after preparing both
+# foreign transactions. After the restart, those transactions are recovered as in-doubt
+# transactions. We check that the resolver process rolls back those transactions after recovery.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('panic', 'prepare', 'srv_2pc_2');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+$node->restart();
+$node->poll_query_until('postgres',
+						"SELECT count(*) = 0 FROM pg_foreign_xacts")
+  or die "Timeout while waiting for resolver process to resolve in-doubt transactions";
+$log = TestLib::slurp_file($node->logfile);
+like($log, qr/rollback prepared tx_[0-9]+ on srv_2pc_1/, "resolver rolled back in-doubt transaction");
+like($log, qr/rollback prepared tx_[0-9]+ on srv_2pc_2/, "resolver rolled back in-doubt transaction");
+truncate $node->logfile, 0;
+
+# Inject a panic into the commit phase on srv_2pc_1. The server crashes due to the panic
+# raised by the resolver process while committing the prepared foreign transaction on srv_2pc_1.
+# After the restart, those transactions are recovered as in-doubt transactions. We check that
+# the resolver process commits those transactions after recovery.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('panic', 'commit', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+$node->restart();
+$node->poll_query_until('postgres',
+						"SELECT count(*) = 0 FROM pg_foreign_xacts")
+  or die "Timeout while waiting for resolver process to resolve in-doubt transactions";
+$log = TestLib::slurp_file($node->logfile);
+like($log, qr/commit prepared tx_[0-9]+ on srv_2pc_1/, "resolver committed in-doubt transaction");
+like($log, qr/commit prepared tx_[0-9]+ on srv_2pc_2/, "resolver committed in-doubt transaction");
+truncate $node->logfile, 0;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..738690c978
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,471 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only
+ * commit and rollback API and the third supports all transaction API including
+ * prepare.
+ *
+ * Also, this module can inject an error at the prepare callback or the
+ * commit callback using the test_inject_error() SQL function. Information
+ * about the injected error is stored in shared memory so that backend
+ * processes and resolver processes can see it.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/reloptions.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactRslvState *state);
+static void testCommitForeignTransaction(FdwXactRslvState *state);
+static void testRollbackForeignTransaction(FdwXactRslvState *state);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, sizeof(buf), "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 state->fdwxact_id,
+							 state->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+/*
+ * Check whether an error is registered for this phase on this server. If so,
+ * set *elevel and return true; otherwise return false.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Only "error" and "panic" are supported; anything else means ERROR */
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+	else
+		*elevel = ERROR;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check that we can commit and roll back the foreign
+# transactions after normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check that we can commit and roll back the
+# foreign transactions after crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up the shared
+# memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check that the standby node can process prepared foreign transactions
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index f11a3b9e26..f7d11d9bea 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2338,9 +2338,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2355,7 +2358,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.23.0
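
For readers following the test module above: its commit callback logs either a plain commit or "commit prepared" depending on a one-phase flag in state->flags, and testGetPrepareId builds identifiers of the form "tx_<xid>". The standalone sketch below illustrates that decision with a bitwise flag test. It is only an illustration under stated assumptions: the SKETCH_* flag values and both function names are hypothetical, not the patch's definitions, and the identifier is interpolated without any quoting or escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits for this sketch only; not the patch's values. */
#define SKETCH_FLAG_ONEPHASE	0x01
#define SKETCH_FLAG_OTHER		0x02

/*
 * Build a prepared-transaction identifier in the style of testGetPrepareId
 * ("tx_<xid>").  Returns the number of characters snprintf would write.
 */
int
sketch_prepare_id(char *buf, size_t buflen, unsigned int xid)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "tx_%u", xid);
}

/*
 * Choose the SQL a commit callback would send.  The one-phase bit is tested
 * with a bitwise AND, so other bits set in flags do not select the one-phase
 * branch by accident.
 */
int
sketch_commit_command(char *buf, size_t buflen, int flags, const char *prep_id)
{
	assert(buf != NULL && buflen > 0);

	if ((flags & SKETCH_FLAG_ONEPHASE) != 0)
		return snprintf(buf, buflen, "COMMIT TRANSACTION");

	/* Assumption: prep_id contains no quote characters. */
	return snprintf(buf, buflen, "COMMIT PREPARED '%s'", prep_id);
}
```

With flags set to SKETCH_FLAG_OTHER alone, a logical test (flags && SKETCH_FLAG_ONEPHASE) would evaluate to true and take the one-phase branch, whereas the bitwise test above selects COMMIT PREPARED.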

Attachment: v23-0005-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From ea2818eacb8065f90f52dfbaa55370f0bd92ede6 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:28:58 +0500
Subject: [PATCH v23 5/7] postgres_fdw supports atomic commit APIs.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/Makefile                 |   7 +-
 contrib/postgres_fdw/connection.c             | 588 +++++++++++-------
 .../postgres_fdw/expected/postgres_fdw.out    | 280 ++++++++-
 contrib/postgres_fdw/fdwxact.conf             |   3 +
 contrib/postgres_fdw/postgres_fdw.c           |  21 +-
 contrib/postgres_fdw/postgres_fdw.h           |   8 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 124 +++-
 doc/src/sgml/postgres-fdw.sgml                |  10 +-
 8 files changed, 785 insertions(+), 256 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..91fa6e39fc 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -29,3 +29,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 52d1fe3563..25280cbd94 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2020, PostgreSQL Global Development Group
  *
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
@@ -56,6 +57,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;	/* connection used in this xact? */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -69,17 +71,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -92,6 +90,12 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(ForeignServer *server, UserMapping *user,
+										  bool will_prep_stmt, bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -104,11 +108,29 @@ static bool UserMappingPasswordRequired(UserMapping *user);
  * (not even on error), we need this flag to cue manual cleanup.
  */
 PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+GetConnection(UserMapping *user, bool will_prep_stmt, bool start_transaction)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+
+	entry = GetConnectionState(GetForeignServer(user->serverid),
+							   user, will_prep_stmt, start_transaction);
+
+	return entry->conn;
+}
+
+/*
+ * Get connection cache entry. Unlike GetConnectionState, this function does
+ * not establish a new connection if none exists yet.
+ */
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	ConnCacheEntry *entry;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -128,7 +150,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -136,12 +157,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -155,6 +170,22 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * This function gets the connection cache entry and establishes connection
+ * to the foreign server if there is no connection and starts a new transaction
+ * if 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(ForeignServer *server, UserMapping *user,
+				   bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -182,14 +213,13 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
-		ForeignServer *server = GetForeignServer(user->serverid);
-
 		/* Reset all transient state fields, to be sure all are clean */
 		entry->xact_depth = 0;
 		entry->have_prep_stmt = false;
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +230,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,12 +246,18 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
-	return entry->conn;
+	return entry;
 }
 
 /*
@@ -473,7 +518,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -700,193 +745,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -903,10 +761,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -917,6 +771,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Skip entries whose connection was not used in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1251,3 +1109,309 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", frstate->fdwxact_id);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   frstate->server->servername, frstate->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 frstate->server->servername, frstate->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on the foreign server. If
+ * frstate->flags contains FDWXACT_FLAG_ONEPHASE, this function commits the
+ * foreign transaction without preparation; otherwise it commits the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has been
+		 * prepared and closed, so we might not have a connection to it. We get
+		 * a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(frstate->server, frstate->usermapping, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, frstate->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In simple commit case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   frstate->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Roll back a transaction on the foreign server. As in the commit case, if
+ * frstate->flags contains FDWXACT_FLAG_ONEPHASE, this function rolls back the
+ * foreign transaction without preparation; otherwise it rolls back the prepared
+ * transaction. This function must tolerate being called recursively, because an
+ * error can happen while aborting.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
+{
+	bool			is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has been
+		 * prepared and closed, so we might not have a connection to it. We get
+		 * a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(frstate->server, frstate->usermapping, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, frstate->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry's transaction state if the transaction
+	 * failed before establishing a connection or starting a transaction.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction or is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager notes, the given foreign
+		 * transaction might not exist on the foreign server, so we accept
+		 * an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 82fc1290ef..dbdd4cc32c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -191,15 +210,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8923,10 +8944,10 @@ RESET ROLE;
 ALTER USER MAPPING FOR regress_nosuper SERVER loopback_nopw OPTIONS (ADD password_required 'false');
 SET ROLE regress_nosuper;
 -- Should finally work now
-SELECT * FROM ft1_nopw LIMIT 1;
-  c1  | c2 | c3 | c4 | c5 | c6 |     c7     | c8 
-------+----+----+----+----+----+------------+----
- 1111 |  2 |    |    |    |    | ft1        | 
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
+ c1 | c2 |        c3         |              c4              |            c5            | c6 |     c7     | c8  
+----+----+-------------------+------------------------------+--------------------------+----+------------+-----
+  1 |  2 | 00001_trig_update | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1  | 1          | foo
 (1 row)
 
 -- unpriv user also cannot set sslcert / sslkey on the user mapping
@@ -8943,16 +8964,16 @@ HINT:  User mappings with the sslcert or sslkey options set may only be created
 DROP USER MAPPING FOR CURRENT_USER SERVER loopback_nopw;
 -- This will fail again as it'll resolve the user mapping for public, which
 -- lacks password_required=false
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 ERROR:  password is required
 DETAIL:  Non-superusers must provide a password in the user mapping.
 RESET ROLE;
 -- The user mapping for public is passwordless and lacks the password_required=false
 -- mapping option, but will work because the current user is a superuser.
 SELECT * FROM ft1_nopw LIMIT 1;
-  c1  | c2 | c3 | c4 | c5 | c6 |     c7     | c8 
-------+----+----+----+----+----+------------+----
- 1111 |  2 |    |    |    |    | ft1        | 
+ c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+  6 |  6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6  | 6          | foo
 (1 row)
 
 -- cleanup
@@ -8961,16 +8982,225 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
 BEGIN;
-SELECT count(*) FROM ft1;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" of relation "T 5" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
  count 
 -------
-   822
+     0
 (1 row)
 
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
-ROLLBACK;
-WARNING:  there is no transaction in progress
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000000..3fdbf93cdb
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..bf21fbd8ba 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include <limits.h>
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -504,7 +505,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -558,6 +558,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1434,7 +1439,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2372,7 +2377,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user, false, true);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2746,7 +2751,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3566,7 +3571,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user, true, true);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4441,7 +4446,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4527,7 +4532,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4755,7 +4760,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..5445569301 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -129,7 +130,8 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt,
+							 bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
@@ -137,6 +139,9 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
@@ -203,6 +208,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 									bool is_subquery,
 									List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..1ef66123df 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2598,7 +2621,7 @@ ALTER USER MAPPING FOR regress_nosuper SERVER loopback_nopw OPTIONS (ADD passwor
 SET ROLE regress_nosuper;
 
 -- Should finally work now
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 
 -- unpriv user also cannot set sslcert / sslkey on the user mapping
 -- first set password_required so we see the right error messages
@@ -2612,7 +2635,7 @@ DROP USER MAPPING FOR CURRENT_USER SERVER loopback_nopw;
 
 -- This will fail again as it'll resolve the user mapping for public, which
 -- lacks password_required=false
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 
 RESET ROLE;
 
@@ -2628,9 +2651,98 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
 BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
 ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index eab2cc9378..8783f2077c 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -521,9 +521,13 @@ OPTIONS (ADD password_required 'false');
   </para>
 
   <para>
-   Note that it is currently not supported by
-   <filename>postgres_fdw</filename> to prepare the remote transaction for
-   two-phase commit.
+   <filename>postgres_fdw</filename> supports preparing the remote transaction
+   for two-phase commit.  Also, if two-phase commit protocol is required to
+   commit the distributed transaction, <filename>postgres_fdw</filename> commits
+   the remote transaction using two-phase commit protocol
+   (see <xref linkend="atomic-commit"/>).  So the remote server needs to
+   set <xref linkend="guc-max-prepared-transactions"/> to a nonzero value so that
+   it can prepare the remote transaction.
   </para>
  </sect2>
 
-- 
2.23.0
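
For readers following the second-phase logic in pgfdw_end_prepared_xact() above, here is a minimal standalone sketch of its two decisions: how the command text is built, and which failure is tolerated. It is not part of the patch. The helper names and the sample identifier "fx_1" are made up for illustration, and the six-bit packing is re-implemented locally on the assumption that it matches MAKE_SQLSTATE in PostgreSQL's elog.h. Like the patch, it interpolates the identifier without escaping, so it assumes the identifier contains no single quote.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Local copy of the six-bit SQLSTATE packing (assumed to match elog.h). */
#define SIXBIT(ch)	(((ch) - '0') & 0x3F)
#define PACK_SQLSTATE(s) \
	(SIXBIT((s)[0]) + (SIXBIT((s)[1]) << 6) + (SIXBIT((s)[2]) << 12) + \
	 (SIXBIT((s)[3]) << 18) + (SIXBIT((s)[4]) << 24))

/*
 * Hypothetical helper: build "COMMIT PREPARED 'id'" or
 * "ROLLBACK PREPARED 'id'" into buf.  Returns false if buf is too small.
 */
static bool
build_end_prepared_command(char *buf, size_t buflen,
						   const char *fdwxact_id, bool is_commit)
{
	int			n = snprintf(buf, buflen, "%s PREPARED '%s'",
							 is_commit ? "COMMIT" : "ROLLBACK",
							 fdwxact_id);

	return n >= 0 && (size_t) n < buflen;
}

/*
 * Hypothetical helper: decide whether a failed second phase can be ignored.
 * A missing SQLSTATE is treated like a connection failure (not ignorable);
 * only 42704 (undefined_object) means the prepared transaction is already
 * gone on the remote side.
 */
static bool
end_prepared_error_is_ignorable(const char *sqlstate)
{
	if (sqlstate == NULL || strlen(sqlstate) != 5)
		return false;

	return PACK_SQLSTATE(sqlstate) == PACK_SQLSTATE("42704");
}
```

The point of tolerating 42704 is that a resolver may retry COMMIT PREPARED or ROLLBACK PREPARED after the remote side has already finished the transaction; any other error, including a lost connection, is still reported.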

Attachment: v23-0004-Documentation-update.patch (application/octet-stream)
From a8b02cce5a71624e72ab1b89a7d220b9663bc9c1 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v23 4/7] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 +++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 152 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 238 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  91 +++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 810 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 700271fd40..3a72ae3870 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9223,6 +9223,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -10934,6 +10939,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of the foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in doubt.
+       A foreign transaction can become in doubt when the user cancels
+       the statement or the server crashes during transaction commit.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 783bf7a12b..5d81ad08c3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9070,6 +9070,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether a distributed transaction commit ensures that all
+         involved changes on foreign servers are committed. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used
+         to commit or roll back distributed transactions. When set to
+         <literal>required</literal>, distributed transactions strictly require
+         that all written servers can use the two-phase commit protocol.  That is,
+         the distributed transaction cannot commit if even one server does not
+         support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, the distributed transaction commit
+         waits for all involved foreign transactions to be committed before the
+         command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When <literal>disabled</literal>, database consistency is at risk
+          if one or more foreign servers crash while the distributed
+          transaction is being committed.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign
+         transaction resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver waits after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning the resolver stays connected
+         to one database until it is stopped manually with <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..b4b1e26a55
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,152 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the others were rolled back. This could leave
+   the data in an inconsistent state in terms of the federated database.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all the changes on foreign servers are either committed or rolled back using
+   the transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To commit atomically among all foreign servers,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit sequence of a distributed transaction proceeds
+    through the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers, including the local server itself, and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the prepare succeeds on all foreign
+       servers, proceed to the next step.  If there is any failure in the
+       prepare phase, the server rolls back all the transactions on both the
+       local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally. The server commits the transaction locally.  If any failure
+       happens in this step, the server switches to rollback and rolls back all
+       transactions on both the local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    Each commit of a distributed transaction will wait until confirmation is
+    received that all prepared transactions are committed or rolled back.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back, using the two-phase commit protocol. However, foreign transactions
+    become <firstterm>in-doubt</firstterm> in two cases:
+
+    <itemizedlist>
+     <listitem>
+      <para>The local node crashed while preparing or resolving a foreign
+       transaction.</para>
+     </listitem>
+     <listitem>
+      <para>The user canceled the query.</para>
+     </listitem>
+    </itemizedlist>
+
+    You can check in-doubt transactions in the <xref linkend="view-pg-foreign-xacts"/>
+    view. These foreign transactions are resolved by a foreign transaction resolver
+    process or by executing the <function>pg_resolve_foreign_xact</function> function
+    manually.
+   </para>
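+
+   <para>
+    As an illustrative sketch only (assuming the view columns and the function
+    signature documented in this patch, which may change in later versions),
+    in-doubt foreign transactions could be listed and then resolved manually
+    as follows:
+<programlisting>
+-- List foreign transactions that are in doubt.
+SELECT xid, serverid, userid, status, identifier
+  FROM pg_foreign_xacts
+ WHERE in_doubt;
+
+-- Resolve each in-doubt foreign transaction by hand (superuser only).
+SELECT pg_resolve_foreign_xact(xid, serverid, userid)
+  FROM pg_foreign_xacts
+ WHERE in_doubt;
+</programlisting>
+   </para>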
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving both foreign transactions that are prepared by
+    online transactions and in-doubt transactions. They commit or roll back
+    prepared transactions on all foreign servers involved in the distributed
+    transaction if the local node received agreement messages from all
+    foreign servers during the first step of the two-phase commit protocol.
+   </para>
+
+   <para>
+    Each foreign transaction resolver is responsible for transaction resolution
+    on the database to which it is connected. If resolution fails, the resolver
+    retries at intervals of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped without an immediate shutdown. You can call
+     the <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to nonzero values,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be set to
+    <literal>required</literal>.  Additionally, <varname>max_worker_processes</varname>
+    may need to be adjusted to accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
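+
+   <para>
+    As a minimal sketch, assuming a workload of at most 10 concurrent
+    distributed transactions that each touch 3 foreign servers (the numbers
+    are illustrative, not recommendations), the local node's
+    <filename>postgresql.conf</filename> might contain:
+<programlisting>
+foreign_twophase_commit = required
+max_prepared_foreign_transactions = 30    # N * K = 10 * 3
+max_foreign_transaction_resolvers = 1     # one resolver per database
+max_worker_processes = 8                  # must leave room for resolvers
+</programlisting>
+    Each foreign <productname>PostgreSQL</productname> server would
+    additionally need a nonzero <varname>max_prepared_transactions</varname>.
+   </para>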
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6587678af2..3589f8a66b 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1415,6 +1415,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare the foreign transaction. If an FDW wants its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well, and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase, or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flags</literal> has
+    the flag <literal>FDWXACT_FLAG_ONEPHASE</literal>, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     Note that, except when the <function>pg_resolve_foreign_xact</function>
+     SQL function is called, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally. The
+    foreign transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from
+    continuing indefinitely. If <literal>frstate-&gt;flags</literal> has the flag
+    <literal>FDWXACT_FLAG_ONEPHASE</literal>, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     a failure occurs while preparing the foreign transaction. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when the <function>pg_resolve_foreign_xact</function>
+     SQL function is called, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction
+    identifier, and store its length in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string
+    shorter than <symbol>NAMEDATALEN</symbol> bytes and must not be the same
+    as any other concurrent prepared transaction identifier. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      The functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Therefore,
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1894,4 +2005,131 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name, the OIDs
+    of the server and user, and the user mapping. The <literal>flags</literal> field contains
+    flag bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback of a Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <literal>CommitForeignTransaction</literal>
+     and <literal>RollbackForeignTransaction</literal> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <literal>CommitForeignTransaction</literal> function
+     in the pre-commit phase; during rollback, it calls the
+     <literal>RollbackForeignTransaction</literal> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback of Distributed Transactions</title>
+    <para>
+     In addition to simply committing and rolling back foreign transactions as described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit and roll back atomically among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    Here, ft1 and ft2 are foreign tables on different foreign servers, which may
+    use different Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     When switching to rollback due to a failure, it calls
+     <function>RollbackForeignTransaction</function> with
+     <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions that are not
+     closed yet, and calls <function>RollbackForeignTransaction</function> without
+     that flag for foreign transactions that are already prepared.  For foreign
+     transactions that are being prepared, it does both, because it cannot be sure that
+     the preparation has completed on the foreign server. Therefore,
+     <function>RollbackForeignTransaction</function> needs to tolerate the undefined
+     object error.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, the <literal>CommitForeignTransaction</literal> and
+     <literal>RollbackForeignTransaction</literal> functions should commit and
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. The transaction commit then waits for all prepared foreign
+     transactions to be resolved, and completes only after the last of them
+     has been resolved.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If resolution
+     fails, the resolver process exits with an error message. The foreign
+     transaction launcher launches the resolver process again at intervals of
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/>.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0070..65fd76f174 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9d71678029..2102298e38 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26142,6 +26142,97 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction that is currently
+        being processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+        It is useful for removing a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful to run before
+    dropping the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 89662cc0a3..ff4625cf15 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1052,6 +1052,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1273,6 +1285,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1550,6 +1574,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1861,6 +1890,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index c41ce9499b..5ef1f4a329 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -170,6 +170,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3234adb639..83f30c5045 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.23.0

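For illustration of the pg_fdwxact directory documented above: the 0003 patch in this series names each state file with its FdwXactFilePath() macro, which joins four zero-padded 8-digit hexadecimal fields (database oid, xid, foreign server oid, user oid) with '_'. Below is a minimal standalone sketch of that naming, assuming plain uint32_t values stand in for Oid and TransactionId. sketch_fdwxact_file_path() and the SKETCH_* names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for FDWXACTS_DIR and FDWXACT_FILE_NAME_LEN. */
#define SKETCH_FDWXACTS_DIR "pg_fdwxact"
#define SKETCH_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)

/*
 * Build "<dir>/<dbid>_<xid>_<serverid>_<userid>" with each field printed as
 * 8 upper-case hexadecimal digits.  Returns the number of characters written,
 * not counting the terminating NUL.
 */
static int
sketch_fdwxact_file_path(char *path, size_t pathlen,
						 uint32_t dbid, uint32_t xid,
						 uint32_t serverid, uint32_t userid)
{
	int			n;

	n = snprintf(path, pathlen,
				 SKETCH_FDWXACTS_DIR "/%08X_%08X_%08X_%08X",
				 (unsigned int) dbid, (unsigned int) xid,
				 (unsigned int) serverid, (unsigned int) userid);

	/* The caller must supply a buffer large enough to avoid truncation. */
	assert(n > 0 && (size_t) n < pathlen);
	return n;
}
```

With these assumptions, dbid 1, xid 740, serverid 16385 and userid 10 give the path pg_fdwxact/00000001_000002E4_00004001_0000000A.
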
v23-0003-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From 2512c6f3d21720e6b67c6251aae99d04d5d80f40 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:16:02 +0900
Subject: [PATCH v23 3/7] Support atomic commit among multiple foreign servers.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  109 +
 src/backend/access/fdwxact/fdwxact.c          | 2754 +++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  558 ++++
 src/backend/access/fdwxact/resolver.c         |  443 +++
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   66 +
 src/backend/access/transam/xact.c             |   28 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/copy.c                   |    6 +
 src/backend/commands/foreigncmds.c            |   30 +
 src/backend/executor/execPartition.c          |    8 +
 src/backend/executor/nodeForeignscan.c        |   24 +
 src/backend/executor/nodeModifyTable.c        |    6 +
 src/backend/foreign/foreign.c                 |   55 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   18 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   79 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/fdwxactdesc.c              |    1 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  164 +
 src/include/access/fdwxact_launcher.h         |   28 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/resolver_internal.h        |   63 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   22 +
 src/include/foreign/fdwapi.h                  |   12 +
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    6 +
 src/include/storage/proc.h                    |   11 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    2 +
 src/test/regress/expected/rules.out           |    7 +
 55 files changed, 4824 insertions(+), 17 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 120000 src/bin/pg_waldump/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

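The README added by this patch describes the FdwXact status transitions with a diagram: PREPARING moves to PREPARED, PREPARED moves to COMMITTING or ABORTING, and PREPARING moves directly to ABORTING when an error occurs during preparing. The following is a minimal standalone sketch of that transition table, not the patch's implementation. The SketchStatus enum and both functions are hypothetical names that only mirror the FDWXACT_STATUS_* values named in the README.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the statuses in the README; values are illustrative. */
typedef enum
{
	SKETCH_INVALID,
	SKETCH_PREPARING,
	SKETCH_PREPARED,
	SKETCH_COMMITTING,
	SKETCH_ABORTING
} SketchStatus;

/* Return true if the README's diagram has an edge from 'from' to 'to'. */
static bool
sketch_transition_allowed(SketchStatus from, SketchStatus to)
{
	switch (from)
	{
		case SKETCH_PREPARING:
			/* normal path, or path (*2): an error while preparing */
			return to == SKETCH_PREPARED || to == SKETCH_ABORTING;
		case SKETCH_PREPARED:
			/* resolution decides between commit and abort */
			return to == SKETCH_COMMITTING || to == SKETCH_ABORTING;
		default:
			/* COMMITTING and ABORTING have no outgoing edges */
			return false;
	}
}

/* Advance the status, insisting that the move follows the diagram. */
static void
sketch_set_status(SketchStatus *status, SketchStatus next)
{
	assert(sketch_transition_allowed(*status, next));
	*status = next;
}
```

An entry recovered after a crash would, per the README, simply start at SKETCH_PREPARED and follow the same outgoing edges.
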
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..49480dd039 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..0207a66fb4
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000000..3cfa06d32f
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,109 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back a transaction on
+either all of the involved foreign servers or none of them. This ensures that
+the data is always left in a consistent state across the federated database.
+
+
+Commit Sequence of Global Transactions
+---------------------------------------
+
+We employ the two-phase commit protocol to commit atomically on all foreign
+servers. The commit sequence of a distributed transaction consists of the
+following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+in the list FdwXactParticipants, which is maintained by PostgreSQL's global
+transaction manager (GTM), as distributed transaction participants.
+The registered foreign transactions are tracked until the end of the transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We write a WAL record indicating that the foreign server is involved in the
+current transaction before issuing PREPARE for each foreign transaction.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers, including the local node. Otherwise, we can commit them at this step
+by calling the CommitForeignTransaction() API, and nothing further is needed.
+
+After that, we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If any of them fails, we switch to
+rollback; at that point some participants might be prepared whereas others are
+not. The former need to be resolved manually using pg_resolve_foreign_xact(),
+and the latter are ended in one phase by calling the
+RollbackForeignTransaction() API.
+
+3. Commit locally
+Once all of them are prepared, we commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction, but
+this resolution step (commit or rollback) is done by the foreign transaction
+resolver process. The backend process inserts itself into the wait queue and
+then wakes up the resolver process (or requests launching a new one if necessary).
+The resolver process dequeues the waiter and fetches the distributed transaction
+information that the backend is waiting for. Once all foreign transactions are
+committed or rolled back, the resolver process wakes up the waiter.
+
+
+Foreign Data Wrapper Callbacks for Transaction Management
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+the transaction management callback functions according to that status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for PREPARE, COMMIT and ROLLBACK of
+the transaction on the foreign server, respectively.
+FdwXactRslvState->flags may contain FDWXACT_FLAG_ONEPHASE, meaning the FDW can
+commit or roll back the foreign transaction in one phase. If processing a
+foreign transaction fails, the FDW needs to raise an error. However, the FDW
+must tolerate an ERRCODE_UNDEFINED_OBJECT error while committing or rolling
+back a foreign transaction, because the coordinator could crash in the window
+between completing the resolution and writing the WAL record that removes the
+FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_PREPARING
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and it changes to
+FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING before the foreign
+transaction is committed or aborted by the FDW callback functions, respectively.
+The FdwXact entry is removed, with WAL logging, once the foreign transaction is
+resolved.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. Their initial
+status is FDWXACT_STATUS_PREPARED(*1). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                     PREPARING                      |----+
+ +----------------------------------------------------+    |
+                          |                                |
+                          v                                |
+ +----------------------------------------------------+    |
+ |                    PREPARED(*1)                    |    | (*2)
+ +----------------------------------------------------+    |
+           |                               |               |
+           v                               v               |
+ +--------------------+          +--------------------+    |
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |<---+
+ +--------------------+          +--------------------+
+
+(*1) Recovered FdwXact entries start with PREPARED
+(*2) Path taken when an error occurs during preparing
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..76b973b473
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2754 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * The two-phase commit protocol is used when the transaction modified two or
+ * more servers including the local node.  If the two-phase commit protocol
+ * is not required, all foreign transactions are committed in the pre-commit
+ * phase.
+ *
+ * During executor node initialization, the executor can register a foreign
+ * server by calling either RegisterFdwXactByRelId() or RegisterFdwXactByServerId()
+ * to add it to the group of participants for global commit.  Foreign servers
+ * are registered if the FDW has both the CommitForeignTransaction API and the
+ * RollbackForeignTransaction API.  Registered participant servers are
+ * identified by the OIDs of the foreign server and user.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * all foreign servers.  After committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also ask it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so that
+ * we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request.  We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed.  Before waiting for foreign transactions to be resolved, the
+ * backend enqueues itself with the timestamp at which it expects to be processed.
+ * On failure, it enqueues itself again with a new timestamp (last timestamp +
+ * foreign_xact_resolution_interval).
+ *
+ * If a server crash occurs or the user cancels the wait, the prepared foreign
+ * transactions are left without a holder.  Such foreign transactions are
+ * resolved automatically by the resolver process.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of a foreign
+ * transaction follows a locking model based on the following linked concepts:
+ *
+ * * All FdwXact fields except for status are protected by FdwXactLock. The
+ *   status is protected by its mutex.
+ * * A process that is going to process a foreign transaction needs to set
+ *   locking_backend of the FdwXact entry to lock the entry, which prevents the
+ *   entry from being updated and removed by concurrent processes.
+ * * FdwXact entries whose local transaction is either being processed
+ *   (fdwxact->owner is not NULL) or prepared (TwoPhaseExists() is true) cannot
+ *   be processed by pg_resolve_foreign_xact(), pg_remove_foreign_xact(), or
+ *   automatic resolution.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk.  FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon.  We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
+/* Check the FdwXactParticipant is capable of two-phase commit  */
+#define ServerSupportTransactionCallack(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define SeverSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.  This struct
+ * is created at the beginning of execution for each foreign server and
+ * is used until the end of transaction where we cannot look at syscaches.
+ * Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
+	/* true if modified the data on the server */
+	bool		modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A member of
+ * participants may not support transaction callbacks: commit, rollback and
+ * prepare.  If a member of participants doesn't support any transaction
+ * callbacks, i.e. ServerSupportTransactionCallack() returns false,
+ * we don't end its transaction.
+ *
+ * FdwXactParticipants_tmp is used to update FdwXactParticipants atomically
+ * when executing a COMMIT/ROLLBACK PREPARED command.  In the COMMIT PREPARED
+ * case, we don't want to roll back foreign transactions even if an error occurs,
+ * because the local prepared transaction never turns to rollback in that
+ * case.  However, preparing FdwXactParticipants might raise an error
+ * because it calls palloc() inside.  So we prepare FdwXactParticipants in
+ * two phases.  In the first phase, PrepareFdwXactParticipants(), we collect
+ * all foreign transactions associated with the local prepared transaction
+ * and keep them in FdwXactParticipants_tmp.  Even if an error occurs during
+ * that, we don't roll them back.  In the second phase, SetFdwXactParticipants(),
+ * we replace FdwXactParticipants with FdwXactParticipants_tmp and hold them.
+ *
+ * FdwXactLocalXid is the local transaction id associated with FdwXactParticipants.
+ */
+static List *FdwXactParticipants = NIL;
+static List *FdwXactParticipants_tmp = NIL;
+static TransactionId FdwXactLocalXid = InvalidTransactionId;
+
+/*
+ * True if the current transaction needs to be committed together with
+ * foreign servers.
+ */
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * Name of a foreign prepared transaction file is 8 hexadecimal digits each of
+ * database oid, xid, foreign server oid and user oid, separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction, and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure
+ * uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* Guc parameters */
+int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Keep track of whether the process exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+static void register_fdwxact(Oid serverid, Oid userid, bool modified);
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit);
+static bool checkForeignTwophaseCommitRequired(void);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static void FdwXactPrepareForeignTransactions(bool prepare_all);
+static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static TransactionId FdwXactDetermineTransactionFate(TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Remember accessed foreign transaction. Both RegisterFdwXactByRelId and
+ * RegisterFdwXactByServerId are called by executor during initialization.
+ */
+void
+RegisterFdwXactByRelId(Oid relid, bool modified)
+{
+	Relation	rel;
+	Oid			serverid;
+	Oid			userid;
+
+	rel = relation_open(relid, NoLock);
+	serverid = GetForeignServerIdByRelId(relid);
+	userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId();
+	relation_close(rel, NoLock);
+
+	register_fdwxact(serverid, userid, modified);
+}
+
+void
+RegisterFdwXactByServerId(Oid serverid, bool modified)
+{
+	register_fdwxact(serverid, GetUserId(), modified);
+}
+
+/*
+ * Register the foreign transaction identified by the given server id and
+ * user id as a participant of the current transaction. If it is already
+ * registered, only its modified flag is updated.
+ */
+static void
+register_fdwxact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * The participant's information is also needed at the end of the
+	 * transaction, where the system caches are not available. Save it in
+	 * TopTransactionContext so that it lives until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	pfree(routine);
+
+	/* Revert back the context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	return fdw_part;
+}
+
+/*
+ * Prepare all foreign transactions if foreign two-phase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit: when it is
+ * 'required', we strictly require the FDWs of all modified foreign servers to
+ * support the two-phase commit protocol and ask them to prepare their foreign
+ * transactions; when it is 'disabled', we ask all foreign servers to commit
+ * their foreign transactions in one phase. If we fail to commit any of them,
+ * we switch to aborting.
+ *
+ * Note that non-modified foreign servers always can be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXact(void)
+{
+	ListCell   *lc;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Set the local transaction id */
+	FdwXactLocalXid = GetTopTransactionId();
+
+	/*
+	 * Check if we need to use foreign twophase commit. Note that we don't
+	 * support foreign twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired())
+	{
+		/*
+		 * Prepare foreign transactions on foreign servers that support two-phase
+		 * commit.  Note that we keep FdwXactParticipants until the end of the
+		 * transaction.
+		 */
+		FdwXactPrepareForeignTransactions(false);
+		ForeignTwophaseCommitIsRequired = true;
+	}
+	else
+	{
+		/*
+		 * Commit other foreign transactions and delete the participant entry from
+		 * the list.
+		 */
+		foreach(lc, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+			Assert(!fdw_part->fdwxact);
+
+			/* Commit the foreign transaction in one-phase */
+			if (ServerSupportTransactionCallack(fdw_part))
+				FdwXactParticipantEndTransaction(fdw_part, true);
+		}
+
+		/*
+		 * If we don't need two-phase commit, all participants' transactions should
+		 * be completed at this time.
+		 */
+		list_free(FdwXactParticipants);
+		FdwXactParticipants = NIL;
+	}
+}
+
+/*
+ * Return true if the current transaction modifies data on two or more servers
+ * among those in FdwXactParticipants and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(void)
+{
+	ListCell   *lc;
+	bool		need_twophase_commit;
+	bool		have_notwophase = false;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			have_notwophase = true;
+
+		nserverswritten++;
+	}
+
+	/* Did we modify the local non-temporary data? */
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		nserverswritten++;
+
+	if (nserverswritten <= 1)
+		return false;
+
+	/* We require all modified servers to support two-phase commit */
+	need_twophase_commit = (nserverswritten >= 2);
+	Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED);
+
+	/*
+	 * If foreign two-phase commit is required then all foreign servers must be
+	 * capable of doing two-phase commit.
+	 */
+	if (need_twophase_commit)
+	{
+		/* Parameter check */
+		if (max_prepared_foreign_xacts == 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+					 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+		if (max_foreign_xact_resolvers == 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+					 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+		if (have_notwophase)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+					 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+	}
+
+	return need_twophase_commit;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
+{
+	FdwXactRslvState state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state.xid = FdwXactLocalXid;
+	state.server = fdw_part->server;
+	state.usermapping = fdw_part->usermapping;
+	state.fdwxact_id = NULL;
+	state.flags = FDWXACT_FLAG_ONEPHASE;
+
+	if (commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * an FdwXact entry we obtain a transaction identifier, from the FDW's
+ * GetPrepareId callback if it provides one. If prepare_all is false, we
+ * prepare only modified foreign transactions.
+ *
+ * We can still switch to rollback here on failure. If any error occurs, we
+ * roll back the non-prepared foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(bool prepare_all)
+{
+	ListCell   *lc;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState state;
+		FdwXact		fdwxact;
+
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			continue;
+
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, FdwXactLocalXid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to disk and writes it to WAL (thereby relaying it to
+		 * standbys). Thus in case we lose connectivity to the foreign server
+		 * or crash ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed in between these
+		 * two steps, we would lose track of the prepared transaction on the
+		 * foreign server and would not be able to resolve it after crash
+		 * recovery. Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(FdwXactLocalXid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the acknowledgement
+		 * from the foreign server, the backend may abort the local
+		 * transaction (say, because of a signal).
+		 */
+		state.xid = FdwXactLocalXid;
+		state.server = fdw_part->server;
+		state.usermapping = fdw_part->usermapping;
+		state.fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+		fdw_part->prepare_foreign_xact_fn(&state);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * Create a new foreign transaction entry before the FDW prepares, commits, or
+ * rolls back the foreign transaction. The function writes the entry to WAL;
+ * it is persisted to disk under the pg_fdwxact directory at checkpoint time.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->owner = MyProc;
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->owner = NULL;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides the GetPrepareId callback we return the
+ * identifier returned from it. Otherwise we generate a unique identifier in
+ * the form "fx_<random number>_<xid>_<serverid>_<userid>" whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like a UUID, which gives unique ids with high probability, but that may be
+ * expensive here, and the UUID extension that provides the function to
+ * generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	   *id;
+	int			id_len = 0;
+
+	/*
+	 * If the FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN - 1] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * Note that it's possible that the transaction aborts after we have prepared
+ * some of the participants. In this case we switch to rollback and roll back
+ * all foreign transactions.
+ */
+void
+AtPrepare_FdwXact(void)
+{
+	ListCell   *lc;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Set the local transaction id */
+	FdwXactLocalXid = GetTopTransactionId();
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All involved
+	 * servers need to support two-phase commit as we prepare on them regardless of
+	 * whether they were modified.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
+	}
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions(true);
+
+	/*
+	 * We keep the prepared foreign transaction participants so that we can
+	 * roll them back in case of failure.
+	 */
+}
+
+/*
+ * After PREPARE TRANSACTION, we forget all participants.
+ */
+void
+PostPrepare_FdwXact(void)
+{
+	if (FdwXactParticipants == NIL)
+	{
+		Assert(FdwXactParticipants_tmp == NIL);
+		Assert(!ForeignTwophaseCommitIsRequired);
+		return;
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Collect all foreign transactions associated with the given xid if it's a prepared
+ * transaction.  Return true if COMMIT PREPARED or ROLLBACK PREPARED needs to wait for
+ * all foreign transactions to be resolved.  The collected foreign transactions are kept
+ * in FdwXactParticipants_tmp. The caller must call SetFdwXactParticipants() later
+ * if this function returns true.
+ */
+bool
+PrepareFdwXactParticipants(TransactionId xid)
+{
+	MemoryContext old_ctx;
+
+	Assert(FdwXactParticipants_tmp == NIL);
+
+	if (!TwoPhaseExists(xid))
+		return false;
+
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXactParticipant *fdw_part;
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwRoutine *routine;
+
+		if (!fdwxact->valid || fdwxact->local_xid != xid)
+			continue;
+
+		routine = GetFdwRoutineByServerId(fdwxact->serverid);
+		fdw_part = create_fdwxact_participant(fdwxact->serverid, fdwxact->userid,
+											  routine);
+		fdw_part->modified = true;
+		fdw_part->fdwxact = fdwxact;
+
+		/* Add to the participants list */
+		FdwXactParticipants_tmp = lappend(FdwXactParticipants_tmp, fdw_part);
+	}
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_ctx);
+
+	/*
+	 * We cannot proceed to commit this prepared transaction when
+	 * foreign_twophase_commit is disabled.
+	 */
+	if (FdwXactParticipants_tmp != NIL &&
+		!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a prepared foreign transaction commit when foreign_twophase_commit is \'disabled\'")));
+
+	return (FdwXactParticipants_tmp != NIL);
+}
+
+/*
+ * Set the collected foreign transactions to the participants of this transaction,
+ * and hold them.  This function must be called after PrepareFdwXactParticipants().
+ */
+void
+SetFdwXactParticipants(TransactionId xid)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants_tmp != NIL);
+	Assert(FdwXactParticipants == NIL);
+
+	FdwXactLocalXid = xid;
+	FdwXactParticipants = FdwXactParticipants_tmp;
+	FdwXactParticipants_tmp = NIL;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(SeverSupportTwophaseCommit(fdw_part));
+		Assert(fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED);
+		Assert(fdw_part->fdwxact->locking_backend == InvalidBackendId);
+		Assert(!fdw_part->fdwxact->owner);
+
+		/* Take ownership of the fdwxact entry */
+		fdw_part->fdwxact->locking_backend = MyBackendId;
+		fdw_part->fdwxact->owner = MyProc;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Wait for all foreign transactions of the given transaction to be resolved.
+ *
+ * A backend starts in state FDWXACT_NOT_WAITING and changes that state to
+ * FDWXACT_WAITING before adding itself to the wait queue. During
+ * FdwXactResolveForeignTransaction a fdwxact resolver changes the state to
+ * FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved. The
+ * backend then resets its state to FDWXACT_NOT_WAITING. If a resolver fails
+ * to resolve the waiting transaction it moves the backend to the retry
+ * queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitForResolution(TransactionId wait_xid, bool commit)
+{
+	ListCell	*lc;
+	char	   *new_status = NULL;
+	const char *old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/*
+	 * Quick exit if either atomic commit is not requested or we don't have
+	 * any participants.
+	 */
+	if (!IsForeignTwophaseCommitRequested() || FdwXactParticipants == NIL)
+		return;
+
+	/* Set foreign transaction status */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(fdw_part->fdwxact->locking_backend == MyBackendId);
+		Assert(fdw_part->fdwxact->owner == MyProc);
+
+		SpinLockAcquire(&(fdw_part->fdwxact->mutex));
+		fdw_part->fdwxact->status = commit
+			? FDWXACT_STATUS_COMMITTING
+			: FDWXACT_STATUS_ABORTING;
+		SpinLockRelease(&(fdw_part->fdwxact->mutex));
+	}
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if none is running, otherwise wake it up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction resolution.
+	 */
+	if (update_process_title)
+	{
+		int			len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status);
+		new_status[len] = '\0'; /* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once a resolver changes the state to FDWXACT_WAIT_COMPLETE,
+		 * it will never update it again, so we can't be seeing a stale value
+		 * in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+		{
+			ForgetAllFdwXactParticipants();
+			break;
+		}
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The
+		 * latter would lead the client to believe that the distributed
+		 * transaction aborted, which is not true: it's already committed
+		 * locally. The former is no good either: the client has requested
+		 * committing a distributed transaction, and is entitled to assume
+		 * that an acknowledged commit is also committed on all foreign servers,
+		 * which might not be true. So in this case we issue a WARNING (which
+		 * some clients may be able to interpret) and shut off further output.
+		 * We do NOT reset ProcDiePending, so that the process will die after
+		 * the commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions may be left orphaned,
+		 * but a foreign transaction resolver can pick them up and try to
+		 * resolve them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Return one backend that connects to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+				 TransactionId *waitXid_p)
+{
+	PGPROC	   *proc;
+	bool		found = false;
+
+	Assert(LWLockHeldByMe(FdwXactResolutionLock));
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	/* Initialize variables */
+	*nextResolutionTs_p = -1;
+	*waitXid_p = InvalidTransactionId;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+		{
+			if (proc->fdwXactNextResolutionTs <= now)
+			{
+				/* Found a waiting process */
+				found = true;
+				*waitXid_p = proc->fdwXactWaitXid;
+			}
+			else
+				/* Found a waiting process supposed to be processed later */
+				*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+
+			break;
+		}
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return found ? proc : NULL;
+}
+
+/*
+ * Return true if there is at least one backend of the given database in the
+ * wait queue. The caller must hold FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC	   *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * In the abort case, this function ends the transactions on the foreign
+ * transaction participants and, where necessary, rolls back their prepared
+ * foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool is_commit)
+{
+	ListCell   *lc;
+
+	if (!is_commit)
+	{
+		foreach(lc, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = lfirst(lc);
+			FdwXact		fdwxact = fdw_part->fdwxact;
+			int			status;
+
+			if (!fdwxact)
+			{
+				/*
+				 * We rollback the foreign transaction if its foreign server
+				 * supports transaction callbacks. Otherwise we just delete
+				 * the entry from the list.
+				 */
+				if (ServerSupportTransactionCallack(fdw_part))
+					FdwXactParticipantEndTransaction(fdw_part, false);
+
+				FdwXactParticipants = foreach_delete_current(FdwXactParticipants, lc);
+				continue;
+			}
+
+			/*
+			 * Abort the foreign transaction.  For participants whose status
+			 * is FDWXACT_STATUS_PREPARING, we close the transaction in
+			 * one phase.  In addition, since we cannot be sure whether the
+			 * preparation has completed on the foreign server, we also
+			 * attempt to roll back the prepared foreign transaction.  Note
+			 * that it is the FDW's responsibility to tolerate an
+			 * OBJECT_NOT_FOUND error in the abort case.
+			 */
+			SpinLockAcquire(&(fdwxact->mutex));
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&(fdwxact->mutex));
+
+			if (status == FDWXACT_STATUS_PREPARING)
+				FdwXactParticipantEndTransaction(fdw_part, false);
+		}
+
+		/*
+		 * Wait for all prepared or possibly-prepared foreign transactions
+		 * to be resolved.
+		 */
+		if (FdwXactParticipants != NIL)
+		{
+			Assert(TransactionIdIsValid(FdwXactLocalXid));
+			FdwXactWaitForResolution(FdwXactLocalXid, false);
+		}
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Unlock foreign transaction participants and clear the FdwXactParticipants
+ * list.  If we leave any foreign transactions unresolved, update the oldest
+ * xmin of unresolved transactions so that the local transaction ids of those
+ * foreign transactions are not truncated.
+ */
+void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell   *cell;
+	int			nlefts = 0;
+
+	if (FdwXactParticipants == NIL)
+	{
+		Assert(FdwXactParticipants_tmp == NIL);
+		Assert(!ForeignTwophaseCommitIsRequired);
+		return;
+	}
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Every participant must have registered an FdwXact entry by now */
+		Assert(fdwxact);
+
+		/*
+		 * Unlock the foreign transaction entries.  Note that there is a race
+		 * condition: the resolver process could remove an FdwXact entry in
+		 * FdwXactParticipants and another backend could reuse it before we
+		 * forget it.  So we need to check that the entry is still associated
+		 * with this transaction.  We cannot use locking_backend for the check
+		 * because the entry might already be held by the resolver process.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->valid && fdwxact->local_xid == FdwXactLocalXid)
+		{
+			if (fdwxact->locking_backend == MyBackendId)
+				fdwxact->locking_backend = InvalidBackendId;
+
+			fdwxact->owner = NULL;
+			nlefts++;
+		}
+		LWLockRelease(FdwXactLock);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction id
+	 * of unresolved distributed transactions and hand the entries over to the
+	 * foreign transaction resolver.
+	 */
+	if (nlefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions", nlefts);
+		FdwXactComputeRequiredXmin();
+		FdwXactLaunchOrWakeupResolver();
+	}
+
+	list_free(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+	FdwXactParticipants_tmp = NIL;
+	FdwXactLocalXid = InvalidTransactionId;
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Resolve foreign transactions at the given indexes.  If 'waiter' is not NULL,
+ * we release the waiter after resolving all of the given foreign transactions.
+ * On failure we re-enqueue the waiting backend after advancing its next
+ * resolution time.
+ *
+ * The caller must hold the given foreign transactions in advance to prevent
+ * concurrent updates.
+ */
+void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		PG_TRY();
+		{
+			FdwXactResolveOneFdwXact(fdwxact);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve.  If the waiter is still waiting, advance
+			 * its next resolution time and re-insert it into the retry
+			 * queue in timestamp order.
+			 */
+			if (waiter)
+			{
+				LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+				if (waiter->fdwXactState == FDWXACT_WAITING)
+				{
+					SHMQueueDelete(&(waiter->fdwXactLinks));
+					pg_write_barrier();
+					waiter->fdwXactNextResolutionTs =
+						TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+													foreign_xact_resolution_retry_interval);
+					FdwXactQueueInsert(waiter);
+				}
+				LWLockRelease(FdwXactResolutionLock);
+			}
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+
+	if (!waiter)
+		return;
+
+	/*
+	 * Remove the waiter from the shmem queue, if not detached yet.  The waiter
+	 * could already be detached if the user cancelled the wait before
+	 * resolution.
+	 */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/*
+		 * Wake up the waiter only when we have set state and removed from
+		 * queue
+		 */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had been already detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx != -1);
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ *
+ * XXX: we can exclude FdwXact entries whose status is already committing
+ * or aborting.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Determine whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactDetermineTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+
+	/*
+	 * The local transaction is neither committed, aborted, nor in progress,
+	 * so the foreign transaction might not have been prepared on the foreign
+	 * server.  This can happen when the transaction failed after registering
+	 * this entry but before actually preparing on the foreign server.  So
+	 * let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted.  Raise an error anyway since we cannot
+	 * determine the fate of this foreign transaction from a local
+	 * transaction whose fate is also not yet determined.
+	 */
+	else
+		elog(ERROR,
+			 "cannot resolve the foreign transaction associated with in-progress transaction");
+
+	pg_unreachable();
+}
+
+/*
+ * Commit or roll back one prepared foreign transaction.  After successful
+ * resolution, the caller removes the FdwXact entry from shared memory along
+ * with the corresponding on-disk file.
+ */
+static void
+FdwXactResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactRslvState state;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	Assert(fdwxact != NULL);
+	/*
+	 * The FdwXact entry must be either held by a backend or being processed
+	 * by a resolver process.
+	 */
+	Assert(fdwxact->locking_backend == MyBackendId);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactDetermineTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare resolution state to pass to API */
+	state.xid = fdwxact->local_xid;
+	state.server = server;
+	state.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	state.fdwxact_id = fdwxact->fdwxact_id;
+	state.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&state);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&state);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/*
+ * Return the index of the first FdwXact entry that matches the given
+ * arguments, or -1 if there is none.  Only arguments with a valid value for
+ * their respective datatype are used as search conditions.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	bool		found = false;
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+		found = true;
+		break;
+	}
+
+	return found ? i : -1;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		/* Named xlrec to avoid shadowing the 'record' parameter */
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete the FdwXact entry and its file, if any */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions.  We do not know
+	 * the exact status of the transaction right now; the resolver will set it
+	 * later based on the status of the local transaction that prepared this
+	 * foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that prepared
+	 * this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl.  Also remove the
+ * FdwXact file if the foreign transaction was saved by an earlier checkpoint.
+ * We might not find the FdwXact entry when crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived.  By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O.  This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend.  Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	errno = 0;					/* so the ENOSPC fallback below is reliable */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a database id, transaction id, serverid and userid, read the foreign
+ * transaction data either from disk or directly from WAL using the provided
+ * "insert_start_lsn".  Returns NULL, after removing the state, if the
+ * transaction id is in the future.
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a sane size, a valid CRC and the expected identifiers),
+ * return the palloc'd contents of the file.  Otherwise raise an error for the
+ * corrupted data.  This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\": %lld bytes",
+						path, (long long) stat.st_size)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * OK, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the requested identifiers */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextFullXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions.  This is needed to synchronize
+ * pg_subtrans startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextFullXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->owner = NULL;
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built-in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "prepared (commit)";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "prepared (abort)";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = BoolGetDatum(fdwxact->owner == NULL);
+		values[5] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx == -1)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("does not exist foreign transaction")));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	if (fdwxact->locking_backend != InvalidBackendId || fdwxact->owner)
+	{
+		/* the entry is being processed by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	if (TwoPhaseExists(fdwxact->local_xid))
+	{
+		/*
+		 * the entry's local transaction is prepared. Since we cannot know the
+		 * fate of the local transaction, we cannot resolve this foreign
+		 * transaction.
+		 */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction entry whose local transaction is prepared"),
+				 errhint("Do COMMIT PREPARED or ROLLBACK PREPARED")));
+	}
+
+	/* Hold the entry */
+	FdwXactCtl->fdwxacts[idx]->locking_backend = MyBackendId;
+
+	LWLockRelease(FdwXactLock);
+
+	PG_TRY();
+	{
+		FdwXactResolveFdwXacts(&idx, 1, NULL);
+	}
+	PG_CATCH();
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactCtl->fdwxacts[idx]->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. It gives a way to forget about such a prepared transaction
+ * when the foreign server where it was prepared is no longer available, or
+ * when the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx == -1)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("does not exist foreign transaction on server %u",
+						serverid)));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	if (fdwxact->locking_backend != InvalidBackendId || fdwxact->owner)
+	{
+		/* the entry is being held by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	PG_TRY();
+	{
+		/* Clean up entry and any files we may have left */
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdwxact(fdwxact);
+	}
+	PG_CATCH();
+	{
+		if (fdwxact->valid)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+			fdwxact->locking_backend = InvalidBackendId;
+		}
+		LWLockRelease(FdwXactLock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..a1a41404c7
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,558 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+		FdwXactRslvCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit launch retries to once per
+		 * foreign_xact_resolution_retry_interval, but always launch when a
+		 * backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started. This usually means the resolver crashed, so retry
+			 * after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+
+	/*
+	 * Looking for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting
+		 * up and has not attached to its slot yet. In that case we don't need
+		 * to do anything; it will soon find the FdwXact entry we inserted.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on databases that have
+ * at least one FdwXact entry but no resolver running on them.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *resolver_dbs;	/* DBs resolver's running on */
+	HTAB	   *fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * A resolver process resolves the foreign transactions that are
+		 * waiting for resolution or are not being processed by anyone.
+		 * But we don't need to launch a resolver for foreign transactions
+		 * whose local transaction is prepared.
+		 */
+		if ((!fdwxact->owner && !TwoPhaseExists(fdwxact->local_xid)) ||
+			(fdwxact->owner && fdwxact->owner->fdwXactState == FDWXACT_WAITING))
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no FdwXact entry, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolvers are running and launch new one on them */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..3e9ad7a215
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,443 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database that backend processes are waiting on.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int			foreign_xact_resolution_retry_interval;
+int			foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+static void hold_fdwxacts(PGPROC *waiter);
+static void hold_indoubt_fdwxacts(void);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts holds the indexes of the FdwXact entries that the resolver has
+ * marked as being processed. We clear that flag from those entries on failure.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+static bool processing_online = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+	}
+
+	/*
+	 * If the resolver exits while processing online transactions, there
+	 * might be other online transactions still waiting. So request a
+	 * re-launch.
+	 */
+	if (processing_online)
+		FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TransactionId waitXid = InvalidTransactionId;
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiters until either the queue becomes empty or it has
+		 * only waiters with a future resolution timestamp.
+		 */
+		processing_online = true;
+		for (;;)
+		{
+			PGPROC	   *waiter;
+
+			CHECK_FOR_INTERRUPTS();
+
+			LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+
+			waiter = FdwXactGetWaiter(now, &resolutionTs, &waitXid);
+
+			if (!waiter)
+			{
+				/* Not found, break */
+				LWLockRelease(FdwXactResolutionLock);
+				break;
+			}
+
+			/* Hold the waiting foreign transactions */
+			hold_fdwxacts(waiter);
+			Assert(nheld > 0);
+			LWLockRelease(FdwXactResolutionLock);
+
+			/* Resolve the waiting distributed transaction */
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld, waiter);
+			CommitTransactionCommand();
+
+			last_resolution_time = now;
+		}
+		processing_online = false;
+
+		/* Hold indoubt foreign transactions */
+		hold_indoubt_fdwxacts();
+
+		if (nheld > 0)
+		{
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld, NULL);
+			CommitTransactionCommand();
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved for a backend
+ * within foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		/* There is no waiting backend */
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until the slot is detached. This
+		 * is necessary to prevent a race condition in which a waiter enqueues
+		 * itself after the FdwXactWaiterExists check.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but does not exit as the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Take foreign transactions whose local transaction is already finished.
+ */
+static void
+hold_indoubt_fdwxacts(void)
+{
+	nheld = 0;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		/* Take entry if not processed by anyone */
+		if (fdwxact->valid && fdwxact->dbid == MyDatabaseId &&
+			fdwxact->locking_backend == InvalidBackendId &&
+			!fdwxact->owner &&
+			!TwoPhaseExists(fdwxact->local_xid))
+		{
+			held_fdwxacts[nheld++] = i;
+
+			/* Take over the entry */
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Take foreign transactions associated with the given waiter's transaction
+ * as in-processing.
+ */
+static void
+hold_fdwxacts(PGPROC *waiter)
+{
+	nheld = 0;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid && fdwxact->local_xid == waiter->fdwXactWaitXid)
+		{
+			Assert(fdwxact->owner->fdwXactState == FDWXACT_WAITING);
+			Assert(fdwxact->dbid == waiter->databaseId);
+
+			held_fdwxacts[nheld++] = i;
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
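
Review note on FXRslvComputeSleepTime() above: the sleep time is the smallest of three candidates, namely the default nap time, the time left until the resolver timeout, and the time left until the next scheduled resolution. The sketch below restates that rule outside PostgreSQL so it can be checked in isolation. It is only an illustration under stated assumptions: compute_sleep_ms, min_long, and NAPTIME_MS are hypothetical names, times are plain milliseconds rather than TimestampTz, and the nap value is a placeholder because the patch's DEFAULT_NAPTIME_PER_CYCLE is not shown in this part of the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder nap time in ms; NOT the patch's DEFAULT_NAPTIME_PER_CYCLE. */
#define NAPTIME_MS 180000L

static long
min_long(long a, long b)
{
	return (a < b) ? a : b;
}

/*
 * Hypothetical restatement of the sleep-time rule.  All arguments are in
 * milliseconds.  timeout_ms == 0 disables the resolver timeout, and
 * next_resolution_ms == 0 means no resolution is scheduled.
 */
long
compute_sleep_ms(int64_t now_ms, int64_t last_resolution_ms,
				 int64_t timeout_ms, int64_t next_resolution_ms)
{
	long		sleep_ms = NAPTIME_MS;

	assert(timeout_ms >= 0);

	if (timeout_ms > 0)
	{
		int64_t		remaining = last_resolution_ms + timeout_ms - now_ms;

		/* A deadline already in the past means "do not sleep". */
		if (remaining < 0)
			remaining = 0;
		sleep_ms = min_long(sleep_ms, (long) remaining);
	}

	if (next_resolution_ms > 0)
	{
		int64_t		remaining = next_resolution_ms - now_ms;

		if (remaining < 0)
			remaining = 0;
		sleep_ms = min_long(sleep_ms, (long) remaining);
	}

	return sleep_ms;
}
```

The clamp to zero mirrors what TimestampDifference() does when the stop time is not later than the start time, so an overdue timeout or resolution makes the resolver loop again immediately instead of sleeping.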
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 1cd97852e8..ea045174e0 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..200cf9d067 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 54fb6cc047..6ff79e7b59 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -850,6 +851,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
@@ -2196,6 +2226,13 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	XLogRecPtr	recptr;
 	TimestampTz committs = GetCurrentTimestamp();
 	bool		replorigin;
+	bool		need_fdwxact_commit;
+
+	/*
+	 * Prepare foreign transactions involving this prepared transaction,
+	 * if any exist.
+	 */
+	need_fdwxact_commit = PrepareFdwXactParticipants(xid);
 
 	/*
 	 * Are we using the replication origins feature?  Or, in other words, are
@@ -2266,6 +2303,17 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be committed.
+	 */
+	if (need_fdwxact_commit)
+	{
+		SetFdwXactParticipants(xid);
+		FdwXactWaitForResolution(xid, true);
+		ForgetAllFdwXactParticipants();
+	}
 }
 
 /*
@@ -2285,6 +2333,13 @@ RecordTransactionAbortPrepared(TransactionId xid,
 							   const char *gid)
 {
 	XLogRecPtr	recptr;
+	bool		need_fdwxact_commit;
+
+	/*
+	 * Prepare foreign transactions involving this prepared transaction,
+	 * if any exist.
+	 */
+	need_fdwxact_commit = PrepareFdwXactParticipants(xid);
 
 	/*
 	 * Catch the scenario where we aborted partway through
@@ -2325,6 +2380,17 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * Wait for foreign transaction prepared as part of this prepared
+	 * transaction to be rolled back.
+	 */
+	if (need_fdwxact_commit)
+	{
+		SetFdwXactParticipants(xid);
+		FdwXactWaitForResolution(xid, false);
+		ForgetAllFdwXactParticipants();
+	}
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index cd30b62d36..348d020249 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1219,6 +1220,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1227,6 +1229,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1265,12 +1268,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1428,6 +1432,14 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	/*
+	 * Wait for prepared foreign transaction to be resolved, if required.
+	 * We only want to wait if we prepared foreign transaction in this
+	 * transaction.
+	 */
+	if (need_commit_globally && markXidCommitted)
+		FdwXactWaitForResolution(xid, true);
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2087,6 +2099,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXact();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2254,6 +2269,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXact(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2341,6 +2357,8 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	AtPrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2532,6 +2550,9 @@ PrepareTransaction(void)
 	 */
 	PostPrepare_Twophase();
 
+	/* Release held FdwXact entries */
+	PostPrepare_FdwXact();
+
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
@@ -2751,6 +2772,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXact(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 55cac186dc..3449aa524a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4599,6 +4600,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6286,6 +6288,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_wal_senders",
 									 max_wal_senders,
 									 ControlFile->max_wal_senders);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
@@ -6836,14 +6841,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7045,7 +7051,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7558,6 +7567,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7888,6 +7898,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9183,6 +9196,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9712,8 +9726,9 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9731,6 +9747,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9749,6 +9766,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9956,6 +9974,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10159,6 +10178,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..d1fded29ab 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+       SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc463c..a1dea253c2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2807,8 +2807,14 @@ CopyFrom(CopyState cstate)
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc),
+							   true);
+
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index c002a61794..fd9be68abe 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1076,6 +1078,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1396,6 +1410,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
@@ -1526,6 +1549,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
 				 errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA",
 						fdw->fdwname)));
 
+	/*
+	 * Remember that the transaction accesses a foreign server. Normally
+	 * ImportForeignSchema doesn't modify data on foreign servers, so register
+	 * the server as not modified.
+	 */
+	RegisterFdwXactByServerId(server->serverid, false);
+
 	/* Call FDW to get a list of commands */
 	cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index fb6ce49056..3fa8bfe09f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/table.h"
 #include "access/tableam.h"
 #include "catalog/partition.h"
@@ -939,7 +940,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+	{
+		Relation		child = partRelInfo->ri_RelationDesc;
+
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(RelationGetRelid(child), true);
+
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 513471ab9b..29f376e48c 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
@@ -224,9 +226,31 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	 * Tell the FDW to initialize the scan.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		RangeTblEntry	*rte;
+
+		rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex,
+							estate);
+
+		/* Remember the transaction modifies data on a foreign server */
+		RegisterFdwXactByRelId(rte->relid, true);
+
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
+	{
+		RangeTblEntry	*rte;
+		int rtindex = (scanrelid > 0) ?
+			scanrelid :
+			bms_next_member(node->fs_relids, -1);
+
+		rte = exec_rt_fetch(rtindex, estate);
+
+		/* Remember that the transaction accesses a foreign server */
+		RegisterFdwXactByRelId(rte->relid, false);
+
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1ec07bad07..e5dee94764 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -37,6 +37,7 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
@@ -47,6 +48,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "rewrite/rewriteHandler.h"
@@ -2418,6 +2420,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+			/* Remember the transaction modifies data on a foreign server */
+			RegisterFdwXactByRelId(relid, true);
 
 			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 															 resultRelInfo,
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 61e48ca3f8..8f411c0559 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,18 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction && !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction && routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		!routine->CommitForeignTransaction &&
+		!routine->RollbackForeignTransaction)
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index beb5e85434..2258424e81 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -12,6 +12,8 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e96134dac8..4c15d7481a 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3663,6 +3663,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3773,6 +3779,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 		case WAIT_EVENT_HASH_BATCH_ALLOCATE:
 			event_name = "HashBatchAllocate";
 			break;
@@ -4099,6 +4108,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b4d475bb0b..803ac09937 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -909,6 +911,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -973,12 +979,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules have had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c2e5e3abf8..9d34817f39 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -151,6 +151,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..55609eed81 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 3c2b369615..56c43cf741 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -94,6 +94,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -249,6 +251,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1311,6 +1314,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1376,6 +1380,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1425,6 +1430,15 @@ GetOldestXmin(Relation rel, int flags)
 		NormalTransactionIdPrecedes(replication_slot_xmin, result))
 		result = replication_slot_xmin;
 
+	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDWXACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
 	/*
 	 * After locks have been released and vacuum_defer_cleanup_age has been
 	 * applied, check whether we need to back up further to make logical
@@ -3125,6 +3139,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon so that
+ * vacuum does not truncate clog entries for transactions that are still
+ * needed to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6985e8eed..241b099238 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,6 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 XactTruncationLock					44
+FdwXactLock							45
+FdwXactResolverLock					46
+FdwXactResolutionLock				47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e57fcd2538..470d0da3d1 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -421,6 +422,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* Initialize fields for fdw xact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -822,6 +827,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c9424f167c..f6da103fbd 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3054,6 +3056,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 75fc6f11d6..72fe0a7167 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -27,6 +27,7 @@
 #endif
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -426,6 +427,24 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
@@ -754,6 +773,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2452,6 +2475,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve a foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4580,6 +4649,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 3a25287a39..5ed8617787 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -125,6 +125,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -344,6 +346,20 @@
 #max_sync_workers_per_subscription = 2	# taken from max_logical_replication_workers
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
+
 #------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index a0b0458108..8701c5f005 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 786672b1b6..bc0c12b3b8 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -208,6 +208,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index e73639df74..3041c39bc0 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 233441837f..b040202043 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 120000
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..d550ee9b87
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,164 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* fdwXactState */
+#define	FDWXACT_NOT_WAITING		0
+#define	FDWXACT_WAITING			1
+#define	FDWXACT_WAIT_COMPLETE	2
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being aborted */
+} FdwXactStatus;
+
+typedef struct FdwXactData *FdwXact;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+	PGPROC		*owner;			/* process that executed the distributed tx. */
+
+	/* Information about the foreign transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset where insertion of this
+									 * entry starts */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset where the insertion ends */
+
+	bool		valid;			/* has the entry been completed and written to
+								 * file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing an FdwXact entry requires holding FdwXactLock in
+ * exclusive mode; iterating over fdwxacts requires it in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	TransactionId xid;
+
+	/* Foreign transaction information */
+	char	   *fdwxact_id;
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+/* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void RegisterFdwXactByRelId(Oid relid, bool modified);
+extern void RegisterFdwXactByServerId(Oid serverid, bool modified);
+extern void ForgetAllFdwXactParticipants(void);
+extern void FdwXactReleaseWaiter(PGPROC *waiter);
+extern void FdwXactWaitForResolution(TransactionId wait_xid, bool commit);
+extern void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter);
+extern PGPROC *FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+								TransactionId *waitXid_p);
+extern bool FdwXactWaiterExists(Oid dbid);
+extern bool PrepareFdwXactParticipants(TransactionId xid);
+extern void SetFdwXactParticipants(TransactionId xid);
+extern void ClearFdwXactParticipants(void);
+extern void PreCommit_FdwXact(void);
+extern void AtEOXact_FdwXact(bool is_commit);
+extern void AtPrepare_FdwXact(void);
+extern void PostPrepare_FdwXact(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern bool FdwXactExists(Oid dboid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+
+#endif							/* FDWXACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..688b43b8d0
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..c935471936
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates whether this slot is in use or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 6c15df7e70..986bc73566 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index c096120c94..7a5d00ddb9 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -108,6 +108,13 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
 
+/*
+ * XACT_FLAGS_FDWNOPREPARE - set when the transaction wrote data to a
+ * foreign table whose server isn't capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 3)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index c8869d5226..da0d442f1b 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -232,6 +232,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index de5670e538..9884f5f8e7 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 61f2c2f5b4..90bf2d495b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5981,6 +5981,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,state,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6099,6 +6117,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..8d046cc4e4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5e0cf533fb..5596ee591c 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -69,6 +69,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 											   bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 														 bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c55dc1481c..2186c1c5d0 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -806,6 +806,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -853,6 +855,7 @@ typedef enum
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
 	WAIT_EVENT_EXECUTE_GATHER,
+	WAIT_EVENT_FDWXACT_RESOLUTION,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
 	WAIT_EVENT_HASH_BATCH_LOAD,
@@ -969,6 +972,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index b20e2ad4f6..5bc4c78ace 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/xlogdefs.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
@@ -161,6 +162,16 @@ struct PGPROC
 	int			syncRepState;	/* wait state for sync rep */
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
+	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int			fdwXactState;	/* wait state for foreign transaction resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
 	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index a5c7d0c064..0f73b64937 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDWXACT_XMIN			0x40	/* unresolved distributed
+													   transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -125,4 +127,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 454c2df487..f977ca43d4 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b813e32215..d658791549 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1342,6 +1342,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.state,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(xid, serverid, userid, state, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.23.0

Attachment: v23-0002-Recreate-RemoveForeignServerById.patch (application/octet-stream)
From cf6c1fc0affdebc170f5ed10b63523d56a4d4c0c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 12 Jun 2020 11:49:02 +0900
Subject: [PATCH v23 2/7] Recreate RemoveForeignServerById()

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/catalog/dependency.c   |  5 ++++-
 src/backend/commands/foreigncmds.c | 22 ++++++++++++++++++++++
 src/include/commands/defrem.h      |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index b33a2f94af..637269281b 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1474,6 +1474,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_FOREIGN_SERVER:
+			RemoveForeignServerById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1488,7 +1492,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSDICT:
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
-		case OCLASS_FOREIGN_SERVER:
 		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index de31ddd1f3..c002a61794 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -1060,6 +1060,28 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop foreign server by OID
+ */
+void
+RemoveForeignServerById(Oid srvId)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(ForeignServerRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(srvId));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Common routine to check permission for user-mapping-related DDL
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index c26a102b17..89db18b7bc 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -128,6 +128,7 @@ extern ObjectAddress CreateForeignDataWrapper(CreateFdwStmt *stmt);
 extern ObjectAddress AlterForeignDataWrapper(AlterFdwStmt *stmt);
 extern ObjectAddress CreateForeignServer(CreateForeignServerStmt *stmt);
 extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
+extern void RemoveForeignServerById(Oid srvId);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
-- 
2.23.0

Attachment: v23-0001-Keep-track-of-writing-on-non-temporary-relation.patch (application/octet-stream)
From 18bb6668866a992c3dea7bdfebe0cb3db825ab39 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 14:12:17 +0500
Subject: [PATCH v23 1/7] Keep track of writing on non-temporary relation

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/executor/nodeModifyTable.c | 16 ++++++++++++++++
 src/include/access/xact.h              |  6 ++++++
 2 files changed, 22 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 20a4c474cc..1ec07bad07 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -581,6 +581,10 @@ ExecInsert(ModifyTableState *mtstate,
 										   NULL,
 										   specToken);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
 												   &specConflict,
@@ -619,6 +623,10 @@ ExecInsert(ModifyTableState *mtstate,
 							   estate->es_output_cid,
 							   0, NULL);
 
+			/* Make note that we've written to a non-temporary relation */
+			if (RelationNeedsWAL(resultRelationDesc))
+				MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
@@ -970,6 +978,10 @@ ldelete:;
 	if (tupleDeleted)
 		*tupleDeleted = true;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/*
 	 * If this delete is the result of a partition key update that moved the
 	 * tuple to a new partition, put this row into the transition OLD TABLE,
@@ -1482,6 +1494,10 @@ lreplace:;
 	if (canSetTag)
 		(estate->es_processed)++;
 
+	/* Make note that we've written to a non-temporary relation */
+	if (RelationNeedsWAL(resultRelationDesc))
+		MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
+
 	/* AFTER ROW UPDATE Triggers */
 	ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple, slot,
 						 recheckIndexes,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 88025b1cc2..c096120c94 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -102,6 +102,12 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary
+ * relation.
+ */
+#define XACT_FLAGS_WROTENONTEMPREL				(1U << 2)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
-- 
2.23.0

#82Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#81)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Jun 23, 2020 at 9:03 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch.

Please review it.

I think at this stage it is important that we do some study of various
approaches to achieve this work and come up with a comparison of the
pros and cons of each approach: (a) what this patch provides, (b) what
is implemented in the Global Snapshots patch [1], (c) if possible, what is
implemented in Postgres-XL. I fear that if we go too far in spending
effort on this and later discover that it can be better done via
some other available patch/work (maybe because that approach can be
extended more easily to provide atomic visibility, or because its
design is more robust, etc.), then it can lead to a lot of rework.

[1]: /messages/by-id/20200622150636.GB28999@momjian.us

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#83Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Amit Kapila (#82)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 23 Jun 2020 at 13:26, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 23, 2020 at 9:03 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch.

Please review it.

I think at this stage it is important that we do some study of various
approaches to achieve this work and come up with a comparison of the
pros and cons of each approach: (a) what this patch provides, (b) what
is implemented in the Global Snapshots patch [1], (c) if possible, what is
implemented in Postgres-XL. I fear that if we go too far in spending
effort on this and later discover that it can be better done via
some other available patch/work (maybe because that approach can be
extended more easily to provide atomic visibility, or because its
design is more robust, etc.), then it can lead to a lot of rework.

Yeah, I have no objection to that plan, but I think we also need to
keep in mind that (b), (c), and whatever we are considering for global
consistency apply only to PostgreSQL (and postgres_fdw). This patch,
on the other hand, needs to solve the atomic commit problem more
generically, because the foreign server might be accessed through
oracle_fdw, mysql_fdw, or other FDWs that connect to database systems
supporting 2PC.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#84Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#83)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jun 26, 2020 at 10:50 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Tue, 23 Jun 2020 at 13:26, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think at this stage it is important that we do some study of various
approaches to achieve this work and come up with a comparison of the
pros and cons of each approach: (a) what this patch provides, (b) what
is implemented in the Global Snapshots patch [1], (c) if possible, what is
implemented in Postgres-XL. I fear that if we go too far in spending
effort on this and later discover that it can be better done via
some other available patch/work (maybe because that approach can be
extended more easily to provide atomic visibility, or because its
design is more robust, etc.), then it can lead to a lot of rework.

Yeah, I have no objection to that plan, but I think we also need to
keep in mind that (b), (c), and whatever we are considering for global
consistency apply only to PostgreSQL (and postgres_fdw).

I think we should explore whether those approaches could be extended to
other FDWs; if not, that could be considered a disadvantage of
that approach.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#85Tatsuo Ishii
ishii@sraoss.co.jp
In reply to: Tatsuo Ishii (#77)
Re: Transactions involving multiple postgres foreign servers, take 2

The point is that this data inconsistency is
caused by an inconsistent read, not by inconsistent commit
results. I think there are several possible causes of data
inconsistency, and atomic commit and atomic visibility eliminate
different ones. We can eliminate all possibilities of data
inconsistency only after we support both 2PC and global MVCC.

IMO any permanent data inconsistency is a serious problem for users no
matter what the technical reasons are.

I have incorporated the "Pangea" algorithm into Pgpool-II to implement
atomic visibility. In the test below I have two PostgreSQL servers
(stock v12), server0 (port 11002) and server1 (port
11003). default_transaction_isolation was set to 'repeatable read' on
both PostgreSQL servers; this is required by Pangea. Pgpool-II replicates
write queries and sends them to both server0 and server1. There are two
tables, "t1" (having only one integer column "i") and "log" (having only
one integer column "i"). I ran the following script
(inconsistency1.sql) via pgbench:

BEGIN;
UPDATE t1 SET i = i + 1;
END;

like: pgbench -n -c 1 -T 30 -f inconsistency1.sql

At the same time, I ran another script (inconsistency2.sql) from a concurrent pgbench session:

BEGIN;
INSERT INTO log SELECT * FROM t1;
END;

pgbench -n -c 1 -T 30 -f inconsistency2.sql

After those two pgbench runs finished, I ran the following \copy commands to see if
the contents of table "log" are identical on server0 and server1:
psql -p 11002 -c "\copy log to '11002.txt'"
psql -p 11003 -c "\copy log to '11003.txt'"
cmp 11002.txt 11003.txt

The new Pgpool-II incorporating Pangea showed that 11002.txt and
11003.txt are identical, as expected. This indicates that atomic
visibility is preserved.

On the other hand, Pgpool-II without Pangea showed
differences between those files.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

#86Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Tatsuo Ishii (#85)
Re: Transactions involving multiple postgres foreign servers, take 2

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!
I have three questions about the v23 patches.

1. Messages related to user cancellation

In my understanding, there are two messages
that can be output when a user cancels the COMMIT command.

A. When PREPARE fails, the output says the transaction
committed locally, but an error is also reported.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING: canceling wait for resolving foreign transaction due to user
request
DETAIL: The transaction has already committed locally, but might not
have been committed on the foreign server.
ERROR: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
CONTEXT: remote SQL command: PREPARE TRANSACTION
'fx_1020791818_519_16399_10'
```

B. When PREPARE succeeds,
the output says the transaction committed locally.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING: canceling wait for resolving foreign transaction due to user
request
DETAIL: The transaction has already committed locally, but might not
have been committed on the foreign server.
COMMIT
```

In case A, I think the "committed locally" message can confuse users,
because the message says "committed" although the transaction was
actually aborted.

I think the "committed" message means that the ABORT was committed locally.
But isn't there a possibility of misunderstanding?

In case A, wouldn't it be better to make the message more user-friendly?
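
For reference, the identifier in the PREPARE TRANSACTION command in case A (fx_1020791818_519_16399_10) appears to consist of a random number, the local transaction ID, the foreign server OID, and the user OID. That reading is an assumption based on this one sample and on the arguments of GetPrepareId_function in the patch; I have not confirmed it against the code. A minimal standalone sketch of building such an identifier, where build_prepare_id() and the simplified typedefs are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Simplified stand-ins for PostgreSQL's 32-bit TransactionId and Oid. */
typedef unsigned int TransactionId;
typedef unsigned int Oid;

/*
 * Write "fx_<random>_<xid>_<serverid>_<userid>" into buf.
 * Returns the identifier length, or -1 if buf is too small.
 * The field order is an assumption inferred from the sample identifier.
 */
int
build_prepare_id(char *buf, size_t buflen, long random_part,
				 TransactionId xid, Oid serverid, Oid userid)
{
	int			len;

	assert(buf != NULL);

	len = snprintf(buf, buflen, "fx_%ld_%u_%u_%u",
				   random_part, xid, serverid, userid);

	/* snprintf reports the length it wanted; treat truncation as failure. */
	if (len < 0 || (size_t) len >= buflen)
		return -1;

	return len;
}
```

Called with (1020791818, 519, 16399, 10) this reproduces the identifier from the output. If the fields really mean what is assumed here, the identifier alone is enough to tell which local transaction, server, and user a dangling prepared transaction belongs to.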

2. Typo

Is "trasnactions" in fdwxact.c a typo?

3. FdwXactGetWaiter in fdwxact.c returns an unused value

FdwXactGetWaiter is called in the FXRslvLoop function.
It returns *waitXid_p, but FXRslvLoop doesn't seem to
use *waitXid_p. Do we need to return it?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#87Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiro Ikeda (#86)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020/07/14 9:08, Masahiro Ikeda wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!

+1
I'm interested in these patches and am now studying them. While checking
the behavior of the patched PostgreSQL, I came up with three comments.

1. On HEAD, we can access a foreign table even during recovery.
But in the patched version, when I did that, I got the following error.
Is this intentional?

ERROR: cannot assign TransactionIds during recovery

2. With the patch, when INSERT/UPDATE/DELETE are executed on both the
local and remote servers, 2PC is executed at the commit phase. But
when a write other than INSERT/UPDATE/DELETE (e.g., TRUNCATE) is
executed locally and INSERT/UPDATE/DELETE are executed remotely,
2PC is NOT executed. Is this safe?

3. XACT_FLAGS_WROTENONTEMPREL is set when INSERT/UPDATE/DELETE
are executed. But it's not reset even when those queries are rolled back by
ROLLBACK TO SAVEPOINT. This may cause unnecessary 2PC at the commit phase.
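
To make item 3 concrete, here is a small standalone sketch (not code from the patch). Only XACT_FLAGS_WROTENONTEMPREL comes from the patch; xact_flags, the savepoint helpers, and the commit-time rule in commit_needs_2pc() are assumptions for illustration. It shows the flag staying set after a rollback to a savepoint, and one possible way to restore it.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit the patch adds to xact.h. */
#define XACT_FLAGS_WROTENONTEMPREL	(1U << 2)

/* Stand-in for the backend-global MyXactFlags (hypothetical). */
static unsigned int xact_flags = 0;

/* Mirrors what the patch does after a local INSERT/UPDATE/DELETE. */
void
note_local_write(void)
{
	xact_flags |= XACT_FLAGS_WROTENONTEMPREL;
}

/* Hypothetical bookkeeping: remember the flags when a savepoint is set. */
unsigned int
define_savepoint(void)
{
	return xact_flags;
}

/* Behavior described in item 3: the rollback leaves the flag set. */
void
rollback_to_savepoint_sticky(unsigned int saved_flags)
{
	(void) saved_flags;			/* deliberately ignored */
}

/* One possible fix: restore this bit to its value at savepoint time. */
void
rollback_to_savepoint_restoring(unsigned int saved_flags)
{
	xact_flags &= ~XACT_FLAGS_WROTENONTEMPREL;
	xact_flags |= saved_flags & XACT_FLAGS_WROTENONTEMPREL;
}

/*
 * Assumed commit-time rule: use 2PC when two or more foreign servers were
 * written, or when one foreign write accompanies a local write.
 */
bool
commit_needs_2pc(int foreign_servers_written)
{
	bool		local_write;

	assert(foreign_servers_written >= 0);

	local_write = (xact_flags & XACT_FLAGS_WROTENONTEMPREL) != 0;
	return foreign_servers_written >= 2 ||
		(foreign_servers_written == 1 && local_write);
}
```

With one foreign write and a local write that was rolled back to a savepoint, the sticky variant still reports that 2PC is needed, while the restoring variant does not. Whether restoring the flag per subtransaction is the right fix for the patch is a separate question.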

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#88Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Fujii Masao (#87)
Re: Transactions involving multiple postgres foreign servers, take 2

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

I want to ask a question about streaming replication with 2PC.
Are you going to support 2PC with streaming replication?

I tried streaming replication with the v23 patches.
I confirmed that 2PC works with streaming replication
when the coordinator has a primary and a standby.

But, in my understanding, the WAL for "PREPARE" and
"COMMIT/ABORT PREPARED" cannot be replicated to the standby server
synchronously.

If this is right, an unresolved transaction can occur.

For example,

1. PREPARE is done
2. crash primary before the WAL related to PREPARE is
replicated to the standby server
3. promote standby server // but can't execute "ABORT PREPARED"

In the above case, the remote server is left with an unresolved transaction.
Can we solve this problem by supporting synchronous replication?

However, I think some users use asynchronous replication for performance.
Do we need to document this limitation or come up with another solution?

Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION

#89Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Masahiro Ikeda (#86)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 14 Jul 2020 at 09:08, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!
I have three questions about the v23 patches.

1. Messages related to user cancellation

In my understanding, there are two messages
that can be output when a user cancels the COMMIT command.

A. When PREPARE fails, the output says the transaction
committed locally, but an error is also reported.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING: canceling wait for resolving foreign transaction due to user
request
DETAIL: The transaction has already committed locally, but might not
have been committed on the foreign server.
ERROR: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
CONTEXT: remote SQL command: PREPARE TRANSACTION
'fx_1020791818_519_16399_10'
```

B. When PREPARE succeeds,
the output says the transaction committed locally.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING: canceling wait for resolving foreign transaction due to user
request
DETAIL: The transaction has already committed locally, but might not
have been committed on the foreign server.
COMMIT
```

In case A, I think the "committed locally" message can confuse users,
because the message says "committed" although the transaction was
actually aborted.

I think the "committed" message means that the ABORT was committed locally.
But isn't there a possibility of misunderstanding?

No, you're right. I'll fix it in the next version of the patch.

I think synchronous replication has the same problem. It says
"the transaction has already committed", but that's not true when
executing ROLLBACK PREPARED.

BTW, how did you test case (A)? It says it is canceling the wait for
foreign transaction resolution, but the remote SQL command is PREPARE
TRANSACTION.

In case A, wouldn't it be better to make the message more user-friendly?

2. Typo

Is "trasnactions" in fdwxact.c a typo?

Fixed.

3. FdwXactGetWaiter in fdwxact.c returns an unused value

FdwXactGetWaiter is called in the FXRslvLoop function.
It returns *waitXid_p, but FXRslvLoop doesn't seem to
use *waitXid_p. Do we need to return it?

Removed.

I've incorporated your comments above in my local branch. I'll
post the latest version of the patch soon, after incorporating the other comments.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#90Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#89)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020/07/15 15:06, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 09:08, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!
I have three questions about the v23 patches.

1. Messages related to user cancellation

In my understanding, there are two messages
that can be output when a user cancels the COMMIT command.

A. When PREPARE fails, the output says the transaction
committed locally, but an error is also reported.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING: canceling wait for resolving foreign transaction due to user
request
DETAIL: The transaction has already committed locally, but might not
have been committed on the foreign server.
ERROR: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
CONTEXT: remote SQL command: PREPARE TRANSACTION
'fx_1020791818_519_16399_10'
```

B. When PREPARE succeeds,
the output says the transaction committed locally.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING: canceling wait for resolving foreign transaction due to user
request
DETAIL: The transaction has already committed locally, but might not
have been committed on the foreign server.
COMMIT
```

In case A, I think the "committed locally" message can confuse users,
because the message says "committed" although the transaction was
actually aborted.

I think the "committed" message means that the ABORT was committed locally.
But isn't there a possibility of misunderstanding?

No, you're right. I'll fix it in the next version of the patch.

I think synchronous replication has the same problem. It says
"the transaction has already committed", but that's not true when
executing ROLLBACK PREPARED.

Yes. Also the same message is logged when executing PREPARE TRANSACTION.
Maybe it should be changed to "the transaction has already prepared".

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#91Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#89)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020-07-15 15:06, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 09:08, Masahiro Ikeda <ikedamsh@oss.nttdata.com>
wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!
I have three questions about the v23 patches.

1. Messages related to user cancellation

In my understanding, there are two messages
that can be output when a user cancels the COMMIT command.

A. When PREPARE fails, the output says the transaction
committed locally, but an error is also reported.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING: canceling wait for resolving foreign transaction due to user
request
DETAIL: The transaction has already committed locally, but might not
have been committed on the foreign server.
ERROR: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
CONTEXT: remote SQL command: PREPARE TRANSACTION
'fx_1020791818_519_16399_10'
```

B. When PREPARE succeeds,
the output says the transaction committed locally.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING: canceling wait for resolving foreign transaction due to user
request
DETAIL: The transaction has already committed locally, but might not
have been committed on the foreign server.
COMMIT
```

In case A, I think the "committed locally" message can confuse users,
because the message says "committed" although the transaction was
actually aborted.

I think the "committed" message means that the ABORT was committed locally.
But isn't there a possibility of misunderstanding?

No, you're right. I'll fix it in the next version of the patch.

I think synchronous replication has the same problem. It says
"the transaction has already committed", but that's not true when
executing ROLLBACK PREPARED.

Thanks for replying and sharing the synchronous replication problem.

BTW, how did you test case (A)? It says it is canceling the wait for
foreign transaction resolution, but the remote SQL command is PREPARE
TRANSACTION.

I think the timing of the failure is important for 2PC testing.
Since I don't have a good way to simulate failures flexibly,
I use the GDB debugger.

The message in case (A) is output
after performing the following operations.

1. Attach the debugger to a backend process.
2. Set a breakpoint at PreCommit_FdwXact() in CommitTransaction().
// Before PREPARE.
3. Execute "BEGIN" and insert data into two remote foreign tables.
4. Issue a "COMMIT" command.
5. The backend process stops at the breakpoint.
6. Stop a remote foreign server.
7. Detach the debugger.
// The backend continues and PREPARE fails. TR tries to abort all
remote txs.
// It's unnecessary to resolve remote txs whose PREPARE failed,
isn't it?
8. Send a cancel request.

BTW, I'm concerned about how to test the 2PC patches.
There are many failure patterns, such as the failure timing,
which server or network fails (and unexpected recovery), and combinations of those...

Though it would be best to test those failure patterns automatically,
I have no idea how for now, so I check some patterns manually.

I've incorporated your comments above in my local branch. I'll
post the latest version of the patch soon, after incorporating the other comments.

OK, Thanks.

Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION

#92Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Masahiro Ikeda (#88)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

I want to ask a question about streaming replication with 2PC.
Are you going to support 2PC with streaming replication?

I tried streaming replication with the v23 patches.
I confirmed that 2PC works with streaming replication
when the coordinator has a primary and a standby.

But, in my understanding, the WAL for "PREPARE" and
"COMMIT/ABORT PREPARED" cannot be replicated to the standby server
synchronously.

If this is right, an unresolved transaction can occur.

For example,

1. PREPARE is done
2. crash primary before the WAL related to PREPARE is
replicated to the standby server
3. promote standby server // but can't execute "ABORT PREPARED"

In the above case, the remote server is left with an unresolved transaction.
Can we solve this problem by supporting synchronous replication?

However, I think some users use asynchronous replication for performance.
Do we need to document this limitation or come up with another solution?

IIUC, with synchronous replication we can guarantee that WAL records
have been written on both the primary and the replicas by the time the
client gets an acknowledgment of commit. We don't synchronously
replicate each WAL record generated during the transaction one by one.
In the case you described, the client will get an error due to the
server crash, so I think the user cannot expect that the WAL records
generated so far have been replicated. The same issue could also happen
when the user executes PREPARE TRANSACTION and the server crashes. To
prevent this issue, I think we would need to send each WAL record
synchronously, but I'm not sure that is reasonable behavior; as long
as we write WAL locally and then send it to the replicas, we would need
a smart mechanism to prevent this situation.

Related to the point raised by Ikeda-san, I realized that with the
current patch the backend waits for synchronous replication and then
waits for foreign transaction resolution. But it should be reversed.
Otherwise, it could lead to data loss even when the client got an
acknowledgment of commit. Also, when the user is using both atomic
commit and synchronous replication and wants to cancel waiting, he/she
will need to press Ctrl-C twice with the current patch, which also
should be fixed.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#93tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#89)
RE: Transactions involving multiple postgres foreign servers, take 2

Hi Sawada san,

I'm reviewing this patch series, and let me give some initial comments and questions. I'm looking at this with a hope that this will be useful purely as a FDW enhancement for our new use cases, regardless of whether the FDW will be used for Postgres scale-out.

I don't think it's necessarily required to combine 2PC with global visibility. The X/Open XA specification only handles atomic commit. The only part of the XA specification that refers to global visibility is the following:

[Quote from XA specification]
--------------------------------------------------
2.3.2 Protocol Optimisations
・ Read-only
An RM can respond to the TM’s prepare request by asserting that the RM was not
asked to update shared resources in this transaction branch. This response
concludes the RM’s involvement in the transaction; the Phase 2 dialogue between
the TM and this RM does not occur. The TM need not stably record, in its list of
participating RMs, an RM that asserts a read-only role in the global transaction.

However, if the RM returns the read-only optimisation before all work on the global
transaction is prepared, global serialisability [1] cannot be guaranteed. This is because
the RM may release transaction context, such as read locks, before all application
activity for that global transaction is finished.

[1] Serialisability is a property of a set of concurrent transactions. For a serialisable set of transactions, at least one
serial sequence of the transactions exists that produces identical results, with respect to shared resources, as does
concurrent execution of the transaction.
--------------------------------------------------

(1)
Do other popular DBMSs (Oracle, MySQL, etc.) provide concrete functions that can be used for the new FDW commit/rollback/prepare API? I'm asking this to confirm that we really need to provide these functions, not as the transaction callbacks for postgres_fdw.

(2)
How are data modifications tracked in local and remote transactions? 0001 seems to handle local INSERT/DELETE/UPDATE. In particular, what about:

* COPY FROM to local/remote tables/views.

* User-defined function calls that modify data, e.g. SELECT func1() WHERE col = func2()

(3)
Does the 2PC processing always go through the background worker?
Is group commit effective on the remote server? That is, are PREPARE and COMMIT PREPARED issued from multiple remote sessions written to WAL in a batch?

Regards
Takayuki Tsunakawa

#94Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Fujii Masao (#87)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/14 9:08, Masahiro Ikeda wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!

+1
I'm interested in these patches and now studying them. While checking
the behaviors of the patched PostgreSQL, I got three comments.

Thank you for testing this patch!

1. We can access a foreign table even during recovery on HEAD.
But in the patched version, when I did that, I got the following error.
Is this intentional?

ERROR: cannot assign TransactionIds during recovery

No, it should be fixed. I'm going to fix this by not collecting
participants for atomic commit during recovery.

2. With the patch, when INSERT/UPDATE/DELETE are executed on both the
local and remote servers, 2PC is executed at the commit phase. But
when a write SQL command other than INSERT/UPDATE/DELETE (e.g., TRUNCATE)
is executed locally and INSERT/UPDATE/DELETE are executed remotely,
2PC is NOT executed. Is this safe?

Hmm, you're right. I think atomic commit must also be used when the
user executes other write SQL commands such as TRUNCATE, COPY, CLUSTER, and
CREATE TABLE on the local node.

3. XACT_FLAGS_WROTENONTEMPREL is set when INSERT/UPDATE/DELETE
are executed. But it's not reset even when those queries are canceled by
ROLLBACK TO SAVEPOINT. This may cause unnecessary 2PC at the commit phase.

Will fix.
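
To illustrate comment 3, here is a minimal sketch in Python. It is a toy model, not code from the patch: the class and method names are hypothetical, and the `wrote` attribute only stands in for XACT_FLAGS_WROTENONTEMPREL. It assumes the flag is remembered at SAVEPOINT and restored on ROLLBACK TO SAVEPOINT, so a rolled-back write no longer triggers 2PC.

```python
class ToyTransaction:
    """Toy model of a per-transaction 'wrote data' flag (hypothetical, not PostgreSQL code)."""

    def __init__(self):
        self.wrote = False   # stands in for XACT_FLAGS_WROTENONTEMPREL
        self._saved = {}     # savepoint name -> flag value at SAVEPOINT time

    def write(self):
        self.wrote = True

    def savepoint(self, name):
        self._saved[name] = self.wrote

    def rollback_to(self, name):
        # Restore the flag to what it was when the savepoint was established.
        self.wrote = self._saved[name]

    def needs_2pc(self, remote_wrote):
        # Assumption taken from the thread: 2PC is used when both sides wrote.
        return self.wrote and remote_wrote


tx = ToyTransaction()
tx.savepoint("s1")
tx.write()
tx.rollback_to("s1")
print(tx.needs_2pc(remote_wrote=True))
```

Without the restore in `rollback_to`, the flag would stay set and the commit would go through 2PC even though no local write survived.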

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#95Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#92)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020-07-16 13:16, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <ikedamsh@oss.nttdata.com>
wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

I want to ask a question about streaming replication with 2PC.
Are you going to support 2PC with streaming replication?

I tried streaming replication using the v23 patches.
I confirmed that 2PC works with streaming replication,
where there is a primary/standby coordinator pair.

But, in my understanding, the WAL of "PREPARE" and
"COMMIT/ABORT PREPARED" can't be replicated to the standby server in
sync.

If this is right, an unresolved transaction can occur.

For example,

1. PREPARE is done
2. crash primary before the WAL related to PREPARE is
replicated to the standby server
3. promote standby server // but can't execute "ABORT PREPARED"

In the above case, the remote server has an unresolved transaction.
Can we solve this problem to support in-sync replication?

But, I think some users use async replication for performance.
Do we need to document the limitation or make another solution?

IIUC with synchronous replication, we can guarantee that WAL records
are written on both the primary and the replicas by the time the client
gets an acknowledgment of commit. We don't replicate each WAL record
generated during a transaction one by one in sync. In the case you
described, the client will get an error due to the server crash.
Therefore I think the user cannot expect that the WAL records generated
so far have been replicated. The same issue could also happen when the user
executes PREPARE TRANSACTION and the server crashes.

Thanks! I didn't notice that the behavior is the same when a user
executes PREPARE TRANSACTION.

IIUC, there is a difference between (1) PREPARE TRANSACTION
and (2) 2PC.
The difference is whether the client can know when the server crashed
and its global tx id.

If (1) PREPARE TRANSACTION fails, it's OK for the client to execute the
same command again,
because if the remote server has already prepared it, the command will be
ignored.

But if (2) 2PC fails because of a coordinator crash, the client can't know
what operations should be done.

If the old coordinator already executed PREPARE, there are some
transactions that should be resolved with ABORT PREPARED.
But if the PREPARE WAL was not sent to the standby, the new coordinator
can't execute ABORT PREPARED.
And the client can't know which remote servers have prepared
transactions that should be aborted, either.

Even if the client can know that, only the old coordinator knows its
global transaction id.
Only the database administrator can analyze the old coordinator's log
and then execute the appropriate commands manually, right?

To prevent this
issue, I think we would need to send each WAL record in sync, but I'm
not sure that's reasonable behavior, and as long as we write WAL
locally and then send it to replicas we would need a smart mechanism to
prevent this situation.

I agree. Sending each 2PC WAL record in sync would have a large
performance impact.
At least, we need to document the limitation and how to handle this
situation.

Related to the point raised by Ikeda-san, I realized that with the
current patch the backend waits for synchronous replication and then
waits for foreign transaction resolution. But it should be reversed.
Otherwise, it could lead to data loss even when the client got an
acknowledgment of commit. Also, when the user is using both atomic
commit and synchronous replication and wants to cancel waiting, he/she
will need to press Ctrl-C twice with the current patch, which also
should be fixed.

I'm sorry, but I don't understand.

In my understanding, if the COMMIT WAL is replicated to the standby in sync,
the standby server can resolve the transaction after crash recovery,
once it is promoted.

If reversed, there are some situations in which atomic commit can't be
guaranteed.
In case some foreign transaction resolutions succeed but others
fail (and the COMMIT WAL is not replicated),
the standby must ABORT PREPARED because the COMMIT WAL is not
replicated.
This means that some foreign transactions are resolved with COMMIT PREPARED
by the primary coordinator,
while other foreign transactions can be resolved with ABORT PREPARED by the
secondary coordinator.
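
The scenario above can be sketched as a small simulation. This is only a toy model in Python under stated assumptions: the function name and the server names `fs1`/`fs2` are made up, the primary resolves foreign transactions before its COMMIT record reaches the standby, and the promoted standby aborts anything it still finds prepared.

```python
def resolve_before_replication(participants, crash_after):
    """Primary commits prepared foreign transactions one by one and crashes
    after `crash_after` of them, before its COMMIT record is replicated.
    The promoted standby, seeing no COMMIT record, aborts the rest."""
    state = {p: "prepared" for p in participants}
    for i, p in enumerate(participants):
        if i == crash_after:
            break                    # primary crashes here
        state[p] = "committed"       # COMMIT PREPARED by the primary
    for p in participants:
        if state[p] == "prepared":
            state[p] = "aborted"     # ABORT PREPARED by the promoted standby
    return state


outcome = resolve_before_replication(["fs1", "fs2"], crash_after=1)
print(outcome)
print("atomic:", len(set(outcome.values())) == 1)
```

With a crash after the first resolution, `fs1` ends up committed and `fs2` aborted, which is exactly the mixed outcome described.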

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#96Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#93)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, 16 Jul 2020 at 13:53, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Hi Sawada san,

I'm reviewing this patch series, and let me give some initial comments and questions. I'm looking at this with a hope that this will be useful purely as a FDW enhancement for our new use cases, regardless of whether the FDW will be used for Postgres scale-out.

Thank you for reviewing this patch!

Yes, this patch is trying to resolve the generic atomic commit problem
w.r.t. FDW, and will be useful also for Postgres scale-out.

I don't think it's necessarily required to combine 2PC with global visibility. The X/Open XA specification only handles atomic commit. The only part of the XA specification that refers to global visibility is the following:

[Quote from XA specification]
--------------------------------------------------
2.3.2 Protocol Optimisations
・ Read-only
An RM can respond to the TM’s prepare request by asserting that the RM was not
asked to update shared resources in this transaction branch. This response
concludes the RM’s involvement in the transaction; the Phase 2 dialogue between
the TM and this RM does not occur. The TM need not stably record, in its list of
participating RMs, an RM that asserts a read-only role in the global transaction.

However, if the RM returns the read-only optimisation before all work on the global
transaction is prepared, global serialisability [1] cannot be guaranteed. This is because
the RM may release transaction context, such as read locks, before all application
activity for that global transaction is finished.

[1] Serialisability is a property of a set of concurrent transactions. For a serialisable set of transactions, at least one
serial sequence of the transactions exists that produces identical results, with respect to shared resources, as does
concurrent execution of the transaction.
--------------------------------------------------

Agreed.

(1)
Do other popular DBMSs (Oracle, MySQL, etc.) provide concrete functions that can be used for the new FDW commit/rollback/prepare API? I'm asking this to confirm that we really need to provide these functions, not as the transaction callbacks for postgres_fdw.

I have only briefly checked oracle_fdw, but in general I think that
if an existing FDW supports transaction begin, commit, and rollback,
these can be ported to the new FDW transaction APIs easily.

Regarding the comparison between FDW transaction APIs and transaction
callbacks, I think one of the benefits of providing FDW transaction
APIs is that the core is able to manage the status of foreign
transactions. We need to track the status of individual foreign
transactions to support atomic commit. If we use the transaction callbacks
(XactCallback) that many FDWs are using, I think we will end up
calling the transaction callback and leaving the transaction work to
FDWs, so the core will not be able to know, for example, the return
value of PREPARE TRANSACTION. We could add more arguments to the
transaction callbacks to get the return value from FDWs, but I don't
think that's a good idea, as transaction callbacks are used not only by
FDWs but also by other external modules.

(2)
How are data modifications tracked in local and remote transactions? 0001 seems to handle local INSERT/DELETE/UPDATE. In particular, what about:

* COPY FROM to local/remote tables/views.

* User-defined function calls that modify data, e.g. SELECT func1() WHERE col = func2()

The current version of the patch (v23) supports only
INSERT/DELETE/UPDATE. But I'm going to change the patch so that it
supports other write SQL commands, as Fujii-san also pointed out.

(3)
Does the 2PC processing always go through the background worker?
Is group commit effective on the remote server? That is, are PREPARE and COMMIT PREPARED issued from multiple remote sessions written to WAL in a batch?

No, in the current design, the backend that received a query from the
client does PREPARE, and then the transaction resolver process, a
background worker, does COMMIT PREPARED.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#97tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#96)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
I have only briefly checked oracle_fdw, but in general I think that
if an existing FDW supports transaction begin, commit, and rollback,
these can be ported to the new FDW transaction APIs easily.

Does oracle_fdw support begin, commit and rollback?

And most importantly, do other major DBMSs, including Oracle, provide the API for preparing a transaction? In other words, will the FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to join the 2PC processing? I think we need to confirm that there are concrete examples.

What I'm worried about is that if only postgres_fdw can implement the prepare function, it's a sign that the FDW interface will be riddled with functions only for Postgres. That is, the FDW interface is drifting away from its original purpose, "access external data as a relation", and becoming complex. Tomas Vondra expressed this concern as follows:

Horizontal scalability/sharding
/messages/by-id/CANP8+jK=+3zVYDFY0oMAQKQVJ+qReDHr1UPdyFEELO82yVfb9A@mail.gmail.com

[Tomas Vondra's remarks]
--------------------------------------------------

This strikes me as a bit of a conflict of interest with FDW which
seems to want to hide the fact that it's foreign; the FDW
implementation makes its own optimization decisions which might
make sense for single table queries but breaks down in the face of
joins.

+1 to these concerns

In my mind, FDW is a wonderful tool to integrate PostgreSQL with
external data sources, and it's nicely shaped for this purpose, which
implies the abstractions and assumptions in the code.

The truth however is that many current uses of the FDW API are actually
using it for different purposes because there's no other way to do that,
not because FDWs are the "right way". And this includes the attempts to
build sharding on FDW, I think.

Situations like this result in "improvements" of the API that seem to
improve the API for the second group, but make the life harder for the
original FDW API audience by making the API needlessly complex. And I
say "seem to improve" because the second group eventually runs into the
fundamental abstractions and assumptions the API is based on anyway.

And based on the discussions at pgcon, I think this is the main reason
why people cringe when they hear "FDW" and "sharding" in the same sentence.

...
My other worry is that we'll eventually mess the FDW infrastructure,
making it harder to use for the original purpose. Granted, most of the
improvements proposed so far look sane and useful for FDWs in general,
but sooner or later that ceases to be the case - there will be changes
needed merely for the sharding. Those will be tough decisions.
--------------------------------------------------

Regarding the comparison between FDW transaction APIs and transaction
callbacks, I think one of the benefits of providing FDW transaction
APIs is that the core is able to manage the status of foreign
transactions. We need to track the status of individual foreign
transactions to support atomic commit. If we use the transaction callbacks
(XactCallback) that many FDWs are using, I think we will end up
calling the transaction callback and leaving the transaction work to
FDWs, so the core will not be able to know, for example, the return
value of PREPARE TRANSACTION. We could add more arguments to the
transaction callbacks to get the return value from FDWs, but I don't
think that's a good idea, as transaction callbacks are used not only by
FDWs but also by other external modules.

To track the foreign transaction status, we can add GetTransactionStatus() to the FDW interface as an alternative, can't we?

The current version of the patch (v23) supports only
INSERT/DELETE/UPDATE. But I'm going to change the patch so that it
supports other write SQL commands, as Fujii-san also pointed out.

OK. I've just read that Fujii san already pointed out a similar thing. But I wonder if we can know that a UDF executed on the foreign server has updated data. Maybe we can know or guess it by calling txid_current_if_any() or checking the transaction status in the FE/BE protocol, but can we deal with FDWs other than postgres_fdw?

No, in the current design, the backend that received a query from the
client does PREPARE, and then the transaction resolver process, a
background worker, does COMMIT PREPARED.

Does this "No" mean the current implementation cannot group commits from multiple transactions?
Does the transaction resolver send COMMIT PREPARED and wait for its response for each transaction one by one? For example,

[local server]
Transactions T1 and T2 perform 2PC at the same time.
The transaction resolver sends COMMIT PREPARED for T1 and then waits for the response.
T1 writes the COMMIT PREPARED record locally and syncs the WAL.
The transaction resolver sends COMMIT PREPARED for T2 and then waits for the response.
T2 writes the COMMIT PREPARED record locally and syncs the WAL.

[foreign server]
T1 writes the COMMIT PREPARED record locally and syncs the WAL.
T2 writes the COMMIT PREPARED record locally and syncs the WAL.

If the WAL records of multiple concurrent transactions are written and synced separately, i.e. group commit doesn't take effect, then the OLTP transaction performance will be unacceptable.
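
As a back-of-the-envelope illustration of this concern, the following Python sketch counts WAL syncs under an idealized model. The function name and the fixed `batch_size` are assumptions for illustration only; real group commit depends on timing and concurrency rather than a fixed batch size.

```python
def count_wal_syncs(num_transactions, batch_size):
    """Number of WAL flushes if up to `batch_size` commit records share one flush."""
    full, rest = divmod(num_transactions, batch_size)
    return full + (1 if rest else 0)


print(count_wal_syncs(100, 1))    # resolver handles transactions one by one
print(count_wal_syncs(100, 10))   # ten commit records share each flush
```

In this model, serializing the COMMIT PREPARED calls corresponds to `batch_size=1`, so every transaction pays for its own flush.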

Regards
Takayuki Tsunakawa

#98Laurenz Albe
laurenz.albe@cybertec.at
In reply to: tsunakawa.takay@fujitsu.com (#97)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 2020-07-17 at 05:21 +0000, tsunakawa.takay@fujitsu.com wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
I have only briefly checked oracle_fdw, but in general I think that
if an existing FDW supports transaction begin, commit, and rollback,
these can be ported to the new FDW transaction APIs easily.

Does oracle_fdw support begin, commit and rollback?

Yes.

And most importantly, do other major DBMSs, including Oracle, provide the API for
preparing a transaction? In other words, will the FDWs other than postgres_fdw
really be able to take advantage of the new FDW functions to join the 2PC processing?
I think we need to confirm that there are concrete examples.

I bet they do. There is even a standard for that.

I am not looking forward to adapting oracle_fdw, and I didn't read the patch.

But using distributed transactions is certainly a good thing if it is done right.

The trade off is the need for a transaction manager, and implementing that
correctly is a high price to pay.

Yours,
Laurenz Albe

#99Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Masahiro Ikeda (#95)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 17 Jul 2020 at 11:06, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2020-07-16 13:16, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <ikedamsh@oss.nttdata.com>
wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

I want to ask a question about streaming replication with 2PC.
Are you going to support 2PC with streaming replication?

I tried streaming replication using the v23 patches.
I confirmed that 2PC works with streaming replication,
where there is a primary/standby coordinator pair.

But, in my understanding, the WAL of "PREPARE" and
"COMMIT/ABORT PREPARED" can't be replicated to the standby server in
sync.

If this is right, an unresolved transaction can occur.

For example,

1. PREPARE is done
2. crash primary before the WAL related to PREPARE is
replicated to the standby server
3. promote standby server // but can't execute "ABORT PREPARED"

In the above case, the remote server has an unresolved transaction.
Can we solve this problem to support in-sync replication?

But, I think some users use async replication for performance.
Do we need to document the limitation or make another solution?

IIUC with synchronous replication, we can guarantee that WAL records
are written on both the primary and the replicas by the time the client
gets an acknowledgment of commit. We don't replicate each WAL record
generated during a transaction one by one in sync. In the case you
described, the client will get an error due to the server crash.
Therefore I think the user cannot expect that the WAL records generated
so far have been replicated. The same issue could also happen when the user
executes PREPARE TRANSACTION and the server crashes.

Thanks! I didn't notice that the behavior is the same when a user
executes PREPARE TRANSACTION.

IIUC, there is a difference between (1) PREPARE TRANSACTION
and (2) 2PC.
The difference is whether the client can know when the server crashed
and its global tx id.

If (1) PREPARE TRANSACTION fails, it's OK for the client to execute the
same command again,
because if the remote server has already prepared it, the command will be
ignored.

But if (2) 2PC fails because of a coordinator crash, the client can't know
what operations should be done.

If the old coordinator already executed PREPARE, there are some
transactions that should be resolved with ABORT PREPARED.
But if the PREPARE WAL was not sent to the standby, the new coordinator
can't execute ABORT PREPARED.
And the client can't know which remote servers have prepared
transactions that should be aborted, either.

Even if the client can know that, only the old coordinator knows its
global transaction id.
Only the database administrator can analyze the old coordinator's log
and then execute the appropriate commands manually, right?

I think that's right. In the case of a coordinator crash, the user
can look for orphaned foreign prepared transactions by checking the
'identifier' column of pg_foreign_xacts on the new standby server and
the prepared transactions on the remote servers.
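
That manual check amounts to a set difference. The Python sketch below is only an illustration: the function name, server names, and identifier strings are hypothetical. In practice the first list would come from the 'identifier' column of pg_foreign_xacts (a view added by this patch series), and the per-server lists from the gid column of pg_prepared_xacts on each remote server.

```python
def find_orphans(coordinator_ids, remote_prepared):
    """Return {server: [gid, ...]} for prepared transactions unknown to the coordinator."""
    known = set(coordinator_ids)
    orphans = {}
    for server, gids in remote_prepared.items():
        unknown = sorted(g for g in gids if g not in known)
        if unknown:
            orphans[server] = unknown
    return orphans


# Hypothetical identifiers; the real format is decided by the FDW.
coordinator_ids = ["fx_100_fs1", "fx_100_fs2"]
remote_prepared = {
    "fs1": ["fx_100_fs1", "fx_99_fs1"],   # fx_99_fs1 is unknown to the coordinator
    "fs2": ["fx_100_fs2"],
}
print(find_orphans(coordinator_ids, remote_prepared))
```

Anything reported here would still need a human decision on whether to commit or roll it back.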

To prevent this
issue, I think we would need to send each WAL record in sync, but I'm
not sure that's reasonable behavior, and as long as we write WAL
locally and then send it to replicas we would need a smart mechanism to
prevent this situation.

I agree. Sending each 2PC WAL record in sync would have a large
performance impact.
At least, we need to document the limitation and how to handle this
situation.

Ok. I'll add it.

Related to the point raised by Ikeda-san, I realized that with the
current patch the backend waits for synchronous replication and then
waits for foreign transaction resolution. But it should be reversed.
Otherwise, it could lead to data loss even when the client got an
acknowledgment of commit. Also, when the user is using both atomic
commit and synchronous replication and wants to cancel waiting, he/she
will need to press Ctrl-C twice with the current patch, which also
should be fixed.

I'm sorry, but I don't understand.

In my understanding, if the COMMIT WAL is replicated to the standby in sync,
the standby server can resolve the transaction after crash recovery,
once it is promoted.

If reversed, there are some situations in which atomic commit can't be
guaranteed.
In case some foreign transaction resolutions succeed but others
fail (and the COMMIT WAL is not replicated),
the standby must ABORT PREPARED because the COMMIT WAL is not
replicated.
This means that some foreign transactions are resolved with COMMIT PREPARED
by the primary coordinator,
while other foreign transactions can be resolved with ABORT PREPARED by the
secondary coordinator.

You're right. Thank you for pointing that out!

If the coordinator crashes after the client gets acknowledgment of the
successful commit of the transaction but before sending the
XLOG_FDWXACT_REMOVE record to the replicas, the FdwXact entries are
left on the replicas even after failover. But since we require FDWs to
tolerate the error of undefined prepared transactions in
COMMIT/ROLLBACK PREPARED, it won't be a critical problem.
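
A minimal sketch of what "tolerating the error" could look like, in Python. Everything here is hypothetical: `ToyRemote` stands in for a foreign server, and `LookupError` stands in for whatever error the real FDW sees when the prepared transaction no longer exists.

```python
class ToyRemote:
    """Stand-in for a foreign server that holds prepared transactions."""

    def __init__(self, prepared):
        self.prepared = set(prepared)

    def commit_prepared(self, gid):
        if gid not in self.prepared:
            raise LookupError(f"prepared transaction {gid!r} does not exist")
        self.prepared.remove(gid)


def resolve(remote, gid):
    """Commit `gid`, treating 'no such prepared transaction' as already resolved."""
    try:
        remote.commit_prepared(gid)
        return "committed"
    except LookupError:
        return "already resolved"


remote = ToyRemote(["fx_1"])
print(resolve(remote, "fx_1"))   # first attempt, e.g. by the old primary
print(resolve(remote, "fx_1"))   # retry after failover from a leftover entry
```

Because the retry is harmless, a leftover FdwXact entry on the promoted replica only costs one redundant attempt.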

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#100tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Laurenz Albe (#98)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Laurenz Albe <laurenz.albe@cybertec.at>

On Fri, 2020-07-17 at 05:21 +0000, tsunakawa.takay@fujitsu.com wrote:

And most importantly, do other major DBMSs, including Oracle, provide the API for
preparing a transaction? In other words, will the FDWs other than postgres_fdw
really be able to take advantage of the new FDW functions to join the 2PC processing?
I think we need to confirm that there are concrete examples.

I bet they do. There is even a standard for that.

If you're thinking of xa_prepare() defined in the X/Open XA specification, we need to be sure that other FDWs can really utilize this new 2PC mechanism. What I'm especially wondering is when the FDW can call xa_start().

Regards
Takayuki Tsunakawa

#101Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#97)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 17 Jul 2020 at 14:22, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
I have only briefly checked oracle_fdw, but in general I think that
if an existing FDW supports transaction begin, commit, and rollback,
these can be ported to the new FDW transaction APIs easily.

Does oracle_fdw support begin, commit and rollback?

And most importantly, do other major DBMSs, including Oracle, provide the API for preparing a transaction? In other words, will the FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to join the 2PC processing? I think we need to confirm that there are concrete examples.

I also believe they do. But I'm concerned that some FDWs need to start
a transaction differently when using 2PC. For instance, IIUC MySQL
also supports 2PC, but the transaction needs to be started with "XA
START id" when the transaction needs to be prepared. The transaction
started with XA START can be closed by XA END followed by XA PREPARE
or XA COMMIT ONE PHASE. It means that when starting a new transaction,
the FDW needs to prepare the transaction identifier and to
know that 2PC might be used. It's quite different from PostgreSQL. In
PostgreSQL, we can start a transaction with BEGIN and end it with PREPARE
TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is
required at PREPARE TRANSACTION.

With MySQL, I guess the FDW needs a way to tell that the (next) transaction
needs to be started with XA START so it can be prepared. It could be a
custom GUC or an SQL function. Then, when starting a new transaction on
the MySQL server, the FDW can generate and store a transaction identifier
somewhere alongside the connection. At the prepare phase, it passes
the transaction identifier via the GetPrepareId() API to the core.
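
To make the difference concrete, here is a small Python sketch that just builds the statement sequences for a successful two-phase commit. The function name and the sample statement are illustrative, and the quoting is naive (no escaping), so this is not something to send to a real server as-is.

```python
def two_phase_statements(dialect, xid, work):
    """Return the SQL statements for a successful 2PC of `work` under `dialect`."""
    q = f"'{xid}'"   # naive quoting, for illustration only
    if dialect == "postgresql":
        # The identifier is first needed at PREPARE TRANSACTION.
        return ["BEGIN", *work, f"PREPARE TRANSACTION {q}", f"COMMIT PREPARED {q}"]
    if dialect == "mysql":
        # The identifier is already needed when the transaction starts.
        return [f"XA START {q}", *work, f"XA END {q}", f"XA PREPARE {q}", f"XA COMMIT {q}"]
    raise ValueError(f"unknown dialect: {dialect}")


work = ["INSERT INTO t VALUES (1)"]
for dialect in ("postgresql", "mysql"):
    print(dialect, two_phase_statements(dialect, "fx_1", work))
```

The MySQL sequence carries the identifier in its very first statement, which is why the FDW would have to decide on 2PC before any work is sent.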

I haven't tested the above yet and it's just a plan on paper. It's
definitely a good idea to try integrating this 2PC feature into FDWs
other than postgres_fdw to see whether the design and interfaces are
sound.

What I'm worried about is that if only postgres_fdw can implement the prepare function, it's a sign that the FDW interface will be riddled with functions only for Postgres. That is, the FDW interface is drifting away from its original purpose, "access external data as a relation", and becoming complex. Tomas Vondra expressed this concern as follows:

Horizontal scalability/sharding
/messages/by-id/CANP8+jK=+3zVYDFY0oMAQKQVJ+qReDHr1UPdyFEELO82yVfb9A@mail.gmail.com

[Tomas Vondra's remarks]
--------------------------------------------------

This strikes me as a bit of a conflict of interest with FDW which
seems to want to hide the fact that it's foreign; the FDW
implementation makes its own optimization decisions which might
make sense for single table queries but breaks down in the face of
joins.

+1 to these concerns

In my mind, FDW is a wonderful tool to integrate PostgreSQL with
external data sources, and it's nicely shaped for this purpose, which
implies the abstractions and assumptions in the code.

The truth however is that many current uses of the FDW API are actually
using it for different purposes because there's no other way to do that,
not because FDWs are the "right way". And this includes the attempts to
build sharding on FDW, I think.

Situations like this result in "improvements" of the API that seem to
improve the API for the second group, but make the life harder for the
original FDW API audience by making the API needlessly complex. And I
say "seem to improve" because the second group eventually runs into the
fundamental abstractions and assumptions the API is based on anyway.

And based on the discussions at pgcon, I think this is the main reason
why people cringe when they hear "FDW" and "sharding" in the same sentence.

...
My other worry is that we'll eventually mess the FDW infrastructure,
making it harder to use for the original purpose. Granted, most of the
improvements proposed so far look sane and useful for FDWs in general,
but sooner or later that ceases to be the case - there will be changes
needed merely for the sharding. Those will be tough decisions.
--------------------------------------------------

Regarding the comparison between FDW transaction APIs and transaction
callbacks, I think one of the benefits of providing FDW transaction
APIs is that the core is able to manage the status of foreign
transactions. We need to track the status of individual foreign
transactions to support atomic commit. If we use transaction callbacks
(XactCallback) that many FDWs are using, I think we will end up
calling the transaction callback and leaving the transaction work to
FDWs, so that the core is not able to know the return values of
PREPARE TRANSACTION, for example. We can add more arguments passed to
transaction callbacks to get the return value from FDWs but I don’t
think it’s a good idea as transaction callbacks are used not only by
FDW but also other external modules.
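
To illustrate the argument above, here is a rough sketch (not taken from the patch) of a coordinator that drives each step itself and therefore sees every result. FakeFdw, Coordinator and FdwXactStatus are hypothetical names invented for this example; the real patch keeps its state in shared memory and WAL.

```python
from enum import Enum


class FdwXactStatus(Enum):
    PREPARING = "preparing"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class FakeFdw:
    """Hypothetical FDW stub; records which transaction calls it received."""

    def __init__(self, prepare_ok=True):
        self.prepare_ok = prepare_ok
        self.calls = []

    def prepare(self):
        self.calls.append("prepare")
        return self.prepare_ok

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")


class Coordinator:
    """Toy stand-in for the core: it drives each step and records the result."""

    def __init__(self):
        self.status = {}

    def atomic_commit(self, participants):
        # Phase 1: the core calls prepare itself, so it sees every return value.
        for name, fdw in participants.items():
            self.status[name] = FdwXactStatus.PREPARING
            if not fdw.prepare():
                # Any failure aborts the whole distributed transaction.
                for other, other_fdw in participants.items():
                    other_fdw.rollback()
                    self.status[other] = FdwXactStatus.ABORTED
                return False
            self.status[name] = FdwXactStatus.PREPARED
        # Phase 2: every participant is prepared, so commit them all.
        for name, fdw in participants.items():
            fdw.commit()
            self.status[name] = FdwXactStatus.COMMITTED
        return True
```

With a plain callback the core would only know that the callback ran; here the status map is available to whatever component has to resolve the transaction later.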

To track the foreign transaction status, we can add GetTransactionStatus() to the FDW interface as an alternative, can't we?

I haven't thought about such an interface, but it sounds like the transaction
status is managed on both the core and FDWs. Could you elaborate on
that?

The current version of the patch (v23) supports only
INSERT/DELETE/UPDATE. But I'm going to change the patch so that it
supports other write SQL statements, as Fujii-san also pointed out.

OK. I've just read that Fujii-san already pointed out a similar thing. But I wonder if we can know that the UDF executed on the foreign server has updated data. Maybe we can know or guess it by calling txid_current_if_any() or checking the transaction status in FE/BE protocol, but can we deal with FDWs other than postgres_fdw?

Ah, my answer was not enough. It was only about tracking local writes.

Regarding tracking of writes on the foreign server, I think there are
restrictions. Currently, the executor registers a foreign server as a
participant of 2PC before calling BeginForeignInsert(),
BeginForeignModify(), and BeginForeignScan() etc., with a flag
indicating whether writes are going to happen on the foreign server. So
even if a UDF in a SELECT statement that could update data were to be
pushed down to the foreign server, the foreign server would be marked
as *not* modified. I’ve not tested yet but I guess that since FDW also
is allowed to register the foreign server along with that flag anytime
before commit, FDW is able to forcibly change that flag if it knows
the SELECT query is going to modify the data on the remote server.
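
A minimal sketch of the registration idea described above, under the assumption (mine, not confirmed by the patch) that the "modified" flag is sticky once set. ParticipantRegistry and its methods are hypothetical names.

```python
class ParticipantRegistry:
    """Hypothetical per-transaction list of foreign servers and their write flag."""

    def __init__(self):
        self._modified = {}

    def register(self, server, modified):
        # Assumption: the flag is sticky, so a later read-only registration
        # cannot clear an earlier "modified" one.
        self._modified[server] = self._modified.get(server, False) or modified

    def writers(self):
        # Servers that must take part in atomic commit as writers.
        return sorted(name for name, mod in self._modified.items() if mod)
```

In this model the executor would call register("shard1", False) for a SELECT, and the FDW could call register("shard1", True) afterwards if it knows the pushed-down function writes data.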

No, in the current design, the backend who received a query from the
client does PREPARE, and then the transaction resolver process, a
background worker, does COMMIT PREPARED.

This "No" means the current implementation cannot group commits from multiple transactions?

Yes.

Does the transaction resolver send COMMIT PREPARED and wait for its response for each transaction one by one? For example,

[local server]
Transactions T1 and T2 perform 2PC at the same time.
Transaction resolver sends COMMIT PREPARED for T1 and then waits for the response.
T1 writes COMMIT PREPARED record locally and syncs the WAL.
Transaction resolver sends COMMIT PREPARED for T2 and then waits for the response.
T2 writes COMMIT PREPARED record locally and syncs the WAL.

[foreign server]
T1 writes COMMIT PREPARED record locally and syncs the WAL.
T2 writes COMMIT PREPARED record locally and syncs the WAL.

Just to be clear, the transaction resolver writes FDWXACT_REMOVE
records instead of COMMIT PREPARED records to remove the foreign
transaction entry. But, yes, the transaction resolver works as you
explained above.
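
A back-of-envelope model of why one-by-one resolution matters: it only counts WAL syncs and ignores network round trips. The function names and the numbers used with them are made up for illustration, and "group_size" is a hypothetical knob, not something in the patch.

```python
import math


def wal_sync_count(n_transactions, group_size=1):
    """Number of WAL syncs when up to group_size commits share one sync."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return math.ceil(n_transactions / group_size)


def sync_time_ms(n_transactions, wal_sync_ms, group_size=1):
    """Total time spent in WAL syncs under the same simple model."""
    return wal_sync_count(n_transactions, group_size) * wal_sync_ms
```

With group_size=1 this is the serialized behavior described above: every transaction pays for its own sync.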

If the WAL records of multiple concurrent transactions are written and synced separately, i.e. group commit doesn't take effect, then the OLTP transaction performance will be unacceptable.

I agree that it'll be a large performance penalty. I'd like to have it
but I’m not sure we should have it in the first version from the
perspective of complexity. Since 2PC is inherently
expensive, in my opinion users should avoid it as much as possible
where performance matters. Especially in OLTP, its cost will directly
affect the latency. I’d suggest designing the database schema so that a
transaction touches only one foreign server, but do you have a concrete
OLTP use case that normally requires 2PC, and how many servers are
involved in a distributed transaction?

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#102Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#101)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020/07/17 20:04, Masahiko Sawada wrote:

On Fri, 17 Jul 2020 at 14:22, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
I have only briefly checked oracle_fdw, but in general I think that
if an existing FDW supports transaction begin, commit, and rollback,
these can be ported to new FDW transaction APIs easily.

Does oracle_fdw support begin, commit and rollback?

And most importantly, do other major DBMSs, including Oracle, provide the API for preparing a transaction? In other words, will the FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to join the 2PC processing? I think we need to confirm that there are concrete examples.

I also believe they do. But I'm concerned that some FDWs need to start
a transaction differently when using 2PC. For instance, IIUC MySQL
also supports 2PC but the transaction needs to be started with "XA
START id" when the transaction needs to be prepared. The transaction
started with XA START can be closed by XA END followed by XA PREPARE
or XA COMMIT ONE PHASE.

Does this mean that the FDW should also provide an API for xa_end()?
Maybe we need to consider again which APIs we should provide in FDW,
based on the XA specification?

It means that when a new transaction starts,
it needs to have the transaction identifier ready and to
know that 2PC might be used. It’s quite different from PostgreSQL. In
PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE
TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is
required only at PREPARE TRANSACTION.

With MySQL, I guess the FDW needs a way to tell that the (next) transaction
needs to be started with XA START so it can be prepared. It could be a
custom GUC or an SQL function. Then, when starting a new transaction on
the MySQL server, the FDW can generate a transaction identifier and store it
somewhere alongside the connection. At the prepare phase, it passes
the transaction identifier via GetPrepareId() API to the core.

I haven’t tested the above yet and it’s just a plan on paper. It's
definitely a good idea to try integrating this 2PC feature into FDWs
other than postgres_fdw to see whether the design and interfaces are
well thought out.

With the current patch, we track whether write queries are executed
in each server. Then, if the number of servers that execute write queries
is less than two, 2PC is skipped. This "optimization" is not necessary
(cannot be applied) when using mysql_fdw because the transaction starts
with XA START. Right?

If that's the "optimization" only for postgres_fdw, maybe it's better to
get rid of that "optimization" from the first patch, to make the patch simpler.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#103Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#94)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020/07/16 14:47, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/14 9:08, Masahiro Ikeda wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!

+1
I'm interested in these patches and now studying them. While checking
the behaviors of the patched PostgreSQL, I got three comments.

Thank you for testing this patch!

1. We can access the foreign table even during recovery on HEAD.
But in the patched version, when I did that, I got the following error.
Is this intentional?

ERROR: cannot assign TransactionIds during recovery

No, it should be fixed. I'm going to fix this by not collecting
participants for atomic commit during recovery.

Thanks for trying to fix the issues!

I'd like to report one more issue. When I started a new transaction
in the local server, executed INSERT in the remote server via
postgres_fdw and then quit psql, I got the following assertion failure.

TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160
1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313
2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20
3 postgres 0x000000010d313fe3 shmem_exit + 179
4 postgres 0x000000010d313e7a proc_exit_prepare + 122
5 postgres 0x000000010d313da3 proc_exit + 19
6 postgres 0x000000010d35112f PostgresMain + 3711
7 postgres 0x000000010d27bb3a BackendRun + 570
8 postgres 0x000000010d27af6b BackendStartup + 475
9 postgres 0x000000010d279ed1 ServerLoop + 593
10 postgres 0x000000010d277940 PostmasterMain + 6016
11 postgres 0x000000010d1597b9 main + 761
12 libdyld.dylib 0x00007fff7161e3d5 start + 1
13 ??? 0x0000000000000003 0x0 + 3

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#104Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#99)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020-07-17 15:55, Masahiko Sawada wrote:

On Fri, 17 Jul 2020 at 11:06, Masahiro Ikeda <ikedamsh@oss.nttdata.com>
wrote:

On 2020-07-16 13:16, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <ikedamsh@oss.nttdata.com>
wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

I want to ask a question about streaming replication with 2PC.
Are you going to support 2PC with streaming replication?

I tried streaming replication using the v23 patches.
I confirmed that 2PC works with streaming replication,
where there are primary and standby coordinators.

But, in my understanding, the WAL of "PREPARE" and
"COMMIT/ABORT PREPARED" can't be replicated to the standby server in
sync.

If this is right, unresolved transactions can occur.

For example,

1. PREPARE is done
2. crash primary before the WAL related to PREPARE is
replicated to the standby server
3. promote standby server // but can't execute "ABORT PREPARED"

In the above case, the remote server has the unresolved transaction.
Can we solve this problem to support in-sync replication?

But, I think some users use async replication for performance.
Do we need to document the limitation or make another solution?

IIUC with synchronous replication, we can guarantee that WAL records
are written on both primary and replicas when the client got an
acknowledgment of commit. We don't replicate each WAL record
generated during a transaction one by one in sync. In the case you
described, the client will get an error due to the server crash.
Therefore I think the user cannot expect WAL records generated so far
to have been replicated. The same issue could happen also when the user
executes PREPARE TRANSACTION and the server crashes.

Thanks! I didn't notice that the behavior is the same when a user
executes PREPARE TRANSACTION.

IIUC, there is a difference between (1) PREPARE TRANSACTION
and (2) 2PC.
The difference is whether the client can know when the server crashed
and its global tx id.

If (1) PREPARE TRANSACTION fails, it's OK for the client to execute the
same command again, because if the remote server is already prepared
the command will be ignored.

But if (2) 2PC fails because of a coordinator crash, the client can't know
what operations should be done.

If the old coordinator already executed PREPARE, there are some
transactions that should be resolved with ABORT PREPARED.
But if the PREPARE WAL record was not sent to the standby, the new
coordinator can't execute ABORT PREPARED.
And the client can't know which remote servers have prepared
transactions that should be aborted either.

Even if the client can know that, only the old coordinator knows its
global transaction id.
Only the database administrator can analyze the old coordinator's log
and then execute the appropriate commands manually, right?

I think that's right. In the case of the coordinator crash, the user
can look for orphaned foreign prepared transactions by checking the
'identifier' column of pg_foreign_xacts on the new standby server and
the prepared transactions on the remote servers.

I think there is a case where we can't find an orphaned foreign
prepared transaction in the pg_foreign_xacts view on the new standby server.
That would confuse users and database administrators.

If the primary coordinator crashes after preparing foreign transaction,
but before sending XLOG_FDWXACT_INSERT records to the standby server,
the standby server can't restore their transaction status and
pg_foreign_xacts view doesn't show the prepared foreign transactions.

Sending XLOG_FDWXACT_INSERT records asynchronously leads to this problem.
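
The failure window described above can be reproduced with a toy model. This is only a simulation: Remote, Primary, ship_wal and the other names are hypothetical, and WAL shipping is reduced to a counter of records already sent.

```python
class Remote:
    """Hypothetical foreign server that remembers prepared transaction ids."""

    def __init__(self):
        self.prepared = set()


class Primary:
    """Toy coordinator: local WAL plus a marker of how much reached the standby."""

    def __init__(self):
        self.wal = []
        self.shipped = 0

    def prepare_foreign(self, remote, gid):
        # The remote is prepared and the record is written locally only.
        remote.prepared.add(gid)
        self.wal.append(("FDWXACT_INSERT", gid))

    def ship_wal(self):
        # Asynchronous replication catching up with the local WAL.
        self.shipped = len(self.wal)

    def visible_on_standby(self):
        shipped = self.wal[: self.shipped]
        return {gid for kind, gid in shipped if kind == "FDWXACT_INSERT"}


def orphaned(remote, standby_entries):
    """Prepared transactions on the remote that the standby knows nothing about."""
    return remote.prepared - standby_entries
```

If the primary is lost between prepare_foreign() and ship_wal(), orphaned() returns the transaction id, which is exactly the entry that would be missing from pg_foreign_xacts after failover.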

To prevent this
issue, I think we would need to send each WAL record in sync but I'm
not sure it's reasonable behavior, and as long as we write WAL
locally and then send it to replicas we would need a smart mechanism to
prevent this situation.

I agree. Sending each 2PC WAL record in sync would have a large
performance impact.
At least, we need to document the limitation and how to handle this
situation.

Ok. I'll add it.

Thanks a lot.

Related to the point raised by Ikeda-san, I realized that with the
current patch the backend waits for synchronous replication and then
waits for foreign transaction resolution. But it should be reversed.
Otherwise, it could lead to data loss even when the client got an
acknowledgment of commit. Also, when the user is using both atomic
commit and synchronous replication and wants to cancel waiting, he/she
will need to press Ctrl-C twice with the current patch, which also
should be fixed.

I'm sorry, but I don't understand.

In my understanding, if the COMMIT WAL record is replicated to the
standby in sync, the standby server can resolve the transaction after
crash recovery when it is promoted.

If the order is reversed, there are some situations in which atomic
commit can't be guaranteed.
If some foreign transaction resolutions succeed but others fail (and
the COMMIT WAL record is not replicated), the standby must execute
ABORT PREPARED because the COMMIT WAL record is not replicated.
This means that some foreign transactions are resolved with COMMIT
PREPARED by the primary coordinator, while other foreign transactions
can be resolved with ABORT PREPARED by the secondary coordinator.

You're right. Thank you for pointing out!

If the coordinator crashes after the client gets acknowledgment of the
successful commit of the transaction but before sending
XLOG_FDWXACT_REMOVE record to the replicas, the FdwXact entries are
left on the replicas even after failover. But since we require FDW to
tolerate the error of undefined prepared transactions in
COMMIT/ROLLBACK PREPARED it won’t be a critical problem.
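
The requirement that an FDW tolerates "undefined prepared transaction" errors amounts to making resolution idempotent. A small sketch under that reading; UndefinedPreparedXact, Remote and resolve are hypothetical names, not the patch's API.

```python
class UndefinedPreparedXact(Exception):
    """Hypothetical error: the remote has no prepared transaction with this id."""


class Remote:
    def __init__(self, prepared=()):
        self.prepared = set(prepared)

    def commit_prepared(self, gid):
        if gid not in self.prepared:
            raise UndefinedPreparedXact(gid)
        self.prepared.remove(gid)


def resolve(remote, gid):
    """Commit a prepared transaction; treat 'not found' as already resolved."""
    try:
        remote.commit_prepared(gid)
        return "committed"
    except UndefinedPreparedXact:
        # A leftover FdwXact entry after failover ends up here and is harmless.
        return "already resolved"
```

Running resolve() twice for the same id is safe, which is why a stale entry on the promoted standby only costs one extra round trip.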

I agree. It's ok that the primary coordinator sends
XLOG_FDWXACT_REMOVE records asynchronously.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#105Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#96)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jul 17, 2020 at 8:38 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Thu, 16 Jul 2020 at 13:53, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Hi Sawada san,

I'm reviewing this patch series, and let me give some initial comments and questions. I'm looking at this with a hope that this will be useful purely as a FDW enhancement for our new use cases, regardless of whether the FDW will be used for Postgres scale-out.

Thank you for reviewing this patch!

Yes, this patch is trying to resolve the generic atomic commit problem
w.r.t. FDW, and will be useful also for Postgres scale-out.

I think it is important to get a consensus on this point. If I
understand correctly, Tsunakawa-San doesn't sound convinced that
FDW can be used for postgres scale-out and we are trying to paint this
feature as a step forward in the scale-out direction. As per my
understanding, we don't have a very clear vision of whether we will be
able to achieve the other important aspects of a scale-out feature, like
global visibility, if we go in this direction, and that is the reason I
have insisted in this and the other related thread [1] that we at least
have a high-level idea of the same before going too far with this
patch. It is quite possible that after spending months of effort to
straighten out this patch/feature, we come to the conclusion that this
needs to be re-designed or requires a lot of re-work to ensure that it
can be extended for global visibility. It is better to spend some
effort up front to see if the proposed patch is a stepping stone for
achieving what we want w.r.t postgres scale-out.

[1]: /messages/by-id/07b2c899-4ed0-4c87-1327-23c750311248@postgrespro.ru

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#106tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#101)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

I also believe they do. But I'm concerned that some FDWs need to start
a transaction differently when using 2PC. For instance, IIUC MySQL
also supports 2PC but the transaction needs to be started with "XA
START id" when the transaction needs to be prepared. The transaction
started with XA START can be closed by XA END followed by XA PREPARE
or XA COMMIT ONE PHASE. It means that when a new transaction starts,
it needs to have the transaction identifier ready and to
know that 2PC might be used. It’s quite different from PostgreSQL. In
PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE
TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is
required only at PREPARE TRANSACTION.

I guess Postgres is rather a minority in this regard. All I know is XA and its Java counterpart (Java Transaction API: JTA). In XA, the connection needs to be associated with an XID before its transaction work is performed.
If some transaction work is already done before associating with XID, xa_start() returns an error like this:

[XA specification]
--------------------------------------------------
[XAER_OUTSIDE]
The resource manager is doing work outside any global transaction on behalf of
the application.
--------------------------------------------------

[Java Transaction API (JTA)]
--------------------------------------------------
void start(Xid xid, int flags) throws XAException

This method starts work on behalf of a transaction branch.
...

3.4.7 Local and Global Transactions
The resource adapter is encouraged to support the usage of both local and global
transactions within the same transactional connection. Local transactions are
transactions that are started and coordinated by the resource manager internally. The
XAResource interface is not used for local transactions.

When using the same connection to perform both local and global transactions, the
following rules apply:

. The local transaction must be committed (or rolled back) before starting a
global transaction in the connection.
. The global transaction must be disassociated from the connection before any
local transaction is started.
--------------------------------------------------
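
The two rules quoted above can be sketched as a tiny connection state machine. This is only an illustration: Connection and XAError are hypothetical names, and the only error code taken from the quoted specification is XAER_OUTSIDE.

```python
class XAError(Exception):
    """Hypothetical error type carrying an XA-style code or message."""


class Connection:
    def __init__(self):
        self.local_active = False
        self.xid = None

    def begin_local(self):
        # A global transaction must be disassociated before local work starts.
        if self.xid is not None:
            raise XAError("global transaction still associated")
        self.local_active = True

    def end_local(self):
        # Stands for COMMIT or ROLLBACK of the local transaction.
        self.local_active = False

    def xa_start(self, xid):
        # Work outside any global transaction is in progress: refuse to associate.
        if self.local_active:
            raise XAError("XAER_OUTSIDE")
        self.xid = xid

    def xa_end(self):
        self.xid = None
```

This is the constraint that makes a "decide at commit time" flow, natural in PostgreSQL, awkward for XA-style resource managers: the XID has to be attached before the first statement.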

(FWIW, jdbc_fdw would expect to use JTA for this FDW 2PC?)

I haven’t tested the above yet and it’s just a plan on paper. It's
definitely a good idea to try integrating this 2PC feature into FDWs
other than postgres_fdw to see whether the design and interfaces are
well thought out.

Yes, if we address this 2PC feature as an FDW enhancement, we need to make sure that at least some well-known DBMSs are able to implement the new interface. The following part may help devise the interface:

[References from XA specification]
--------------------------------------------------
The primary use of xa_start() is to register a new transaction branch with the RM.
This marks the start of the branch. Subsequently, the AP, using the same thread of
control, uses the RM’s native interface to do useful work. All requests for service
made by the same thread are part of the same branch until the thread dissociates
from the branch (see below).

3.3.1 Registration of Resource Managers
Normally, a TM involves all associated RMs in a transaction branch. (The TM’s set of
RM switches, described in Section 4.3 on page 21 tells the TM which RMs are
associated with it.) The TM calls all these RMs with xa_start(), xa_end(), and
xa_prepare(), although an RM that is not active in a branch need not participate further
(see Section 2.3.2 on page 8). A technique to reduce overhead for infrequently-used
RMs is discussed below.

Dynamic Registration

Certain RMs, especially those involved in relatively few global transactions, may ask
the TM to assume they are not involved in a transaction. These RMs must register with
the TM before they do application work, to see whether the work is part of a global
transaction. The TM never calls these RMs with any form of xa_start(). An RM
declares dynamic registration in its switch (see Section 4.3 on page 21). An RM can
make this declaration only on its own behalf, and doing so does not change the TM’s
behaviour with respect to other RMs.

When an AP requests work from such an RM, before doing any work, the RM contacts
the TM by calling ax_reg(). The RM must call ax_reg() from the same thread of control
that the AP would use if it called ax_reg() directly. The TM returns to the RM the
appropriate XID if the AP is in a global transaction.

The implications of dynamically registering are as follows: when a thread of control
begins working on behalf of a transaction branch, the transaction manager calls
xa_start() for all resource managers known to the thread except those having
TMREGISTER set in their xa_switch_t structure. Thus, those resource managers with
this flag set must explicitly join a branch with ax_reg(). Secondly, when a thread of
control is working on behalf of a branch, a transaction manager calls xa_end() for all
resource managers known to the thread that either do not have TMREGISTER set in
their xa_switch_t structure or have dynamically registered with ax_reg().

int
xa_start(XID *xid, int rmid, long flags)

DESCRIPTION
A transaction manager calls xa_start() to inform a resource manager that an application
may do work on behalf of a transaction branch.
...
A transaction manager calls xa_start() only for those resource managers that do not
have TMREGISTER set in the flags element of their xa_switch_t structure. Resource
managers with TMREGISTER set must use ax_reg() to join a transaction branch (see
ax_reg() for details).
--------------------------------------------------
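
A rough model of static versus dynamic registration as described in the excerpt: the TM calls xa_start() only for RMs without TMREGISTER, and the others join through ax_reg(). TransactionManager and its method names are hypothetical; only xa_start, xa_end, ax_reg and TMREGISTER come from the quoted text.

```python
class TransactionManager:
    def __init__(self, tmregister_by_rm):
        # Maps RM name -> True if the RM declared TMREGISTER (dynamic registration).
        self.tmregister_by_rm = dict(tmregister_by_rm)
        self.xid = None
        self.joined = set()

    def begin_branch(self, xid):
        """Return the RMs that receive xa_start(): all without TMREGISTER."""
        self.xid = xid
        self.joined = {
            rm for rm, dynamic in self.tmregister_by_rm.items() if not dynamic
        }
        return sorted(self.joined)

    def ax_reg(self, rm):
        """Called by a dynamic RM before it does work; returns the XID if any."""
        if not self.tmregister_by_rm[rm]:
            raise ValueError(f"{rm} did not declare TMREGISTER")
        if self.xid is not None:
            self.joined.add(rm)
        return self.xid

    def end_branch(self):
        """Return the RMs that receive xa_end(): static ones plus those that registered."""
        ended = sorted(self.joined)
        self.xid = None
        self.joined = set()
        return ended
```

The dynamic path resembles an FDW registering itself as a participant when it first touches a foreign server, rather than the core starting a branch on every configured server up front.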

To track the foreign transaction status, we can add GetTransactionStatus() to the FDW interface as an alternative, can't we?

I haven't thought about such an interface, but it sounds like the transaction
status is managed on both the core and FDWs. Could you elaborate on
that?

I haven't analyzed it that deeply. I just thought that the core could keep track of the local transaction status, and ask each participant FDW about its transaction status to determine an action.

If the WAL records of multiple concurrent transactions are written and synced separately, i.e. group commit doesn't take effect, then the OLTP transaction performance will be unacceptable.

I agree that it'll be a large performance penalty. I'd like to have it
but I’m not sure we should have it in the first version from the
perspective of complexity.

I think we should at least have a rough picture of how we can reach the goal. Otherwise, the current design/implementation may have to be overhauled with great effort in the near future. Apart from that, I feel it's unnatural that the commit processing is serialized at the transaction resolver while the DML processing of multiple foreign transactions can be performed in parallel.

Since 2PC is inherently
expensive, in my opinion users should avoid it as much as possible
where performance matters. Especially in OLTP, its cost will directly
affect the latency. I’d suggest designing the database schema so that a
transaction touches only one foreign server, but do you have a concrete
OLTP use case that normally requires 2PC, and how many servers are
involved in a distributed transaction?

I can't share the details, but some of our customers show interest in Postgres scale-out or FDW 2PC for the following use cases:

* Multitenant OLTP where the data specific to one tenant is stored on one database server. On the other hand, some data are shared among all tenants, and they are stored on a separate server. The shared data and the tenant-specific data are updated in the same transaction (I don't know the frequency of such transactions.)

* An IoT use case where each edge database server monitors and tracks the movement of objects in one area. Those edge database servers store the records of objects they manage. When an object gets out of one area and moves to another, the record for the object is moved between the two edge database servers using an atomic distributed transaction.

(I wonder if TPC-C or TPC-E needs distributed transaction...)

Regards
Takayuki Tsunakawa

#107Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Fujii Masao (#102)
Re: Transactions involving multiple postgres foreign servers, take 2

On Sat, 18 Jul 2020 at 01:45, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/17 20:04, Masahiko Sawada wrote:

On Fri, 17 Jul 2020 at 14:22, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
I have only briefly checked oracle_fdw, but in general I think that
if an existing FDW supports transaction begin, commit, and rollback,
these can be ported to new FDW transaction APIs easily.

Does oracle_fdw support begin, commit and rollback?

And most importantly, do other major DBMSs, including Oracle, provide the API for preparing a transaction? In other words, will the FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to join the 2PC processing? I think we need to confirm that there are concrete examples.

I also believe they do. But I'm concerned that some FDWs need to start
a transaction differently when using 2PC. For instance, IIUC MySQL
also supports 2PC but the transaction needs to be started with "XA
START id" when the transaction needs to be prepared. The transaction
started with XA START can be closed by XA END followed by XA PREPARE
or XA COMMIT ONE PHASE.

Does this mean that the FDW should also provide an API for xa_end()?
Maybe we need to consider again which APIs we should provide in FDW,
based on the XA specification?

Not sure that we really need the API for xa_end(). It's not necessary
at least in MySQL case. mysql_fdw can execute either XA END and XA
PREPARE when FDW prepare API is called or XA END and XA COMMIT ONE
PHASE when FDW commit API is called with FDWXACT_FLAG_ONEPHASE.

It means that when a new transaction starts,
it needs to have the transaction identifier ready and to
know that 2PC might be used. It’s quite different from PostgreSQL. In
PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE
TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is
required only at PREPARE TRANSACTION.

With MySQL, I guess the FDW needs a way to tell that the (next) transaction
needs to be started with XA START so it can be prepared. It could be a
custom GUC or an SQL function. Then, when starting a new transaction on
the MySQL server, the FDW can generate a transaction identifier and store it
somewhere alongside the connection. At the prepare phase, it passes
the transaction identifier via GetPrepareId() API to the core.

I haven’t tested the above yet and it’s just a plan on paper. It's
definitely a good idea to try integrating this 2PC feature into FDWs
other than postgres_fdw to see whether the design and interfaces are
well thought out.

With the current patch, we track whether write queries are executed
in each server. Then, if the number of servers that execute write queries
is less than two, 2PC is skipped. This "optimization" is not necessary
(cannot be applied) when using mysql_fdw because the transaction starts
with XA START. Right?

I think we can use XA COMMIT ONE PHASE in MySQL, which both prepares
and commits the transaction. If the number of servers that executed
write queries is less than two, the core transaction manager calls
CommitForeignTransaction API with the flag FDWXACT_FLAG_ONEPHASE. That
way, mysql_fdw can execute XA COMMIT ONE PHASE instead of XA PREPARE,
following XA END. On the other hand, when the number of such servers
is greater than or equal to two, the core transaction manager calls
PrepareForeignTransaction API and then CommitForeignTransaction API
without that flag. In this case, mysql_fdw can execute XA END and XA
PREPARE in PrepareForeignTransaction API call, and then XA COMMIT in
CommitForeignTransaction API call.
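
To make the two paths concrete, here is a sketch of the statement sequences a hypothetical mysql_fdw could issue. The XA statements are MySQL's; everything else (function names, the one_phase argument standing in for FDWXACT_FLAG_ONEPHASE, the restriction on xid characters) is an assumption made for this example and has not been tested against MySQL.

```python
import re

_SAFE_XID = re.compile(r"^[A-Za-z0-9_]+$")


def _quote(xid):
    # Keep the sketch simple: reject identifiers that would need escaping.
    if not _SAFE_XID.match(xid):
        raise ValueError(f"unsupported xid: {xid!r}")
    return f"'{xid}'"


def prepare_statements(xid):
    """Statements for the prepare API call: end the branch, then prepare it."""
    q = _quote(xid)
    return [f"XA END {q}", f"XA PREPARE {q}"]


def commit_statements(xid, one_phase):
    """Statements for the commit API call, with or without the one-phase flag."""
    q = _quote(xid)
    if one_phase:
        return [f"XA END {q}", f"XA COMMIT {q} ONE PHASE"]
    return [f"XA COMMIT {q}"]


def resolve(xid, n_writers):
    """What the core would trigger, depending on how many servers were written to."""
    if n_writers < 2:
        return commit_statements(xid, one_phase=True)
    return prepare_statements(xid) + commit_statements(xid, one_phase=False)
```

In both paths the transaction would already have been opened with XA START, so the "skip 2PC" optimization changes only how the branch is closed, not how it is started.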

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#108Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Fujii Masao (#103)
5 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/16 14:47, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/14 9:08, Masahiro Ikeda wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!

+1
I'm interested in these patches and now studying them. While checking
the behaviors of the patched PostgreSQL, I got three comments.

Thank you for testing this patch!

1. We can access the foreign table even during recovery on HEAD.
But in the patched version, when I did that, I got the following error.
Is this intentional?

ERROR: cannot assign TransactionIds during recovery

No, it should be fixed. I'm going to fix this by not collecting
participants for atomic commit during recovery.

Thanks for trying to fix the issues!

I'd like to report one more issue. When I started a new transaction
in the local server, executed INSERT in the remote server via
postgres_fdw and then quit psql, I got the following assertion failure.

TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160
1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313
2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20
3 postgres 0x000000010d313fe3 shmem_exit + 179
4 postgres 0x000000010d313e7a proc_exit_prepare + 122
5 postgres 0x000000010d313da3 proc_exit + 19
6 postgres 0x000000010d35112f PostgresMain + 3711
7 postgres 0x000000010d27bb3a BackendRun + 570
8 postgres 0x000000010d27af6b BackendStartup + 475
9 postgres 0x000000010d279ed1 ServerLoop + 593
10 postgres 0x000000010d277940 PostmasterMain + 6016
11 postgres 0x000000010d1597b9 main + 761
12 libdyld.dylib 0x00007fff7161e3d5 start + 1
13 ??? 0x0000000000000003 0x0 + 3

Thank you for reporting the issue!

I've attached the latest version of the patch, which incorporates all the
comments I got so far. I've removed the patch adding the 'prefer' mode of
foreign_twophase_commit to keep the patch set simple.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From b4ee466552c7496e76b5e55f4c7e958e1dd5d760 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:16:02 +0900
Subject: [PATCH v24 2/6] Support atomic commit among multiple foreign servers.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  109 +
 src/backend/access/fdwxact/fdwxact.c          | 2750 +++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  558 ++++
 src/backend/access/fdwxact/resolver.c         |  453 +++
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   78 +-
 src/backend/access/transam/xact.c             |   54 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/foreigncmds.c            |   23 +
 src/backend/executor/execPartition.c          |    1 +
 src/backend/executor/nodeForeignscan.c        |    2 +
 src/backend/executor/nodeModifyTable.c        |    2 +
 src/backend/foreign/foreign.c                 |   55 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   18 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/replication/syncrep.c             |   15 +-
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   46 +
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   79 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/fdwxactdesc.c              |    1 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  170 +
 src/include/access/fdwxact_launcher.h         |   28 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/resolver_internal.h        |   63 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   22 +
 src/include/foreign/fdwapi.h                  |   12 +
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    6 +
 src/include/replication/syncrep.h             |    2 +-
 src/include/storage/proc.h                    |   12 +
 src/include/storage/procarray.h               |    5 +
 src/include/utils/guc_tables.h                |    2 +
 src/test/regress/expected/rules.out           |    7 +
 56 files changed, 4832 insertions(+), 31 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 120000 src/bin/pg_waldump/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..49480dd039 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..0207a66fb4
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000000..462f42180a
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,109 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back a transaction on
+either all of the foreign servers or none of them. This ensures that the data
+is always left in a consistent state across the federated database.
+
+
+Commit Sequence of Global Transactions
+---------------------------------------
+
+We employ the two-phase commit protocol to commit atomically across all
+foreign servers. The commit sequence of a distributed transaction consists
+of the following four steps:
+
+1. Foreign Server Registration
+The FDW needs to register each foreign transaction as a distributed
+transaction participant by calling FdwXactRegisterXact(). This adds it to the
+list FdwXactParticipants, which is maintained by PostgreSQL's global
+transaction manager (GTM) and tracked until the end of the transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We write a WAL record indicating that the foreign server is involved in the
+current transaction before doing PREPARE on each foreign transaction.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers, including the local node. Otherwise, we commit them at this step by
+calling the CommitForeignTransaction() API and need no further operation.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If any of them fails, we switch to rollback;
+at that point some participants might be prepared whereas others are not.
+The former need to be resolved manually using pg_resolve_foreign_xact(),
+and the latter end their transactions in one phase through the
+RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction but
+this resolution step (commit or rollback) is done by the foreign transaction
+resolver process. The backend process inserts itself into the wait queue and
+then wakes up the resolver process (or requests a new one if necessary). The
+resolver process enqueues the waiter and fetches the distributed transaction
+information that the backend is waiting for. Once all foreign transactions are
+committed or rolled back, the resolver process wakes up the waiter.
+
+
+Foreign Data Wrapper Callbacks for Transaction Management
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+transaction management callback functions according to that status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for executing PREPARE, COMMIT and
+ROLLBACK of the transaction on the foreign server, respectively.
+FdwXactRslvState->flags could contain FDWXACT_FLAG_ONEPHASE, meaning the FDW
+can commit or roll back the foreign transaction in one phase. On failure while
+processing a foreign transaction, the FDW needs to raise an error. However, it
+needs to tolerate an ERRCODE_UNDEFINED_OBJECT error while committing or rolling
+back a foreign transaction, because of a race condition: the coordinator could
+crash between completing the resolution and writing the WAL record that
+removes the FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_PREPARING
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and it changes to
+FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING before the foreign
+transaction is committed or aborted, respectively, by FDW callback functions.
+The FdwXact entry is removed, with WAL logging, once the foreign transaction
+is resolved.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. Their initial
+status is FDWXACT_STATUS_PREPARED(*1). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                     PREPARING                      |----+
+ +----------------------------------------------------+    |
+                          |                                |
+                          v                                |
+ +----------------------------------------------------+    |
+ |                    PREPARED(*1)                    |    | (*2)
+ +----------------------------------------------------+    |
+           |                               |               |
+           v                               v               |
+ +--------------------+          +--------------------+    |
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |<---+
+ +--------------------+          +--------------------+
+
+(*1) Recovered FdwXact entries start with PREPARED
+(*2) Paths when an error occurs during preparing
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..6657951668
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2750 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * The two-phase commit protocol is used when the transaction modified two or
+ * more servers, including the local node.  If the two-phase commit protocol
+ * is not required, all foreign transactions are committed at the pre-commit
+ * phase.
+ *
+ * The FDW needs to register the foreign transaction with FdwXactRegisterXact()
+ * to make it participate in a group for global commit.  The registered foreign
+ * transactions are identified by the OIDs of the server and user.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * all foreign servers.  After committing or rolling back locally, we notify
+ * the resolver process and tell it to commit or roll back those
+ * transactions. If we ask it to commit, we also ask it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so that
+ * we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request.  We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed.  Before waiting for foreign transactions to be resolved, the
+ * backend enqueues itself with the timestamp at which it expects to be
+ * processed.  On failure, it enqueues itself again with a new timestamp
+ * (last timestamp + foreign_xact_resolution_interval).
+ *
+ * If a server crash occurs or the user cancels the wait, the prepared foreign
+ * transactions are left without a holder.  Such foreign transactions are
+ * resolved automatically by the resolver process.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of a foreign
+ * transaction follows a locking model based on the following linked concepts:
+ *
+ * * All FdwXact fields except for status are protected by FdwXactLock. The
+ *   status is protected by its mutex.
+ * * A process that is going to process a foreign transaction needs to set
+ *   locking_backend of the FdwXact entry to lock the entry, which prevents the
+ *   entry from being updated and removed by concurrent processes.
+ * * FdwXact entries whose local transaction is either being processed
+ *   (fdwxact->owner is not NULL) or prepared (TwoPhaseExists() is true) cannot
+ *   be processed by pg_resolve_foreign_xact(), pg_remove_foreign_xact() or
+ *   automatic resolution.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk.  FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon.  We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
+/* Check the FdwXactParticipant is capable of two-phase commit  */
+#define ServerSupportTransactionCallack(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define SeverSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.  This struct
+ * is created at the beginning of execution for each foreign servers and
+ * is used until the end of transaction where we cannot look at syscaches.
+ * Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
+	/* true if modified the data on the server */
+	bool		modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A member of
+ * participants may not support transaction callbacks: commit, rollback and
+ * prepare.  If a member of participants doesn't support any transaction
+ * callbacks, i.e. ServerSupportTransactionCallack() returns false,
+ * we don't end its transaction.
+ *
+ * FdwXactParticipants_tmp is used to update FdwXactParticipants atomically
+ * when executing COMMIT/ROLLBACK PREPARED command.  In COMMIT PREPARED case,
+ * we don't want to rollback foreign transactions even if an error occurs,
+ * because the local prepared transaction never turns to rollback in that
+ * case.  However, preparing FdwXactParticipants might lead to an error
+ * because it calls palloc() inside.  So we prepare FdwXactParticipants in
+ * two phases.  In the first phase, PrepareFdwXactParticipants(), we collect
+ * all foreign transactions associated with the local prepared transaction
+ * and keep them in FdwXactParticipants_tmp.  Even if an error occurs during
+ * that, we don't roll them back.  In the second phase, SetFdwXactParticipants(),
+ * we replace FdwXactParticipants_tmp with FdwXactParticipants and hold them.
+ *
+ * FdwXactLocalXid is the local transaction id associated with FdwXactParticipants.
+ */
+static List *FdwXactParticipants = NIL;
+static List *FdwXactParticipants_tmp = NIL;
+static TransactionId FdwXactLocalXid = InvalidTransactionId;
+
+/*
+ * True if the current transaction needs to be committed together with
+ * foreign servers.
+ */
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * OID, xid, foreign server OID and user OID, each printed as 8 hexadecimal
+ * digits and separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is enough to ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* Guc parameters */
+int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Keep track of whether the process exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit);
+static bool checkForeignTwophaseCommitRequired(bool local_modified);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static void FdwXactPrepareForeignTransactions(bool prepare_all);
+static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static TransactionId FdwXactGetTransactionFate(TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Register the foreign transaction identified by the given arguments as
+ * a participant of the transaction. The foreign transaction is identified
+ * by the given server id and user id.
+ */
+void
+FdwXactRegisterXact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Participant's information is also used at the end of a transaction,
+	 * where system caches are not available. Save it in
+	 * TopTransactionContext so that it can live until the end of the
+	 * transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Switch back to the previous context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Remove the given foreign server from FdwXactParticipants */
+void
+FdwXactUnregisterXact(Oid serverid, Oid userid)
+{
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			FdwXactParticipants = foreach_delete_current(FdwXactParticipants, lc);
+			break;
+		}
+	}
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	return fdw_part;
+}
+
+/*
+ * Prepare all foreign transactions if foreign twophase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit: when
+ * 'required', we strictly require all foreign servers' FDWs to support the
+ * two-phase commit protocol and ask them to prepare foreign transactions;
+ * when 'disabled', we ask all foreign servers to commit their foreign
+ * transactions in one phase. If we fail to commit any of them, we change
+ * to aborting.
+ *
+ * Note that non-modified foreign servers always can be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXact(void)
+{
+	TransactionId xid;
+	ListCell   	*lc;
+	bool		local_modified;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/*
+	 * Check if the current transaction did writes.  We need to include
+	 * the local node to the distributed transaction participant and to regard
+	 * it as modified, if the current transaction has performed WAL logging and
+	 * has assigned an xid.  The transaction can end up not writing any WAL,
+	 * even if it has an xid, if it only wrote to temporary and/or unlogged
+	 * tables.  It can end up having written WAL without an xid if it did HOT
+	 * pruning.
+	 */
+	xid = GetTopTransactionIdIfAny();
+	local_modified = (TransactionIdIsValid(xid) && (XactLastRecEnd != 0));
+
+	/*
+	 * Check if we need to use foreign twophase commit. Note that we don't
+	 * support foreign twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired(local_modified))
+	{
+		/*
+		 * We need to use two-phase commit.  Assign a transaction id to the
+		 * current transaction if not yet. Then prepare foreign transactions on
+		 * foreign servers that support two-phase commit.  Note that we keep
+		 * FdwXactParticipants until the end of the transaction.
+		 */
+		FdwXactLocalXid = xid;
+		if (!TransactionIdIsValid(FdwXactLocalXid))
+			FdwXactLocalXid = GetTopTransactionId();
+
+		FdwXactPrepareForeignTransactions(false);
+		ForeignTwophaseCommitIsRequired = true;
+	}
+	else
+	{
+		/*
+		 * Two-phase commit is not required. Commit foreign transactions in
+		 * the participant list.
+		 */
+		foreach(lc, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+			Assert(!fdw_part->fdwxact);
+
+			/* Commit the foreign transaction in one-phase */
+			if (ServerSupportTransactionCallack(fdw_part))
+				FdwXactParticipantEndTransaction(fdw_part, true);
+		}
+
+		/* All participants' transactions should be completed at this time */
+		ForgetAllFdwXactParticipants();
+	}
+}
+
+/*
+ * Return true if the current transaction modifies data on two or more servers,
+ * counting the servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+{
+	ListCell   *lc;
+	bool		have_notwophase = false;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			have_notwophase = true;
+
+		nserverswritten++;
+	}
+
+	/* Did we modify the local non-temporary data? */
+	if (local_modified)
+		nserverswritten++;
+
+	/*
+	 * Two-phase commit is not required if the number of servers that performed
+	 * writes is less than 2.
+	 */
+	if (nserverswritten < 2)
+		return false;
+
+	Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED);
+
+	/* Two-phase commit is required. Check parameters */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	if (have_notwophase)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+				 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+	return true;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
+{
+	FdwXactRslvState state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state.xid = FdwXactLocalXid;
+	state.server = fdw_part->server;
+	state.usermapping = fdw_part->usermapping;
+	state.fdwxact_id = NULL;
+	state.flags = FDWXACT_FLAG_ONEPHASE;
+
+	if (commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * each FdwXact entry we call the FDW's get_prepareid callback to obtain a
+ * transaction identifier. If prepare_all is true, we prepare all foreign
+ * transactions regardless of whether writes happened on the server.
+ *
+ * We can still switch to rollback here on failure. If any error occurs, we
+ * roll back the foreign transactions that have not been prepared.
+ */
+static void
+FdwXactPrepareForeignTransactions(bool prepare_all)
+{
+	ListCell   *lc;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(TransactionIdIsValid(FdwXactLocalXid));
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState state;
+		FdwXact		fdwxact;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Skip if the server's FDW doesn't support two-phase commit */
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			continue;
+
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, FdwXactLocalXid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to disk and WAL-logs it (so that it is relayed to
+		 * standbys). Thus, if we lose connectivity to the foreign server or
+		 * crash ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed between these two
+		 * steps, we would lose track of the prepared transaction on the
+		 * foreign server and would not be able to resolve it after crash
+		 * recovery. Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(FdwXactLocalXid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the point where this
+		 * backend receives the acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 */
+		state.xid = FdwXactLocalXid;
+		state.server = fdw_part->server;
+		state.usermapping = fdw_part->usermapping;
+		state.fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+		fdw_part->prepare_foreign_xact_fn(&state);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * Create a new foreign transaction entry before an FDW prepares the foreign
+ * transaction and commits or rolls it back. The function writes the entry to
+ * WAL; it is persisted under the pg_fdwxact directory at the next checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* WAL record is flushed, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there
+	 * are none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->owner = MyProc;
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset the entry's state */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->owner = NULL;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides a getPrepareId callback, we return the
+ * identifier it returns. Otherwise we generate a unique identifier in the
+ * form of "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * The returned string is used to identify the foreign transaction. The
+ * identifier must not be the same as that of any other concurrent prepared
+ * transaction.
+ *
+ * To make the foreign transaction identifier unique, we should ideally use
+ * something like a UUID, which gives unique ids with high probability, but
+ * that may be expensive here, and the UUID extension that provides the
+ * function to generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	   *id;
+	int			id_len = 0;
+
+	/*
+	 * If FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * Note that it's possible that the transaction aborts after we have prepared
+ * some of the participants. In that case we switch to rollback and roll back
+ * all foreign transactions.
+ */
+void
+AtPrepare_FdwXact(void)
+{
+	ListCell   *lc;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit because we prepare on
+	 * them regardless of whether they were modified.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
+	}
+
+	/* Set the local transaction id */
+	FdwXactLocalXid = GetTopTransactionId();
+
+	/* Prepare transactions on participating foreign servers */
+	FdwXactPrepareForeignTransactions(true);
+
+	/*
+	 * We keep the prepared foreign transaction participants so that we can
+	 * roll them back in case of failure.
+	 */
+}
+
+/*
+ * After PREPARE TRANSACTION, we forget all participants.
+ */
+void
+PostPrepare_FdwXact(void)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Collect all foreign transactions associated with the given xid if it's a prepared
+ * transaction.  Return true if COMMIT PREPARED or ROLLBACK PREPARED needs to wait for
+ * all foreign transactions to be resolved.  The collected foreign transactions are
+ * kept in FdwXactParticipants_tmp. The caller must call SetFdwXactParticipants()
+ * later if this function returns true.
+ */
+bool
+PrepareFdwXactParticipants(TransactionId xid)
+{
+	MemoryContext old_ctx;
+
+	Assert(FdwXactParticipants_tmp == NIL);
+
+	if (!TwoPhaseExists(xid))
+		return false;
+
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXactParticipant *fdw_part;
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwRoutine *routine;
+
+		if (!fdwxact->valid || fdwxact->local_xid != xid)
+			continue;
+
+		routine = GetFdwRoutineByServerId(fdwxact->serverid);
+		fdw_part = create_fdwxact_participant(fdwxact->serverid, fdwxact->userid,
+											  routine);
+		fdw_part->modified = true;
+		fdw_part->fdwxact = fdwxact;
+
+		/* Add to the participants list */
+		FdwXactParticipants_tmp = lappend(FdwXactParticipants_tmp, fdw_part);
+	}
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_ctx);
+
+	/*
+	 * We cannot proceed to commit this prepared transaction when
+	 * foreign_twophase_commit is disabled.
+	 */
+	if (FdwXactParticipants_tmp != NIL &&
+		!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a prepared foreign transaction commit when foreign_twophase_commit is \'disabled\'")));
+
+	/* Return true if we collect at least one foreign transaction */
+	return (FdwXactParticipants_tmp != NIL);
+}
+
+/*
+ * Set the collected foreign transactions as the participants of this transaction,
+ * and hold them.  This function must be called after PrepareFdwXactParticipants().
+ */
+void
+SetFdwXactParticipants(TransactionId xid)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants_tmp != NIL);
+	Assert(FdwXactParticipants == NIL);
+
+	FdwXactLocalXid = xid;
+	FdwXactParticipants = FdwXactParticipants_tmp;
+	FdwXactParticipants_tmp = NIL;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(SeverSupportTwophaseCommit(fdw_part));
+		Assert(fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED);
+		Assert(fdw_part->fdwxact->locking_backend == InvalidBackendId);
+		Assert(!fdw_part->fdwxact->owner);
+
+		/* Hold the fdwxact entry and set its owner */
+		fdw_part->fdwxact->locking_backend = MyBackendId;
+		fdw_part->fdwxact->owner = MyProc;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Wait for all foreign transactions of this transaction to be resolved.
+ *
+ * Initially backends start in state FDWXACT_NOT_WAITING and then change
+ * that state to FDWXACT_WAITING before adding ourselves to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * This backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitForResolution(TransactionId wait_xid, bool commit)
+{
+	ListCell	*lc;
+	char	   *new_status = NULL;
+	const char *old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(wait_xid == FdwXactLocalXid);
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/*
+	 * Quick exit if either atomic commit is not requested or we don't have
+	 * any participants.
+	 */
+	if (!IsForeignTwophaseCommitRequested() || FdwXactParticipants == NIL)
+		return;
+
+	/* Set foreign transaction status */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->fdwxact)
+			continue;
+
+		Assert(fdw_part->fdwxact->locking_backend == MyBackendId);
+		Assert(fdw_part->fdwxact->owner == MyProc);
+
+		SpinLockAcquire(&(fdw_part->fdwxact->mutex));
+		fdw_part->fdwxact->status = commit
+			? FDWXACT_STATUS_COMMITTING
+			: FDWXACT_STATUS_ABORTING;
+		SpinLockRelease(&(fdw_part->fdwxact->mutex));
+	}
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if none is running yet, otherwise wake it up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction resolution.
+	 */
+	if (update_process_title)
+	{
+		int			len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 24 + 10 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status);
+		new_status[len] = '\0'; /* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once resolver changes the state to FDWXACT_WAIT_COMPLETE,
+		 * it will never update it again, so we can't be seeing a stale value
+		 * in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+		{
+			ForgetAllFdwXactParticipants();
+			break;
+		}
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The
+		 * latter would lead the client to believe that the distributed
+		 * transaction aborted, which is not true: it's already committed
+		 * locally. The former is no good either: the client has requested
+		 * committing a distributed transaction, and is entitled to assume
+		 * that an acknowledged commit is also committed on all foreign
+		 * servers, which might not be true. So in this case we issue a WARNING
+		 * (which some clients may be able to interpret) and shut off further
+		 * output. We do NOT reset ProcDiePending, so that the process will die
+		 * after the commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign transaction resolver can pick them up and try to resolve
+		 * them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Return one backend that is connected to my database and is waiting for
+ * resolution, or NULL if no such backend is due for resolution yet.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+				 TransactionId *waitXid_p)
+{
+	PGPROC	   *proc;
+	bool		found = false;
+
+	Assert(LWLockHeldByMe(FdwXactResolutionLock));
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	/* Initialize variables */
+	*nextResolutionTs_p = -1;
+	*waitXid_p = InvalidTransactionId;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+		{
+			if (proc->fdwXactNextResolutionTs <= now)
+			{
+				/* Found a waiting process */
+				found = true;
+				*waitXid_p = proc->fdwXactWaitXid;
+			}
+			else
+				/* Found a waiting process supposed to be processed later */
+				*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+
+			break;
+		}
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return found ? proc : NULL;
+}
+
+/*
+ * Return true if there is at least one backend of the given database in the
+ * wait queue. The caller must hold FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC	   *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * In the abort case, this function ends the foreign transaction participants
+ * and possibly rolls back their prepared foreign transactions.
+ */
+extern void
+AtEOXact_FdwXact(bool is_commit)
+{
+	ListCell   *lc;
+
+	if (!is_commit)
+	{
+		bool need_wait = false;
+
+		foreach(lc, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = lfirst(lc);
+			FdwXact		fdwxact = fdw_part->fdwxact;
+			int			status;
+
+			if (!fdwxact)
+			{
+				/*
+				 * Rollback the foreign transaction if its foreign server
+				 * supports transaction callbacks.
+				 */
+				if (ServerSupportTransactionCallack(fdw_part))
+					FdwXactParticipantEndTransaction(fdw_part, false);
+
+				continue;
+			}
+
+			/*
+			 * Abort the foreign transaction.  For participants whose status
+			 * is FDWXACT_STATUS_PREPARING, we close the transaction in
+			 * one-phase. In addition, since we are not sure that the
+			 * preparation has been completed on the foreign server, we also
+			 * attempt to roll back the prepared foreign transaction.  Note
+			 * that it is the FDW's responsibility to tolerate an
+			 * OBJECT_NOT_FOUND error in the abort case.
+			 */
+			SpinLockAcquire(&(fdwxact->mutex));
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&(fdwxact->mutex));
+
+			if (status == FDWXACT_STATUS_PREPARING)
+				FdwXactParticipantEndTransaction(fdw_part, false);
+
+			need_wait = true;
+		}
+
+		/*
+		 * Wait for all prepared or possibly-prepared foreign transactions
+		 * to be resolved.
+		 */
+		if (need_wait)
+		{
+			Assert(TransactionIdIsValid(FdwXactLocalXid));
+			FdwXactWaitForResolution(FdwXactLocalXid, false);
+		}
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Unlock foreign transaction participants and clear the FdwXactParticipants
+ * list.  If we leave any foreign transactions unresolved, update the oldest
+ * xmin of unresolved transactions so that the local transaction ids of such
+ * unresolved foreign transactions are not truncated.
+ */
+void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell   *cell;
+	int			nlefts = 0;
+
+	if (FdwXactParticipants == NIL)
+	{
+		Assert(FdwXactParticipants_tmp == NIL);
+		Assert(!ForeignTwophaseCommitIsRequired);
+		return;
+	}
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if no FdwXact entry has been registered yet */
+		if (!fdwxact)
+			continue;
+
+		/*
+		 * Unlock the foreign transaction entries.  Note that there is a race
+		 * condition: the resolver process could remove an FdwXact entry in
+		 * FdwXactParticipants and another backend could reuse it before we
+		 * forget it.  So we need to check that the entries are still
+		 * associated with this transaction.  We cannot use locking_backend
+		 * for this check because the entry might already be held by the
+		 * resolver process.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->valid && fdwxact->local_xid == FdwXactLocalXid)
+		{
+			if (fdwxact->locking_backend == MyBackendId)
+				fdwxact->locking_backend = InvalidBackendId;
+
+			fdwxact->owner = NULL;
+			nlefts++;
+		}
+		LWLockRelease(FdwXactLock);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction of
+	 * unresolved distributed transactions and hand the entries over to the
+	 * foreign transaction resolver.
+	 */
+	if (nlefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions", nlefts);
+		FdwXactComputeRequiredXmin();
+		FdwXactLaunchOrWakeupResolver();
+	}
+
+	list_free(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+	FdwXactParticipants_tmp = NIL;
+	FdwXactLocalXid = InvalidTransactionId;
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Resolve foreign transactions at the given indexes. If 'waiter' is not NULL,
+ * we release the waiter after resolving all of the given foreign transactions.
+ * On failure, we re-enqueue the waiting backend after advancing its next
+ * resolution time.
+ *
+ * The caller must hold the given foreign transactions in advance to prevent
+ * concurrent update.
+ */
+void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		PG_TRY();
+		{
+			FdwXactResolveOneFdwXact(fdwxact);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. Re-insert the waiter to the tail of retry
+			 * queue if the waiter is still waiting.
+			 */
+			if (waiter)
+			{
+				LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+				if (waiter->fdwXactState == FDWXACT_WAITING)
+				{
+					SHMQueueDelete(&(waiter->fdwXactLinks));
+					pg_write_barrier();
+					waiter->fdwXactNextResolutionTs =
+						TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+													foreign_xact_resolution_retry_interval);
+					FdwXactQueueInsert(waiter);
+				}
+				LWLockRelease(FdwXactResolutionLock);
+			}
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+
+	if (!waiter)
+		return;
+
+	/*
+	 * We have resolved all foreign transactions.  Remove waiter from shmem queue,
+	 * if not detached yet. The waiter could already be detached if the user
+	 * cancelled the wait before resolution.
+	 */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/*
+		 * Wake up the waiter only when we have set state and removed from
+		 * queue
+		 */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had been already detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx != -1);
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ *
+ * XXX: we can exclude FdwXact entries whose status is already committing
+ * or aborting.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Return whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactGetTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on
+	 * the foreign server. So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and foreign transaction is about
+	 * to be committed or aborted.  Raise an error anyway since we cannot
+	 * determine the fate of this foreign transaction according to the local
+	 * transaction whose fate is also not determined.
+	 */
+	else
+		elog(ERROR,
+			 "cannot resolve the foreign transaction associated with in-progress transaction");
+
+	pg_unreachable();
+}
+
+/* Commit or rollback one prepared foreign transaction */
+static void
+FdwXactResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactRslvState state;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	/* The FdwXact entry must be held by me */
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->locking_backend == MyBackendId);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactGetTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare resolution state to pass to API */
+	state.xid = fdwxact->local_xid;
+	state.server = server;
+	state.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	state.fdwxact_id = fdwxact->fdwxact_id;
+	state.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&state);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&state);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/*
+ * Return the index of the first FdwXact entry that matches the given arguments,
+ * or -1 if there is none.  Only arguments holding a valid value for their
+ * datatype take part in the search condition; invalid ones match any entry.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	bool		found = false;
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+		found = true;
+		break;
+	}
+
+	return found ? i : -1;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. Its status is
+	 * set to prepared below, since we do not know the exact status right
+	 * now. The resolver will set it later based on the status of the local
+	 * transaction which prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that prepared
+	 * this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove
+ * FdwXact file if a foreign transaction was saved via an earlier checkpoint.
+ * We might not find the FdwXact entry if crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimization for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid CRC and matches the given identifiers), return
+ * the palloc'd contents of the file; raise an error on corrupted data.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the entry we were asked to read */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextFullXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted FdwXact files, fail immediately.  Keeping broken entries
+ * around and letting replay continue would harm the system, and a new
+ * backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextFullXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and mark them valid.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->owner = NULL;
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built-in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "prepared (commit)";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "prepared (abort)";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = BoolGetDatum(fdwxact->owner == NULL);
+		values[5] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx == -1)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist")));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	if (fdwxact->locking_backend != InvalidBackendId || fdwxact->owner)
+	{
+		/* the entry is being processed by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	if (TwoPhaseExists(fdwxact->local_xid))
+	{
+		/*
+		 * the entry's local transaction is prepared. Since we cannot know the
+		 * fate of the local transaction, we cannot resolve this foreign
+		 * transaction.
+		 */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction entry whose local transaction is prepared"),
+				 errhint("Use COMMIT PREPARED or ROLLBACK PREPARED.")));
+	}
+
+	/* Hold the entry */
+	FdwXactCtl->fdwxacts[idx]->locking_backend = MyBackendId;
+
+	LWLockRelease(FdwXactLock);
+
+	PG_TRY();
+	{
+		FdwXactResolveFdwXacts(&idx, 1, NULL);
+	}
+	PG_CATCH();
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactCtl->fdwxacts[idx]->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. The function gives a way to forget about such a prepared
+ * transaction when the foreign server where it is prepared is no longer
+ * available, or when the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("must be superuser to remove foreign transactions")));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx == -1)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction on server %u does not exist",
+						serverid)));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	if (fdwxact->locking_backend != InvalidBackendId || fdwxact->owner)
+	{
+		/* the entry is being held by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	PG_TRY();
+	{
+		/* Clean up entry and any files we may have left */
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdwxact(fdwxact);
+	}
+	PG_CATCH();
+	{
+		if (fdwxact->valid)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+			fdwxact->locking_backend = InvalidBackendId;
+		}
+		LWLockRelease(FdwXactLock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..a1a41404c7
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,558 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+		FdwXactRslvCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit launch retries to once per
+		 * foreign_xact_resolution_retry_interval, but always launch
+		 * immediately when a backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started. This usually means the resolver crashed, so retry
+			 * after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+
+	/*
+	 * Look for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. Since the resolver will soon find
+		 * the FdwXact entry we inserted, we don't need to do anything then.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, clean up the worker slot */
+		LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+		resolver->in_use = false;
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on each database that has
+ * at least one FdwXact entry but no resolver running on it.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *resolver_dbs;	/* DBs resolver's running on */
+	HTAB	   *fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * A resolver process resolves the foreign transactions that are
+		 * waiting for resolution or are not being processed by anyone.
+		 * But we don't need to launch a resolver for foreign transactions
+		 * whose local transaction is prepared.
+		 */
+		if ((!fdwxact->owner && !TwoPhaseExists(fdwxact->local_xid)) ||
+			(fdwxact->owner && fdwxact->owner->fdwXactState == FDWXACT_WAITING))
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no FdwXact entry, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolvers are running and launch new one on them */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..d34237a329
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,453 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database for which a backend process is waiting for resolution.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int	foreign_xact_resolution_retry_interval;
+int	foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+static void hold_fdwxacts(PGPROC *waiter);
+static void hold_indoubt_fdwxacts(void);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts has indexes of FdwXact which the resolver marked
+ * as in-processing. We clear that flag from those entries on failure.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/* true while processing online foreign transactions */
+static bool processing_online = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+	}
+
+	/*
+	 * If the resolver exits during processing online transactions,
+	 * there might be other waiting online transactions. So request to
+	 * re-launch.
+	 */
+	if (processing_online)
+		FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TransactionId waitXid = InvalidTransactionId;
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiter until either the queue gets empty or the queue has
+		 * only waiters that have a future resolution timestamp.
+		 *
+		 * Set processing_online so that we can request to relaunch on failure.
+		 */
+		processing_online = true;
+		for (;;)
+		{
+			PGPROC	   *waiter;
+
+			CHECK_FOR_INTERRUPTS();
+
+			LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+
+			/* Get the waiter from the queue */
+			waiter = FdwXactGetWaiter(now, &resolutionTs, &waitXid);
+
+			if (!waiter)
+			{
+				/* Not found, break */
+				LWLockRelease(FdwXactResolutionLock);
+				break;
+			}
+
+			/* Hold the waiter's foreign transactions */
+			hold_fdwxacts(waiter);
+			Assert(nheld > 0);
+
+			LWLockRelease(FdwXactResolutionLock);
+
+			/*
+			 * Resolve the waiter's foreign transactions and release the
+			 * waiter.
+			 */
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld, waiter);
+			CommitTransactionCommand();
+
+			last_resolution_time = now;
+		}
+		processing_online = false;
+
+		/* Hold indoubt foreign transactions */
+		hold_indoubt_fdwxacts();
+
+		if (nheld > 0)
+		{
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld, NULL);
+			CommitTransactionCommand();
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved for a backend
+ * within foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		/* There is no waiting backend */
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until the slot is detached. This
+		 * prevents a race condition in which a waiter enqueues itself after
+		 * the FdwXactWaiterExists check.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but will not exit because the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Take foreign transactions whose local transaction is already finished.
+ */
+static void
+hold_indoubt_fdwxacts(void)
+{
+	nheld = 0;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		/* Take entry if not processed by anyone */
+		if (fdwxact->valid && fdwxact->dbid == MyDatabaseId &&
+			fdwxact->locking_backend == InvalidBackendId &&
+			!fdwxact->owner &&
+			!TwoPhaseExists(fdwxact->local_xid))
+		{
+			held_fdwxacts[nheld++] = i;
+
+			/* Take over the entry */
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Lock foreign transactions associated with the given waiter's transaction
+ * as in-processing.  The caller must hold FdwXactResolutionLock so that
+ * the waiter doesn't change its state.
+ */
+static void
+hold_fdwxacts(PGPROC *waiter)
+{
+	Assert(LWLockHeldByMe(FdwXactResolutionLock));
+
+	nheld = 0;
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid && fdwxact->local_xid == waiter->fdwXactWaitXid)
+		{
+			Assert(fdwxact->owner->fdwXactState == FDWXACT_WAITING);
+			Assert(fdwxact->dbid == waiter->databaseId);
+
+			held_fdwxacts[nheld++] = i;
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 1cd97852e8..ea045174e0 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..200cf9d067 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 9b2e59bf0e..45ffa555fb 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -850,6 +851,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+		PGXACT	*pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
+
+		if (pgxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
@@ -2196,6 +2226,14 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	XLogRecPtr	recptr;
 	TimestampTz committs = GetCurrentTimestamp();
 	bool		replorigin;
+	bool		need_fdwxact_commit;
+	bool		canceled = false;
+
+	/*
+	 * Prepare the foreign transactions involved in this prepared
+	 * transaction, if any.
+	 */
+	need_fdwxact_commit = PrepareFdwXactParticipants(xid);
 
 	/*
 	 * Are we using the replication origins feature?  Or, in other words, are
@@ -2260,12 +2298,24 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	END_CRIT_SECTION();
 
 	/*
-	 * Wait for synchronous replication, if required.
+	 * Wait for both synchronous replication and foreign transaction
+	 * resolution, if required.
 	 *
 	 * Note that at this stage we have marked clog, but still show as running
 	 * in the procarray and continue to hold locks.
 	 */
-	SyncRepWaitForLSN(recptr, true);
+	canceled = SyncRepWaitForLSN(XactLastRecEnd, true);
+
+	if (need_fdwxact_commit)
+	{
+		/* Collect foreign transaction participants */
+		SetFdwXactParticipants(xid);
+
+		if (!canceled)
+			FdwXactWaitForResolution(xid, true);
+
+		ForgetAllFdwXactParticipants();
+	}
 }
 
 /*
@@ -2285,6 +2335,14 @@ RecordTransactionAbortPrepared(TransactionId xid,
 							   const char *gid)
 {
 	XLogRecPtr	recptr;
+	bool		need_fdwxact_commit;
+	bool		canceled = false;
+
+	/*
+	 * Prepare the foreign transactions involved in this prepared
+	 * transaction, if any.
+	 */
+	need_fdwxact_commit = PrepareFdwXactParticipants(xid);
 
 	/*
 	 * Catch the scenario where we aborted partway through
@@ -2319,12 +2377,24 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	END_CRIT_SECTION();
 
 	/*
-	 * Wait for synchronous replication, if required.
+	 * Wait for both synchronous replication and foreign transaction
+	 * resolution, if required.
 	 *
 	 * Note that at this stage we have marked clog, but still show as running
 	 * in the procarray and continue to hold locks.
 	 */
-	SyncRepWaitForLSN(recptr, false);
+	canceled = SyncRepWaitForLSN(XactLastRecEnd, false);
+
+	if (need_fdwxact_commit)
+	{
+		/* Collect foreign transaction participants */
+		SetFdwXactParticipants(xid);
+
+		if (!canceled)
+			FdwXactWaitForResolution(xid, false);
+
+		ForgetAllFdwXactParticipants();
+	}
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index bd4c3cf325..92dbe59980 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1223,6 +1224,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -1231,6 +1233,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1269,12 +1272,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1421,16 +1425,37 @@ RecordTransactionCommit(void)
 	latestXid = TransactionIdLatest(xid, nchildren, children);
 
 	/*
-	 * Wait for synchronous replication, if required. Similar to the decision
-	 * above about using committing asynchronously we only want to wait if
-	 * this backend assigned an xid and wrote WAL.  No need to wait if an xid
-	 * was assigned due to temporary/unlogged tables or due to HOT pruning.
+	 * Wait for both synchronous replication and prepared foreign transactions
+	 * to be committed, if required.  We must wait for synchronous replication
+	 * first because we need to make sure that the fate of the current
+	 * transaction is consistent between the primary and sync replicas before
+	 * resolving foreign transactions.  Otherwise, we will end up violating
+	 * atomic commit if a fail-over happens after some of the foreign
+	 * transactions are committed.
 	 *
 	 * Note that at this stage we have marked clog, but still show as running
 	 * in the procarray and continue to hold locks.
 	 */
-	if (wrote_xlog && markXidCommitted)
-		SyncRepWaitForLSN(XactLastRecEnd, true);
+	if (markXidCommitted)
+	{
+		bool canceled = false;
+
+		/*
+		 * Similar to the decision above about committing asynchronously, we
+		 * only want to wait for synchronous replication if this backend
+		 * assigned an xid and wrote WAL.  No need to wait if an xid was
+		 * assigned due to temporary/unlogged tables or due to HOT pruning.
+		 */
+		if (wrote_xlog)
+			canceled = SyncRepWaitForLSN(XactLastRecEnd, true);
+
+		/*
+		 * We only want to wait if we prepared foreign transactions in this
+		 * transaction and have not received a query cancel.
+		 */
+		if (!canceled && need_commit_globally)
+			FdwXactWaitForResolution(xid, true);
+	}
 
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
@@ -2091,6 +2116,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXact();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2258,6 +2286,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXact(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2345,6 +2374,9 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Prepare foreign transactions */
+	AtPrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2536,6 +2568,9 @@ PrepareTransaction(void)
 	 */
 	PostPrepare_Twophase();
 
+	/* Release held FdwXact entries */
+	PostPrepare_FdwXact();
+
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
@@ -2755,6 +2790,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXact(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 184c6672f3..1f123267b5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4602,6 +4603,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6289,6 +6291,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_wal_senders",
 									 max_wal_senders,
 									 ControlFile->max_wal_senders);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
@@ -6839,14 +6844,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7048,7 +7054,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7561,6 +7570,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7891,6 +7901,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9187,6 +9200,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9729,8 +9743,10 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9748,6 +9764,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9766,6 +9783,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9973,6 +9991,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10176,6 +10195,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8625cbeab6..28b5b2f6e8 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+       SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index c002a61794..12602c02b0 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1076,6 +1078,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1396,6 +1410,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index fb6ce49056..64b90bda87 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/table.h"
 #include "access/tableam.h"
 #include "catalog/partition.h"
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 513471ab9b..863d9c77a9 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -22,6 +22,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeForeignscan.h"
 #include "foreign/fdwapi.h"
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 20a4c474cc..d3235cb502 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -37,6 +37,7 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
@@ -47,6 +48,7 @@
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "rewrite/rewriteHandler.h"
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 61e48ca3f8..8f411c0559 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,18 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction && !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction && routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		!routine->CommitForeignTransaction &&
+		!routine->RollbackForeignTransaction)
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index beb5e85434..2258424e81 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -12,6 +12,8 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 88992c2da2..0f85f166ad 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3663,6 +3663,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3773,6 +3779,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 		case WAIT_EVENT_HASH_BATCH_ALLOCATE:
 			event_name = "HashBatchAllocate";
 			break;
@@ -4102,6 +4111,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index dec02586c7..3652e5f4f5 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -909,6 +911,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -973,12 +979,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 0c0c371739..cb344db320 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -167,6 +167,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index df1e341c76..4c2af941e5 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -143,13 +143,17 @@ static bool SyncRepQueueIsOrderedByLSN(int mode);
  * represents a commit record.  If it doesn't, then we wait only for the WAL
  * to be flushed if synchronous_commit is set to the higher level of
  * remote_apply, because only commit records provide apply feedback.
+ *
+ * This function returns true if we canceled waiting due to an
+ * interruption.
  */
-void
+bool
 SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 {
 	char	   *new_status = NULL;
 	const char *old_status;
 	int			mode;
+	bool		canceled = false;
 
 	/*
 	 * This should be called while holding interrupts during a transaction
@@ -168,7 +172,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	 * Fast exit if user has not requested sync replication.
 	 */
 	if (!SyncRepRequested())
-		return;
+		return false;
 
 	Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
 	Assert(WalSndCtl != NULL);
@@ -188,7 +192,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		lsn <= WalSndCtl->lsn[mode])
 	{
 		LWLockRelease(SyncRepLock);
-		return;
+		return false;
 	}
 
 	/*
@@ -258,6 +262,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 					 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
 			whereToSendOutput = DestNone;
 			SyncRepCancelWait();
+			canceled = true;
 			break;
 		}
 
@@ -274,6 +279,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 					(errmsg("canceling wait for synchronous replication due to user request"),
 					 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
 			SyncRepCancelWait();
+			canceled = true;
 			break;
 		}
 
@@ -291,6 +297,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		if (rc & WL_POSTMASTER_DEATH)
 		{
 			ProcDiePending = true;
+			canceled = true;
 			whereToSendOutput = DestNone;
 			SyncRepCancelWait();
 			break;
@@ -316,6 +323,8 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		set_ps_display(new_status);
 		pfree(new_status);
 	}
+
+	return canceled;
 }
 
 /*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..55609eed81 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index b448533564..6fcd58b294 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -94,6 +94,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -249,6 +251,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 	}
 
 	allProcs = ProcGlobal->allProcs;
@@ -1311,6 +1314,7 @@ GetOldestXmin(Relation rel, int flags)
 
 	TransactionId replication_slot_xmin = InvalidTransactionId;
 	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	TransactionId fdwxact_unresolved_xmin = InvalidTransactionId;
 
 	/*
 	 * If we're not computing a relation specific limit, or if a shared
@@ -1376,6 +1380,7 @@ GetOldestXmin(Relation rel, int flags)
 	 */
 	replication_slot_xmin = procArray->replication_slot_xmin;
 	replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	if (RecoveryInProgress())
 	{
@@ -1425,6 +1430,15 @@ GetOldestXmin(Relation rel, int flags)
 		NormalTransactionIdPrecedes(replication_slot_xmin, result))
 		result = replication_slot_xmin;
 
+	/*
+	 * Check whether there are unresolved distributed transactions
+	 * requiring an older xmin.
+	 */
+	if (!(flags & PROCARRAY_FDWXACT_XMIN) &&
+		TransactionIdIsValid(fdwxact_unresolved_xmin) &&
+		NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result))
+		result = fdwxact_unresolved_xmin;
+
 	/*
 	 * After locks have been released and vacuum_defer_cleanup_age has been
 	 * applied, check whether we need to back up further to make logical
@@ -3125,6 +3139,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries that are still needed to resolve
+ * distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
 
 #define XidCacheRemove(i) \
 	do { \
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6985e8eed..241b099238 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,6 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 XactTruncationLock					44
+FdwXactLock							45
+FdwXactResolverLock					46
+FdwXactResolutionLock				47
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e57fcd2538..470d0da3d1 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -421,6 +422,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* Initialize fields for fdw xact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -822,6 +827,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c9424f167c..f6da103fbd 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3054,6 +3056,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6f603cbbe8..c9db292296 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -30,6 +30,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -430,6 +431,24 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
@@ -759,6 +778,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2457,6 +2480,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve a foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4597,6 +4666,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 5a0b8e9821..60a47e8feb 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -127,6 +127,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -346,6 +348,20 @@
 #max_sync_workers_per_subscription = 2	# taken from max_logical_replication_workers
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
+
 #------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index a0b0458108..8701c5f005 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 786672b1b6..bc0c12b3b8 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -208,6 +208,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index e73639df74..3041c39bc0 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 233441837f..b040202043 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 120000
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..a175eeb6dc
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,170 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* fdwXactState */
+#define	FDWXACT_NOT_WAITING		0
+#define	FDWXACT_WAITING			1
+#define	FDWXACT_WAIT_COMPLETE	2
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being aborted */
+} FdwXactStatus;
+
+typedef struct FdwXactData *FdwXact;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+
+	/*
+	 * A backend process that executed the distributed transaction. The owner
+	 * and a process locking this entry can be different during transaction
+	 * resolution as the resolver takes over the entry.
+	 */
+	PGPROC		*owner;			/* process that executed the distributed tx. */
+
+	/* Information about the foreign transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset of inserting this entry
+									 * start */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* is the entry complete and written to
+								 * file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing FdwXact entry needs to hold FdwXactLock in exclusive mode,
+ * and iterating fdwXacts needs that in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	TransactionId xid;
+
+	/* Foreign transaction information */
+	char	   *fdwxact_id;
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+/* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void FdwXactRegisterXact(Oid serverid, Oid userid, bool modified);
+extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
+extern void ForgetAllFdwXactParticipants(void);
+extern void FdwXactReleaseWaiter(PGPROC *waiter);
+extern void FdwXactWaitForResolution(TransactionId wait_xid, bool commit);
+extern void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter);
+extern PGPROC *FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+								TransactionId *waitXid_p);
+extern bool FdwXactWaiterExists(Oid dbid);
+extern bool PrepareFdwXactParticipants(TransactionId xid);
+extern void SetFdwXactParticipants(TransactionId xid);
+extern void ClearFdwXactParticipants(void);
+extern void PreCommit_FdwXact(void);
+extern void AtEOXact_FdwXact(bool is_commit);
+extern void AtPrepare_FdwXact(void);
+extern void PostPrepare_FdwXact(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern bool FdwXactExists(Oid dboid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+
+#endif							/* FDWXACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..688b43b8d0
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..c935471936
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates whether this slot is in use or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 6c15df7e70..986bc73566 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index aef8555367..c2b62a1935 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -102,6 +102,13 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_FDWNOPREPARE - set when we wrote data to a foreign table
+ * whose server isn't capable of two-phase commit, so the foreign
+ * transaction cannot be prepared.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 2)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index b9490a3afe..baa29a7c56 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -239,6 +239,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index de5670e538..9884f5f8e7 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4b5af32440..d33043e417 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5993,6 +5993,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,state,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6111,6 +6129,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..8d046cc4e4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5e0cf533fb..5596ee591c 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -69,6 +69,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 											   bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 														 bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1387201382..83bfc9345b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -806,6 +806,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -853,6 +855,7 @@ typedef enum
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
 	WAIT_EVENT_EXECUTE_GATHER,
+	WAIT_EVENT_FDWXACT_RESOLUTION,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
 	WAIT_EVENT_HASH_BATCH_LOAD,
@@ -970,6 +973,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 9d286b66c6..cffab9c721 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -82,7 +82,7 @@ extern char *syncrep_parse_error_msg;
 extern char *SyncRepStandbyNames;
 
 /* called by user backend */
-extern void SyncRepWaitForLSN(XLogRecPtr lsn, bool commit);
+extern bool SyncRepWaitForLSN(XLogRecPtr lsn, bool commit);
 
 /* called at backend exit */
 extern void SyncRepCleanupAtProcExit(void);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index b20e2ad4f6..529b07b77b 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/xlogdefs.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
@@ -161,6 +162,17 @@ struct PGPROC
 	int			syncRepState;	/* wait state for sync rep */
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
+	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int				fdwXactState;	/* wait state for foreign transaction
+									 * resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
 	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index a5c7d0c064..0f73b64937 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -36,6 +36,8 @@
 
 #define		PROCARRAY_SLOTS_XMIN			0x20	/* replication slot xmin,
 													 * catalog_xmin */
+#define		PROCARRAY_FDWXACT_XMIN			0x40	/* unresolved distributed
+													 * transaction xmin */
 /*
  * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching
  * PGXACT->vacuumFlags. Other flags are used for different purposes and
@@ -125,4 +127,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 04431d0eb2..a00ca73355 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 601734a6f1..c8f5b18816 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1342,6 +1342,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.state,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(xid, serverid, userid, state, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.23.0

Attachment: v24-0005-Add-regression-tests-for-foreign-twophase-commit.patch (application/octet-stream)
From 70b3d7290c275152d2c2abd0c882c7614a0a487e Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v24 5/6] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 +
 .../test_fdwxact/expected/test_fdwxact.out    | 223 ++++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 193 +++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 137 +++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 495 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 +++++++
 src/test/regress/pg_regress.c                 |  13 +-
 13 files changed, 1321 insertions(+), 5 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 29de73c060..8a48e6ba19 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -13,6 +13,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..c6a91ac9f1
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,223 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup two servers that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_2 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_2 (i int) SERVER srv_2;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_1 and ft_2 don't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     0
+(1 row)
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     0
+(1 row)
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
+-- Error. We cannot PREPARE a distributed transaction when
+-- foreign_twophase_commit is disabled.
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
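The expected output above implies a simple commit-time rule for foreign_twophase_commit. The following is a minimal sketch of that rule as it can be read off the test cases, not the patch's actual implementation. The names TwoPhaseMode, CommitPlan and plan_commit are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; they are not taken from the patch. */
typedef enum { FTC_DISABLED, FTC_REQUIRED } TwoPhaseMode;
typedef enum { COMMIT_ONE_PHASE, COMMIT_TWO_PHASE, COMMIT_ERROR } CommitPlan;

/*
 * Decide how a COMMIT should proceed, following the regression test cases:
 *   - 'disabled' never uses two-phase commit and never raises an error;
 *   - 'required' needs two-phase commit only when at least two nodes were
 *     written (the local node counts as one), and then every written
 *     foreign server must be two-phase-commit capable.
 * Servers that were only read are not counted at all.
 */
static CommitPlan
plan_commit(TwoPhaseMode mode, bool local_written,
            int written_2pc, int written_non2pc)
{
    int writers;

    assert(written_2pc >= 0 && written_non2pc >= 0);

    if (mode == FTC_DISABLED)
        return COMMIT_ONE_PHASE;

    writers = written_2pc + written_non2pc + (local_written ? 1 : 0);
    if (writers < 2)
        return COMMIT_ONE_PHASE;    /* a single writer needs no 2PC */
    if (written_non2pc > 0)
        return COMMIT_ERROR;        /* some writer cannot prepare */
    return COMMIT_TWO_PHASE;
}
```

For example, writing t and ft_2pc_1 gives plan_commit(FTC_REQUIRED, true, 1, 0), which selects two-phase commit, while writing t and ft_no2pc_1 gives plan_commit(FTC_REQUIRED, true, 0, 1), which corresponds to the error shown in the expected output. Explicit PREPARE TRANSACTION follows stricter rules in the tests and is not covered by this sketch.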
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..8cf860e295
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,193 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup two servers that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_2 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_2 (i int) SERVER srv_2;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_1 and ft_2 don't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+COMMIT PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ROLLBACK PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+
+-- Error. We cannot PREPARE a distributed transaction when
+-- foreign_twophase_commit is disabled.
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..8d48a74e86
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,137 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the case where preparing the local transaction fails after the foreign
+# transactions have been prepared. The first attempt should succeed, but the second
+# attempt fails after preparing the foreign transaction, so the prepared foreign
+# transaction should be rolled back.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'");
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into the prepare phase on srv_2pc_1. The transaction fails while
+# preparing the foreign transaction on srv_2pc_1. We then try both 'rollback' and
+# 'rollback prepared' on that foreign transaction, and roll back the other foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
+
+# Inject a panic into the prepare phase on srv_2pc_2. The server crashes after preparing both
+# foreign transactions. After the restart, those transactions are recovered as in-doubt
+# transactions. We check that the resolver process rolls back those transactions after recovery.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('panic', 'prepare', 'srv_2pc_2');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+$node->restart();
+$node->poll_query_until('postgres',
+						"SELECT count(*) = 0 FROM pg_foreign_xacts")
+  or die "Timeout while waiting for resolver process to resolve in-doubt transactions";
+$log = TestLib::slurp_file($node->logfile);
+like($log, qr/rollback prepared tx_[0-9]+ on srv_2pc_1/, "resolver rolled back in-doubt transaction");
+like($log, qr/rollback prepared tx_[0-9]+ on srv_2pc_2/, "resolver rolled back in-doubt transaction");
+truncate $node->logfile, 0;
+
+# Inject a panic into the commit phase on srv_2pc_1. The server crashes due to the panic
+# raised by the resolver process while committing the prepared foreign transaction on srv_2pc_1.
+# After the restart, those transactions are recovered as in-doubt transactions. We check that
+# the resolver process commits those transactions after recovery.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('panic', 'commit', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+$node->restart();
+$node->poll_query_until('postgres',
+						"SELECT count(*) = 0 FROM pg_foreign_xacts")
+  or die "Timeout while waiting for resolver process to resolve in-doubt transactions";
+$log = TestLib::slurp_file($node->logfile);
+like($log, qr/commit prepared tx_[0-9]+ on srv_2pc_1/, "resolver committed in-doubt transaction");
+like($log, qr/commit prepared tx_[0-9]+ on srv_2pc_2/, "resolver committed in-doubt transaction");
+truncate $node->logfile, 0;
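The two crash-recovery cases in the TAP test above suggest how an in-doubt foreign transaction is resolved: it is committed if the local transaction committed before the crash, and rolled back otherwise. The sketch below builds the log line the test's regular expressions look for under that assumption. The function name format_resolution_log is hypothetical, and the "tx_<xid>" identifier format is taken only from the patterns matched in the test, not from the resolver code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the log line that resolving an in-doubt
 * foreign transaction is expected to produce in the TAP test.
 *
 * Assumption: the outcome follows the fate of the local transaction, so
 * the prepared foreign transaction is committed when the local transaction
 * committed and rolled back when it did not.
 *
 * Returns the value of snprintf(), i.e. the length the full line needs.
 */
static int
format_resolution_log(char *buf, size_t len, bool local_committed,
                      unsigned int xid, const char *server)
{
    return snprintf(buf, len, "%s prepared tx_%u on %s",
                    local_committed ? "commit" : "rollback",
                    xid, server);
}
```

With a panic injected in the prepare phase the local transaction never commits, which matches the "rollback prepared" lines the test expects. With a panic injected in the commit phase the local commit is already durable, which matches the "commit prepared" lines.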
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses the PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..0dd77c391f
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,495 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test module for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only the
+ * commit and rollback APIs, and the third supports all transaction APIs,
+ * including prepare.
+ *
+ * This module can also inject an error in the prepare callback or the
+ * commit callback via the test_inject_error() SQL function. Information about
+ * the injected error is stored in shared memory so that backend processes and
+ * resolver processes can see it.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/reloptions.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactRslvState *state);
+static void testCommitForeignTransaction(FdwXactRslvState *state);
+static void testRollbackForeignTransaction(FdwXactRslvState *state);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+/* Register the foreign transaction */
+static void
+testRegisterFdwXact(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					bool modified)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	RangeTblEntry	*rte;
+	ForeignTable *table;
+	Oid		userid;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
+						mtstate->ps.state);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+	table = GetForeignTable(RelationGetRelid(rel));
+	FdwXactRegisterXact(table->serverid, userid, modified);
+}
+
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo,
+						(eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0);
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo, true);
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 state->fdwxact_id,
+							 state->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+/*
+ * Check if an event is set at the phase on the server. If there is, set
+ * elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Currently support only error and panic; anything else means error */
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+	else
+		*elevel = ERROR;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up the
+# shared memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index d82e0189dc..25f9ae8c32 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2335,9 +2335,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2352,7 +2355,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.23.0

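For reference, the commit and rollback callbacks of the test module pick
their log message from the one-phase bit in state->flags. Below is a minimal
standalone sketch of that check. It assumes the flag is a single-bit mask;
the SKETCH_* names and values are hypothetical stand-ins, not the definitions
from the patch. The mask has to be tested with a bitwise AND, since a logical
AND would be true whenever any flag bit is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag bits, for illustration only. */
#define SKETCH_FLAG_ONEPHASE		0x01
#define SKETCH_FLAG_PARALLEL_WORKER	0x02

/* True only when the one-phase bit itself is set in flags. */
static bool
sketch_is_onephase(int flags)
{
	return (flags & SKETCH_FLAG_ONEPHASE) != 0;
}

/*
 * Format the message a commit callback would log: a plain commit for
 * one-phase transactions, "commit prepared" otherwise.  Returns the
 * value of snprintf(), i.e. the length of the formatted message.
 */
static int
sketch_commit_message(char *buf, size_t len, int flags,
					  unsigned int xid, const char *prep_id,
					  const char *server)
{
	assert(buf != NULL && len > 0);

	if (sketch_is_onephase(flags))
		return snprintf(buf, len, "commit %u on %s", xid, server);

	return snprintf(buf, len, "commit prepared %s on %s", prep_id, server);
}
```

With these definitions, a call such as
sketch_commit_message(buf, sizeof(buf), SKETCH_FLAG_PARALLEL_WORKER, 100,
"tx_100", "fs1") takes the "commit prepared" branch even though flags is
nonzero, because only the one-phase bit is examined.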
Attachment: v24-0001-Recreate-RemoveForeignServerById.patch (application/octet-stream)
From c9976410549b0e72fcd794fa355cc7040123d9bb Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 12 Jun 2020 11:49:02 +0900
Subject: [PATCH v24 1/6] Recreate RemoveForeignServerById()

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/catalog/dependency.c   |  5 ++++-
 src/backend/commands/foreigncmds.c | 22 ++++++++++++++++++++++
 src/include/commands/defrem.h      |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index f515e2c308..82dbc988a3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1476,6 +1476,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_FOREIGN_SERVER:
+			RemoveForeignServerById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1490,7 +1494,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSDICT:
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
-		case OCLASS_FOREIGN_SERVER:
 		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index de31ddd1f3..c002a61794 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -1060,6 +1060,28 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop foreign server by OID
+ */
+void
+RemoveForeignServerById(Oid srvId)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(ForeignServerRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(srvId));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Common routine to check permission for user-mapping-related DDL
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index c26a102b17..89db18b7bc 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -128,6 +128,7 @@ extern ObjectAddress CreateForeignDataWrapper(CreateFdwStmt *stmt);
 extern ObjectAddress AlterForeignDataWrapper(AlterFdwStmt *stmt);
 extern ObjectAddress CreateForeignServer(CreateForeignServerStmt *stmt);
 extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
+extern void RemoveForeignServerById(Oid srvId);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
-- 
2.23.0

Attachment: v24-0003-Documentation-update.patch (application/octet-stream)
From bb2898365a5d68551330a489d0d2cee9e1aceddc Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v24 3/6] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 ++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 162 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 254 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  91 ++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 836 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a99c681887..ac9c5c9aad 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9237,6 +9237,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -10966,6 +10971,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in doubt.
+       A foreign transaction can be in doubt when the user cancels the
+       statement or the server crashes during transaction commit.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ca6a3a523f..539992e457 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9098,6 +9098,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether distributed transaction commit ensures that all
+         involved changes on foreign servers are committed. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used to
+         commit or roll back distributed transactions. When set to
+         <literal>required</literal>, distributed transactions strictly require
+         that all written servers can use the two-phase commit protocol.  That is,
+         the distributed transaction cannot commit if even one server does not
+         support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, distributed transaction commit
+         waits for all involved foreign transactions to be committed before the
+         command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When <literal>disabled</literal>, database consistency is at risk
+          if one or more foreign servers crash while committing a
+          distributed transaction.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>int</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed resolution
+         before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminate foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning a resolver stays connected to one
+         database until it is stopped manually with <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..c83f8e9ee9
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,162 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the others were rolled back. This could leave
+   the data in an inconsistent state from the viewpoint of a federated database.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using the <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all the changes on foreign servers are either committed or rolled back using
+   the transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve atomic commit among all foreign servers,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit sequence of a distributed transaction proceeds
+    through the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers including the local server itself and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the prepare succeeds on all foreign
+       servers, proceed to the next step.  If there is any failure in the
+       prepare phase, the server rolls back all the transactions on both
+       local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally. The server commits the transaction locally.  If any failure happens
+       in this step, the server switches to rollback and rolls back all transactions
+       on both local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    Each commit of a distributed transaction will wait until confirmation is
+    received that all prepared transactions are committed or rolled back. The
+    guarantee we offer is that the application will not receive explicit
+    acknowledgement of the successful commit of a distributed transaction
+    until all foreign transactions are resolved on the foreign servers.
+   </para>
+
+   <para>
+    When synchronous replication is also used, the distributed transaction
+    will wait for synchronous replication first, and then wait for foreign
+    transaction resolution.  This is necessary because the fate of the local
+    transaction commit needs to be consistent among the primary and replicas.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back, using the two-phase commit protocol. However, foreign transactions
+    become <firstterm>in-doubt</firstterm> in two cases:
+
+    <itemizedlist>
+     <listitem>
+      <para>The local node crashed while either preparing or resolving a foreign
+       transaction.</para>
+     </listitem>
+     <listitem>
+      <para>The user canceled the query.</para>
+     </listitem>
+    </itemizedlist>
+
+    You can check in-doubt transactions in the <xref linkend="view-pg-foreign-xacts"/>
+    view. These foreign transactions are resolved by a foreign transaction resolver
+    process or by executing the <function>pg_resolve_foreign_xact</function> function
+    manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving both foreign transactions that are prepared by
+    online transactions and in-doubt transactions. They commit or roll back
+    prepared transactions on all foreign servers involved in the distributed
+    transaction if the local node received agreement messages from all
+    foreign servers during the first step of the two-phase commit protocol.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on the database to which it is connected. On failure during resolution, it
+    retries at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped without an immediate shutdown. You can call the
+     <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be enabled.  Additionally,
+    <varname>max_worker_processes</varname> may need to be adjusted
+    to accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 74793035d7..13ff3f3575 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1423,6 +1423,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare the foreign transaction. If an FDW wants its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well, and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called in the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either in
+    the pre-commit phase of the local transaction, if the transaction
+    can be committed in one phase, or in the post-commit phase, if
+    two-phase commit is required. If <literal>frstate-&gt;flags</literal> has
+    the flag <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     Note that, except when invoked through the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flags</literal> has the flag
+    <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     a failure occurs while preparing the foreign transaction. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when invoked through the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    storing its length in <varname>*prep_id_len</varname>.
+    This optional function is called during executor startup, once per
+    foreign server. Note that the transaction identifier must be a string literal,
+    less than <symbol>NAMEDATALEN</symbol> bytes long, and must not be the same
+    as any other concurrent prepared transaction ID. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. So please note that
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1902,4 +2013,147 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of the
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name, OID of
+    the server, user, and user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-registration">
+    <title>Foreign Transaction Registration and Unregistration</title>
+    <para>
+     Foreign transactions need to be registered with the
+     <productname>PostgreSQL</productname> global transaction manager.
+     Registration and unregistration are done by calling
+     <function>FdwXactRegisterXact</function> and
+     <function>FdwXactUnregisterXact</function> respectively.
+     The FDW can pass a boolean <literal>modified</literal> along with
+     OIDs of server and user to <function>FdwXactRegisterXact</function>
+     indicating that writes are going to happen on the foreign server.  Such foreign
+     servers are taken into account when deciding whether the two-phase commit
+     protocol is required.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <function>CommitForeignTransaction</function>
+     and <function>RollbackForeignTransaction</function> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <function>CommitForeignTransaction</function> function
+     in the pre-commit phase, and during rollback it calls the
+     <function>RollbackForeignTransaction</function> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback Distributed Transaction</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to atomically commit and roll back among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to have the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    Here ft1 and ft2 are foreign tables on different foreign servers, which may use different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     When switching to rollback due to a failure, it calls
+     <function>RollbackForeignTransaction</function> with
+     <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions which are not
+     closed yet, and calls <function>RollbackForeignTransaction</function> without
+     that flag for foreign transactions which are already prepared.  For foreign
+     transactions which are being prepared, it does both because it cannot know whether
+     the preparation has been completed on the foreign server. Therefore,
+     <function>RollbackForeignTransaction</function> needs to tolerate the undefined
+     object error.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, both the <literal>CommitForeignTransaction</literal> function and the
+     <literal>RollbackForeignTransaction</literal> function should commit or
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve, the resolver process exits with an error message. The foreign
+     transaction launcher will launch the resolver process again at
+     <xref linkend="guc-foreign-transaction-resolution-retry-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0070..65fd76f174 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 959f6a1c2f..599d6b00db 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26176,6 +26176,97 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction which is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that this removes the foreign transaction entry without resolution.
+        This function is useful to remove a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before
+    dropping the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dc49177c78..030380ff9e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1052,6 +1052,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1277,6 +1289,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1554,6 +1578,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1865,6 +1894,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index c41ce9499b..5ef1f4a329 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -170,6 +170,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3234adb639..83f30c5045 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.23.0

v24-0004-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From ee30ae945587798fd5df09f5fab348e681adb8f7 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:28:58 +0500
Subject: [PATCH v24 4/6] postgres_fdw supports atomic commit APIs.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/Makefile                 |   7 +-
 contrib/postgres_fdw/connection.c             | 607 ++++++++++++------
 .../postgres_fdw/expected/postgres_fdw.out    | 280 +++++++-
 contrib/postgres_fdw/fdwxact.conf             |   3 +
 contrib/postgres_fdw/postgres_fdw.c           |  24 +-
 contrib/postgres_fdw/postgres_fdw.h           |   9 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 124 +++-
 doc/src/sgml/postgres-fdw.sgml                |  10 +-
 8 files changed, 808 insertions(+), 256 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..91fa6e39fc 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -29,3 +29,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 52d1fe3563..e58ce736a6 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2020, PostgreSQL Global Development Group
  *
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
@@ -56,6 +57,8 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;
+	bool		modified;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -69,17 +72,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -92,6 +91,12 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(ForeignServer *server, UserMapping *user,
+										  bool will_prep_stmt, bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -104,11 +109,43 @@ static bool UserMappingPasswordRequired(UserMapping *user);
  * (not even on error), we need this flag to cue manual cleanup.
  */
 PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+GetConnection(UserMapping *user, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(GetForeignServer(user->serverid),
+							   user, will_prep_stmt, start_transaction);
+
+	return entry->conn;
+}
+
+void
+MarkConnectionModified(UserMapping *user)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
+	if (!entry->modified)
+	{
+		FdwXactRegisterXact(user->serverid, user->userid, true);
+		entry->modified = true;
+	}
+}
+
+/*
+ * Get connection cache entry. Unlike GetConnectionState, this function
+ * doesn't establish a new connection if one doesn't exist yet.
+ */
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -128,7 +165,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -136,12 +172,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -155,6 +185,22 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * This function gets the connection cache entry and establishes connection
+ * to the foreign server if there is no connection and starts a new transaction
+ * if 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(ForeignServer *server, UserMapping *user,
+				   bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -182,14 +228,14 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
-		ForeignServer *server = GetForeignServer(user->serverid);
-
 		/* Reset all transient state fields, to be sure all are clean */
 		entry->xact_depth = 0;
 		entry->have_prep_stmt = false;
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
+		entry->modified = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -200,6 +246,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -207,12 +262,18 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
-	return entry->conn;
+	return entry;
 }
 
 /*
@@ -473,7 +534,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -485,6 +546,8 @@ begin_remote_xact(ConnCacheEntry *entry)
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
+		FdwXactRegisterXact(serverid, userid, false);
+
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
 		else
@@ -700,193 +763,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -903,10 +779,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -917,6 +789,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Skip entries whose connection was not used in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1251,3 +1127,310 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", frstate->fdwxact_id);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   frstate->server->servername, frstate->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 frstate->server->servername, frstate->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on foreign server. If
+ * frstate->flags contains FDWXACT_FLAG_ONEPHASE this function can commit the
+ * foreign transaction without preparation, otherwise commit the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(frstate->server, frstate->usermapping, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, frstate->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In simple commit case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   frstate->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Rollback a transaction on foreign server. As with commit case, if
+ * frstate->flags contains FDWXACT_FLAG_ONEPHASE this function can rollback the
+ * foreign transaction without preparation, otherwise rollback the prepared
+ * transaction. This function must tolerate being called recursively, as an
+ * error can happen during aborting.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
+{
+	bool			is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool			abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(frstate->server, frstate->usermapping, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, frstate->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Cleanup connection entry transaction if transaction fails before
+	 * establishing a connection or starting transaction.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is before starting transaction or is already unsalvageable,
+	 * do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+	entry->xact_got_connection = false;
+	entry->modified = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..8c31e26406 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -191,15 +210,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8923,10 +8944,10 @@ RESET ROLE;
 ALTER USER MAPPING FOR regress_nosuper SERVER loopback_nopw OPTIONS (ADD password_required 'false');
 SET ROLE regress_nosuper;
 -- Should finally work now
-SELECT * FROM ft1_nopw LIMIT 1;
-  c1  | c2 | c3 | c4 | c5 | c6 |     c7     | c8 
-------+----+----+----+----+----+------------+----
- 1111 |  2 |    |    |    |    | ft1        | 
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
+ c1 | c2 |        c3         |              c4              |            c5            | c6 |     c7     | c8  
+----+----+-------------------+------------------------------+--------------------------+----+------------+-----
+  1 |  2 | 00001_trig_update | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1  | 1          | foo
 (1 row)
 
 -- unpriv user also cannot set sslcert / sslkey on the user mapping
@@ -8943,16 +8964,16 @@ HINT:  User mappings with the sslcert or sslkey options set may only be created
 DROP USER MAPPING FOR CURRENT_USER SERVER loopback_nopw;
 -- This will fail again as it'll resolve the user mapping for public, which
 -- lacks password_required=false
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 ERROR:  password is required
 DETAIL:  Non-superusers must provide a password in the user mapping.
 RESET ROLE;
 -- The user mapping for public is passwordless and lacks the password_required=false
 -- mapping option, but will work because the current user is a superuser.
 SELECT * FROM ft1_nopw LIMIT 1;
-  c1  | c2 | c3 | c4 | c5 | c6 |     c7     | c8 
-------+----+----+----+----+----+------------+----
- 1111 |  2 |    |    |    |    | ft1        | 
+ c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+  6 |  6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6  | 6          | foo
 (1 row)
 
 -- cleanup
@@ -8961,16 +8982,225 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
 BEGIN;
-SELECT count(*) FROM ft1;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" of relation "T 5" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
  count 
 -------
-   822
+     0
 (1 row)
 
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
-ROLLBACK;
-WARNING:  there is no transaction in progress
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000000..3fdbf93cdb
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..75bbb48ebb 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include <limits.h>
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -504,7 +505,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -558,6 +558,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1434,7 +1439,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2372,7 +2377,8 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user, false, true);
+	MarkConnectionModified(user);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2746,7 +2752,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3566,7 +3572,9 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user, true, true);
+	MarkConnectionModified(user);
+
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4441,7 +4449,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4527,7 +4535,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4755,7 +4763,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..f922d5795f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -129,14 +130,19 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt,
+							 bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
+extern void MarkConnectionModified(UserMapping *user);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
@@ -203,6 +209,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 									bool is_subquery,
 									List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..1ef66123df 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2598,7 +2621,7 @@ ALTER USER MAPPING FOR regress_nosuper SERVER loopback_nopw OPTIONS (ADD passwor
 SET ROLE regress_nosuper;
 
 -- Should finally work now
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 
 -- unpriv user also cannot set sslcert / sslkey on the user mapping
 -- first set password_required so we see the right error messages
@@ -2612,7 +2635,7 @@ DROP USER MAPPING FOR CURRENT_USER SERVER loopback_nopw;
 
 -- This will fail again as it'll resolve the user mapping for public, which
 -- lacks password_required=false
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 
 RESET ROLE;
 
@@ -2628,9 +2651,98 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
 BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
 ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index eab2cc9378..8783f2077c 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -521,9 +521,13 @@ OPTIONS (ADD password_required 'false');
   </para>
 
   <para>
-   Note that it is currently not supported by
-   <filename>postgres_fdw</filename> to prepare the remote transaction for
-   two-phase commit.
+   <filename>postgres_fdw</filename> supports preparing the remote transaction
+   for two-phase commit.  Also, if the two-phase commit protocol is required to
+   commit the distributed transaction, <filename>postgres_fdw</filename> commits
+   the remote transaction using the two-phase commit protocol
+   (see <xref linkend="atomic-commit"/>).  The remote server therefore needs
+   <xref linkend="guc-max-prepared-transactions"/> set to at least one so that
+   it can prepare the remote transaction.
   </para>
  </sect2>
 
-- 
2.23.0
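
The postgres_fdw regression tests in the patch above exercise a simple rule for when a commit has to go through two-phase commit. The following C sketch restates that rule for illustration only. It is not code from the patch, and every name in it (TwoPhaseMode, Participant, CommitPlan, plan_commit) is hypothetical. It assumes that a local write counts as one more writer and that, in 'required' mode, a writer whose FDW cannot prepare makes the commit fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
typedef enum { TWOPHASE_DISABLED, TWOPHASE_REQUIRED } TwoPhaseMode;

typedef struct Participant
{
    bool modified;     /* did this transaction write to the server? */
    bool can_prepare;  /* does its FDW provide a prepare callback? */
} Participant;

typedef enum { COMMIT_ONE_PHASE, COMMIT_TWO_PHASE, COMMIT_ERROR } CommitPlan;

/*
 * Decide how a transaction would be committed under the simplified rule:
 * two-phase commit is only needed when at least two nodes (the local
 * server counts as one) were modified and the mode is 'required'.
 */
static CommitPlan
plan_commit(TwoPhaseMode mode, bool local_modified,
            const Participant *parts, size_t nparts)
{
    size_t nwriters = local_modified ? 1 : 0;
    bool   all_can_prepare = true;

    assert(parts != NULL || nparts == 0);

    for (size_t i = 0; i < nparts; i++)
    {
        if (!parts[i].modified)
            continue;           /* read-only participants are ignored */
        nwriters++;
        if (!parts[i].can_prepare)
            all_can_prepare = false;
    }

    /* A single writer, or 'disabled' mode, commits in one phase. */
    if (mode == TWOPHASE_DISABLED || nwriters < 2)
        return COMMIT_ONE_PHASE;

    /* Two or more writers: every one of them must be able to prepare. */
    return all_can_prepare ? COMMIT_TWO_PHASE : COMMIT_ERROR;
}
```

Under these assumptions, two modified 2PC-capable servers in 'required' mode give COMMIT_TWO_PHASE, a single modified server gives COMMIT_ONE_PHASE, and the same two-server transaction in 'disabled' mode also gives COMMIT_ONE_PHASE, which matches the "COMMIT; -- success" case in the test.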

#109Ahsan Hadi
ahsan.hadi@gmail.com
In reply to: Fujii Masao (#103)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jul 17, 2020 at 9:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/16 14:47, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/14 9:08, Masahiro Ikeda wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!

+1
I'm interested in these patches and now studying them. While checking
the behaviors of the patched PostgreSQL, I got three comments.

Thank you for testing this patch!

1. We can access to the foreign table even during recovery in the HEAD.
But in the patched version, when I did that, I got the following error.
Is this intentional?

ERROR: cannot assign TransactionIds during recovery

No, it should be fixed. I'm going to fix this by not collecting
participants for atomic commit during recovery.

Thanks for trying to fix the issues!

I'd like to report one more issue. When I started new transaction
in the local server, executed INSERT in the remote server via
postgres_fdw and then quit psql, I got the following assertion failure.

TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160
1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313
2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20
3 postgres 0x000000010d313fe3 shmem_exit + 179
4 postgres 0x000000010d313e7a proc_exit_prepare + 122
5 postgres 0x000000010d313da3 proc_exit + 19
6 postgres 0x000000010d35112f PostgresMain + 3711
7 postgres 0x000000010d27bb3a BackendRun + 570
8 postgres 0x000000010d27af6b BackendStartup + 475
9 postgres 0x000000010d279ed1 ServerLoop + 593
10 postgres 0x000000010d277940 PostmasterMain + 6016
11 postgres 0x000000010d1597b9 main + 761
12 libdyld.dylib 0x00007fff7161e3d5 start + 1
13 ??? 0x0000000000000003 0x0 + 3

I have tested with the latest set of patches shared by Sawada and I am
not able to reproduce this issue. I started a prepared transaction on the
local server, did a couple of inserts into a remote table using
postgres_fdw, and then quit psql. The assertion failure did not occur.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

--
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: ahsan.hadi@highgo.ca

#110Muhammad Usama
m.usama@gmail.com
In reply to: Masahiko Sawada (#108)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, Jul 22, 2020 at 12:42 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/16 14:47, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/14 9:08, Masahiro Ikeda wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!

+1
I'm interested in these patches and now studying them. While checking
the behaviors of the patched PostgreSQL, I got three comments.

Thank you for testing this patch!

1. We can access to the foreign table even during recovery in the HEAD.
But in the patched version, when I did that, I got the following error.
Is this intentional?

ERROR: cannot assign TransactionIds during recovery

No, it should be fixed. I'm going to fix this by not collecting
participants for atomic commit during recovery.

Thanks for trying to fix the issues!

I'd like to report one more issue. When I started new transaction
in the local server, executed INSERT in the remote server via
postgres_fdw and then quit psql, I got the following assertion failure.

TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160
1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313
2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20
3 postgres 0x000000010d313fe3 shmem_exit + 179
4 postgres 0x000000010d313e7a proc_exit_prepare + 122
5 postgres 0x000000010d313da3 proc_exit + 19
6 postgres 0x000000010d35112f PostgresMain + 3711
7 postgres 0x000000010d27bb3a BackendRun + 570
8 postgres 0x000000010d27af6b BackendStartup + 475
9 postgres 0x000000010d279ed1 ServerLoop + 593
10 postgres 0x000000010d277940 PostmasterMain + 6016
11 postgres 0x000000010d1597b9 main + 761
12 libdyld.dylib 0x00007fff7161e3d5 start + 1
13 ??? 0x0000000000000003 0x0 + 3

Thank you for reporting the issue!

I've attached the latest version patch that incorporated all comments
I got so far. I've removed the patch adding the 'prefer' mode of
foreign_twophase_commit to keep the patch set simple.

I have started to review the patchset. Just a quick comment.

Patch v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch
contains changes (adding fdwxact includes) for
src/backend/executor/nodeForeignscan.c,
src/backend/executor/nodeModifyTable.c
and src/backend/executor/execPartition.c that don't seem to be
required with the latest version.

Thanks
Best regards
Muhammad Usama

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#111Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Muhammad Usama (#110)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, 23 Jul 2020 at 22:51, Muhammad Usama <m.usama@gmail.com> wrote:

On Wed, Jul 22, 2020 at 12:42 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/16 14:47, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/14 9:08, Masahiro Ikeda wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!

+1
I'm interested in these patches and now studying them. While checking
the behaviors of the patched PostgreSQL, I got three comments.

Thank you for testing this patch!

1. We can access to the foreign table even during recovery in the HEAD.
But in the patched version, when I did that, I got the following error.
Is this intentional?

ERROR: cannot assign TransactionIds during recovery

No, it should be fixed. I'm going to fix this by not collecting
participants for atomic commit during recovery.

Thanks for trying to fix the issues!

I'd like to report one more issue. When I started new transaction
in the local server, executed INSERT in the remote server via
postgres_fdw and then quit psql, I got the following assertion failure.

TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160
1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313
2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20
3 postgres 0x000000010d313fe3 shmem_exit + 179
4 postgres 0x000000010d313e7a proc_exit_prepare + 122
5 postgres 0x000000010d313da3 proc_exit + 19
6 postgres 0x000000010d35112f PostgresMain + 3711
7 postgres 0x000000010d27bb3a BackendRun + 570
8 postgres 0x000000010d27af6b BackendStartup + 475
9 postgres 0x000000010d279ed1 ServerLoop + 593
10 postgres 0x000000010d277940 PostmasterMain + 6016
11 postgres 0x000000010d1597b9 main + 761
12 libdyld.dylib 0x00007fff7161e3d5 start + 1
13 ??? 0x0000000000000003 0x0 + 3

Thank you for reporting the issue!

I've attached the latest version patch that incorporated all comments
I got so far. I've removed the patch adding the 'prefer' mode of
foreign_twophase_commit to keep the patch set simple.

I have started to review the patchset. Just a quick comment.

Patch v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch
contains changes (adding fdwxact includes) for
src/backend/executor/nodeForeignscan.c, src/backend/executor/nodeModifyTable.c
and src/backend/executor/execPartition.c that don't seem to be
required with the latest version.

Thanks for your comment.

Right. I've removed these changes on the local branch.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#112Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#111)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020/07/27 15:59, Masahiko Sawada wrote:

On Thu, 23 Jul 2020 at 22:51, Muhammad Usama <m.usama@gmail.com> wrote:

On Wed, Jul 22, 2020 at 12:42 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/16 14:47, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/14 9:08, Masahiro Ikeda wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!

+1
I'm interested in these patches and now studying them. While checking
the behaviors of the patched PostgreSQL, I got three comments.

Thank you for testing this patch!

1. We can access to the foreign table even during recovery in the HEAD.
But in the patched version, when I did that, I got the following error.
Is this intentional?

ERROR: cannot assign TransactionIds during recovery

No, it should be fixed. I'm going to fix this by not collecting
participants for atomic commit during recovery.

Thanks for trying to fix the issues!

I'd like to report one more issue. When I started new transaction
in the local server, executed INSERT in the remote server via
postgres_fdw and then quit psql, I got the following assertion failure.

TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160
1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313
2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20
3 postgres 0x000000010d313fe3 shmem_exit + 179
4 postgres 0x000000010d313e7a proc_exit_prepare + 122
5 postgres 0x000000010d313da3 proc_exit + 19
6 postgres 0x000000010d35112f PostgresMain + 3711
7 postgres 0x000000010d27bb3a BackendRun + 570
8 postgres 0x000000010d27af6b BackendStartup + 475
9 postgres 0x000000010d279ed1 ServerLoop + 593
10 postgres 0x000000010d277940 PostmasterMain + 6016
11 postgres 0x000000010d1597b9 main + 761
12 libdyld.dylib 0x00007fff7161e3d5 start + 1
13 ??? 0x0000000000000003 0x0 + 3

Thank you for reporting the issue!

I've attached the latest version patch that incorporated all comments
I got so far. I've removed the patch adding the 'prefer' mode of
foreign_twophase_commit to keep the patch set simple.

I have started to review the patchset. Just a quick comment.

Patch v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch
contains changes (adding fdwxact includes) for
src/backend/executor/nodeForeignscan.c, src/backend/executor/nodeModifyTable.c
and src/backend/executor/execPartition.c that don't seem to be
required with the latest version.

Thanks for your comment.

Right. I've removed these changes on the local branch.

The latest patches failed to be applied to the master branch. Could you rebase the patches?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#113Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Fujii Masao (#112)
5 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 21 Aug 2020 at 00:36, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/27 15:59, Masahiko Sawada wrote:

On Thu, 23 Jul 2020 at 22:51, Muhammad Usama <m.usama@gmail.com> wrote:

On Wed, Jul 22, 2020 at 12:42 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/16 14:47, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/14 9:08, Masahiro Ikeda wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!

+1
I'm interested in these patches and now studying them. While checking
the behaviors of the patched PostgreSQL, I got three comments.

Thank you for testing this patch!

1. We can access to the foreign table even during recovery in the HEAD.
But in the patched version, when I did that, I got the following error.
Is this intentional?

ERROR: cannot assign TransactionIds during recovery

No, it should be fixed. I'm going to fix this by not collecting
participants for atomic commit during recovery.

Thanks for trying to fix the issues!

I'd like to report one more issue. When I started new transaction
in the local server, executed INSERT in the remote server via
postgres_fdw and then quit psql, I got the following assertion failure.

TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160
1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313
2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20
3 postgres 0x000000010d313fe3 shmem_exit + 179
4 postgres 0x000000010d313e7a proc_exit_prepare + 122
5 postgres 0x000000010d313da3 proc_exit + 19
6 postgres 0x000000010d35112f PostgresMain + 3711
7 postgres 0x000000010d27bb3a BackendRun + 570
8 postgres 0x000000010d27af6b BackendStartup + 475
9 postgres 0x000000010d279ed1 ServerLoop + 593
10 postgres 0x000000010d277940 PostmasterMain + 6016
11 postgres 0x000000010d1597b9 main + 761
12 libdyld.dylib 0x00007fff7161e3d5 start + 1
13 ??? 0x0000000000000003 0x0 + 3

Thank you for reporting the issue!

I've attached the latest version patch that incorporated all comments
I got so far. I've removed the patch adding the 'prefer' mode of
foreign_twophase_commit to keep the patch set simple.

I have started to review the patchset. Just a quick comment.

Patch v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch
contains changes (adding fdwxact includes) for
src/backend/executor/nodeForeignscan.c, src/backend/executor/nodeModifyTable.c
and src/backend/executor/execPartition.c that don't seem to be
required with the latest version.

Thanks for your comment.

Right. I've removed these changes on the local branch.

The latest patches failed to be applied to the master branch. Could you rebase the patches?

Thank you for letting me know. I've attached the latest version patch set.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v25-0005-Add-regression-tests-for-foreign-twophase-commit.patch (application/octet-stream)
From b5ae5220fc1ffadf53a69ea4ae56b1f066330af0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v25 5/5] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 +
 .../test_fdwxact/expected/test_fdwxact.out    | 223 ++++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 193 +++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 137 +++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 495 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 +++++++
 src/test/regress/pg_regress.c                 |  13 +-
 13 files changed, 1321 insertions(+), 5 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index a6d2ffbf9e..106f3b2ff2 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS =1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..c6a91ac9f1
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,223 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup two servers that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_2 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_2 (i int) SERVER srv_2;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_1 and ft_2 don't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     0
+(1 row)
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     0
+(1 row)
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
+-- Error. We cannot PREPARE a distributed transaction when
+-- foreign_twophase_commit is disabled.
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..8cf860e295
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,193 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup two servers that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_2 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_2 (i int) SERVER srv_2;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_1 and ft_2 don't support two-phase commit.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+COMMIT PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ROLLBACK PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+
+-- Error. We cannot PREPARE a distributed transaction when
+-- foreign_twophase_commit is disabled.
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..8d48a74e86
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,137 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the case where preparing the local transaction fails after the foreign
+# transactions have been prepared. The first attempt should succeed, but the second
+# attempt fails after preparing the foreign transaction (the identifier 'tx1' is
+# already in use), so the prepared foreign transaction should be rolled back.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'");
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into the prepare phase on srv_2pc_1. The transaction fails while
+# preparing the foreign transaction on srv_2pc_1. We then try both 'rollback' and
+# 'rollback prepared' on that foreign transaction, and roll back the other foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
+
+# Inject a panic into the prepare phase on srv_2pc_2. The server crashes after preparing both
+# foreign transactions. After the restart, those transactions are recovered as in-doubt
+# transactions. We check that the resolver process rolls back those transactions after recovery.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('panic', 'prepare', 'srv_2pc_2');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+$node->restart();
+$node->poll_query_until('postgres',
+						"SELECT count(*) = 0 FROM pg_foreign_xacts")
+  or die "Timeout while waiting for resolver process to resolve in-doubt transactions";
+$log = TestLib::slurp_file($node->logfile);
+like($log, qr/rollback prepared tx_[0-9]+ on srv_2pc_1/, "resolver rolled back in-doubt transaction");
+like($log, qr/rollback prepared tx_[0-9]+ on srv_2pc_2/, "resolver rolled back in-doubt transaction");
+truncate $node->logfile, 0;
+
+# Inject a panic into the commit phase on srv_2pc_1. The server crashes due to the panic
+# raised by the resolver process while committing the prepared foreign transaction on srv_2pc_1.
+# After the restart, those transactions are recovered as in-doubt transactions. We check that
+# the resolver process commits those transactions after recovery.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('panic', 'commit', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+$node->restart();
+$node->poll_query_until('postgres',
+						"SELECT count(*) = 0 FROM pg_foreign_xacts")
+  or die "Timeout while waiting for resolver process to resolve in-doubt transactions";
+$log = TestLib::slurp_file($node->logfile);
+like($log, qr/commit prepared tx_[0-9]+ on srv_2pc_1/, "resolver committed in-doubt transaction");
+like($log, qr/commit prepared tx_[0-9]+ on srv_2pc_2/, "resolver committed in-doubt transaction");
+truncate $node->logfile, 0;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..0dd77c391f
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,495 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only
+ * commit and rollback API and the third supports all transaction API including
+ * prepare.
+ *
+ * Also, this module can inject an error in the prepare callback or the
+ * commit callback via the test_inject_error() SQL function. The injected
+ * error's settings are stored in shared memory so that both backend
+ * processes and resolver processes can see them.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/reloptions.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactRslvState *state);
+static void testCommitForeignTransaction(FdwXactRslvState *state);
+static void testRollbackForeignTransaction(FdwXactRslvState *state);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+/* Register the foreign transaction */
+static void
+testRegisterFdwXact(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					bool modified)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	RangeTblEntry	*rte;
+	ForeignTable *table;
+	Oid		userid;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
+						mtstate->ps.state);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+	table = GetForeignTable(RelationGetRelid(rel));
+	FdwXactRegisterXact(table->serverid, userid, modified);
+}
+
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo,
+						(eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0);
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo, true);
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, sizeof(buf), "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 state->fdwxact_id,
+							 state->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+/*
+ * Check whether an event is registered for the given phase on the given
+ * server. If so, set *elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Currently support only error and panic */
+	if (pg_strcasecmp(fxss->elevel, "error") == 0)
+		*elevel = ERROR;
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strncpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strncpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strncpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check if we can commit and roll back the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check if we can commit and roll back the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up its shared
+# memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index d82e0189dc..25f9ae8c32 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2335,9 +2335,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2352,7 +2355,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
-- 
2.23.0
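
The commit and rollback callbacks in test_fdwxact and postgres_fdw choose between one-phase and two-phase handling by checking FDWXACT_FLAG_ONEPHASE in the state flags. Below is a minimal standalone sketch of why that check has to be a bitwise test rather than a logical one. The DEMO_FLAG_* values and the helper names are hypothetical and are not taken from the patch; only the shape of the test is the point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; not taken from the patch. */
#define DEMO_FLAG_ONEPHASE 0x01
#define DEMO_FLAG_OTHER    0x02

/* Bitwise test: true only when the ONEPHASE bit itself is set. */
static bool
is_onephase_bitwise(int flags)
{
	return (flags & DEMO_FLAG_ONEPHASE) != 0;
}

/* Logical test: true whenever flags is nonzero, whichever bit is set. */
static bool
is_onephase_logical(int flags)
{
	return flags && DEMO_FLAG_ONEPHASE;
}

/* Self-check showing where the two tests agree and where they differ. */
static void
demo_flag_tests(void)
{
	/* Both tests agree when only the ONEPHASE bit is set, or no bit at all. */
	assert(is_onephase_bitwise(DEMO_FLAG_ONEPHASE));
	assert(is_onephase_logical(DEMO_FLAG_ONEPHASE));
	assert(!is_onephase_bitwise(0));
	assert(!is_onephase_logical(0));

	/* They disagree when some other bit is set without ONEPHASE. */
	assert(!is_onephase_bitwise(DEMO_FLAG_OTHER));
	assert(is_onephase_logical(DEMO_FLAG_OTHER));
}
```

With the logical form, a prepared (two-phase) transaction whose flags carry any other bit would be reported or handled as a one-phase commit, so the bitwise form is the safer choice in these callbacks.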

Attachment: v25-0004-postgres_fdw-supports-atomic-commit-APIs.patch (application/octet-stream)
From 9c3da70208f60fd127af0763e61a2be14bdfad23 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:28:58 +0500
Subject: [PATCH v25 4/5] postgres_fdw supports atomic commit APIs.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/Makefile                 |   7 +-
 contrib/postgres_fdw/connection.c             | 607 ++++++++++++------
 .../postgres_fdw/expected/postgres_fdw.out    | 274 +++++++-
 contrib/postgres_fdw/fdwxact.conf             |   3 +
 contrib/postgres_fdw/postgres_fdw.c           |  24 +-
 contrib/postgres_fdw/postgres_fdw.h           |   9 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 124 +++-
 doc/src/sgml/postgres-fdw.sgml                |  10 +-
 8 files changed, 805 insertions(+), 253 deletions(-)
 create mode 100644 contrib/postgres_fdw/fdwxact.conf

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..91fa6e39fc 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESSCHECK = postgres_fdw
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -29,3 +29,8 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(pg_regress_check) \
+	    --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \
+	    $(REGRESSCHECK)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 08daf26fdf..1ed8a086e9 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * connection.c
- *		  Connection management functions for postgres_fdw
+ *		  Connection and transaction management functions for postgres_fdw
  *
  * Portions Copyright (c) 2012-2020, PostgreSQL Global Development Group
  *
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
@@ -57,6 +58,8 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		xact_got_connection;
+	bool		modified;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -70,17 +73,13 @@ static HTAB *ConnectionHash = NULL;
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
 
-/* tracks whether any work is needed in callback functions */
-static bool xact_got_connection = false;
-
 /* prototypes of private functions */
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -93,6 +92,12 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionState(ForeignServer *server, UserMapping *user,
+										  bool will_prep_stmt, bool start_transaction);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -105,11 +110,43 @@ static bool UserMappingPasswordRequired(UserMapping *user);
  * (not even on error), we need this flag to cue manual cleanup.
  */
 PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+GetConnection(UserMapping *user, bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionState(GetForeignServer(user->serverid),
+							   user, will_prep_stmt, start_transaction);
+
+	return entry->conn;
+}
+
+void
+MarkConnectionModified(UserMapping *user)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
+	if (!entry->modified)
+	{
+		FdwXactRegisterXact(user->serverid, user->userid, true);
+		entry->modified = true;
+	}
+}
+
+/*
+ * Get connection cache entry. Unlike GetConnectionState function, this function
+ * doesn't establish a new connection if one does not exist yet.
+ */
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
+	ConnCacheKey	key;
+	bool			found;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
 
 	/* First time through, initialize connection cache hashtable */
 	if (ConnectionHash == NULL)
@@ -129,7 +166,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		 * Register some callback functions that manage connection cleanup.
 		 * This should be done just once in each backend.
 		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
 		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
 		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
 									  pgfdw_inval_callback, (Datum) 0);
@@ -137,12 +173,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 									  pgfdw_inval_callback, (Datum) 0);
 	}
 
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
 	/*
 	 * Find or create cached entry for requested connection.
 	 */
@@ -156,6 +186,22 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		entry->conn = NULL;
 	}
 
+	return entry;
+}
+
+/*
+ * This function gets the connection cache entry and establishes connection
+ * to the foreign server if there is no connection and starts a new transaction
+ * if 'start_transaction' is true.
+ */
+static ConnCacheEntry *
+GetConnectionState(ForeignServer *server, UserMapping *user,
+				   bool will_prep_stmt, bool start_transaction)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
 
@@ -183,14 +229,14 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 */
 	if (entry->conn == NULL)
 	{
-		ForeignServer *server = GetForeignServer(user->serverid);
-
 		/* Reset all transient state fields, to be sure all are clean */
 		entry->xact_depth = 0;
 		entry->have_prep_stmt = false;
 		entry->have_error = false;
 		entry->changing_xact_state = false;
 		entry->invalidated = false;
+		entry->xact_got_connection = false;
+		entry->modified = false;
 		entry->server_hashvalue =
 			GetSysCacheHashValue1(FOREIGNSERVEROID,
 								  ObjectIdGetDatum(server->serverid));
@@ -201,6 +247,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		/* Now try to make the connection */
 		entry->conn = connect_pg_server(server, user);
 
+		Assert(entry->conn);
+
+		if (!entry->conn)
+		{
+			elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed",
+				 server->servername);
+			return NULL;
+		}
+
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
 			 entry->conn, server->servername, user->umid, user->userid);
 	}
@@ -208,12 +263,18 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	if (start_transaction)
+	{
+		begin_remote_xact(entry, user->serverid, user->userid);
+
+		/* Set flag that we did GetConnection during the current transaction */
+		entry->xact_got_connection = true;
+	}
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
 
-	return entry->conn;
+	return entry;
 }
 
 /*
@@ -474,7 +535,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -486,6 +547,8 @@ begin_remote_xact(ConnCacheEntry *entry)
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
+		FdwXactRegisterXact(serverid, userid, false);
+
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
 		else
@@ -701,193 +764,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -904,10 +780,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 		  event == SUBXACT_EVENT_ABORT_SUB))
 		return;
 
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
 	/*
 	 * Scan all connection cache entries to find open remote subtransactions
 	 * of the current level, and close them.
@@ -918,6 +790,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	{
 		char		sql[100];
 
+		/* Skip entries that did not get a connection in this transaction. */
+		if (!entry->xact_got_connection)
+			continue;
+
 		/*
 		 * We only care about connections with open remote subtransactions of
 		 * the current level.
@@ -1252,3 +1128,310 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	/* The transaction should have been started */
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", frstate->fdwxact_id);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   frstate->server->servername, frstate->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 frstate->server->servername, frstate->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on foreign server. If
+ * state->flags contains FDWXACT_FLAG_ONEPHASE this function can commit the
+ * foreign transaction without preparation, otherwise commit the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult		*res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(frstate->server, frstate->usermapping, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, frstate->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In simple commit case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   frstate->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Rollback a transaction on foreign server. As with commit case, if state->flags
+ * contains FDWXACT_FLAG_ONEPHASE this function can rollback the foreign
+ * transaction without preparation, otherwise rollback the prepared transaction.
+ * This function must tolerate being called recursively as an error can happen
+ * during aborting.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
+{
+	bool			is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(frstate->server, frstate->usermapping, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, frstate->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before
+	 * establishing a connection or starting a remote transaction.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection is already changing transaction state or is
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager notes, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+	entry->xact_got_connection = false;
+	entry->modified = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..ccbed18dcc 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -13,12 +13,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
 ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
@@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 (
 	c2 int NOT NULL,
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -191,15 +210,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server   |              FDW options              | Description 
---------+-------+-----------+---------------------------------------+-------------
- public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
- public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
- public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
- public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
-(5 rows)
+                               List of foreign tables
+ Schema |  Table  |  Server   |              FDW options              | Description 
+--------+---------+-----------+---------------------------------------+-------------
+ public | ft1     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2     | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4     | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5     | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6     | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft7_2pc | loopback  | (schema_name 'S 1', table_name 'T 5') | 
+ public | ft8_2pc | loopback2 | (schema_name 'S 1', table_name 'T 5') | 
+(7 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -8923,10 +8944,10 @@ RESET ROLE;
 ALTER USER MAPPING FOR regress_nosuper SERVER loopback_nopw OPTIONS (ADD password_required 'false');
 SET ROLE regress_nosuper;
 -- Should finally work now
-SELECT * FROM ft1_nopw LIMIT 1;
-  c1  | c2 | c3 | c4 | c5 | c6 |     c7     | c8 
-------+----+----+----+----+----+------------+----
- 1111 |  2 |    |    |    |    | ft1        | 
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
+ c1 | c2 |        c3         |              c4              |            c5            | c6 |     c7     | c8  
+----+----+-------------------+------------------------------+--------------------------+----+------------+-----
+  1 |  2 | 00001_trig_update | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1  | 1          | foo
 (1 row)
 
 -- unpriv user also cannot set sslcert / sslkey on the user mapping
@@ -8943,7 +8964,7 @@ HINT:  User mappings with the sslcert or sslkey options set may only be created
 DROP USER MAPPING FOR CURRENT_USER SERVER loopback_nopw;
 -- This will fail again as it'll resolve the user mapping for public, which
 -- lacks password_required=false
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 ERROR:  password is required
 DETAIL:  Non-superusers must provide a password in the user mapping.
 RESET ROLE;
@@ -8961,16 +8982,225 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+-- Modify single foreign server and then commit and rollback.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+(1 row)
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+(3 rows)
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
 BEGIN;
-SELECT count(*) FROM ft1;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ERROR:  duplicate key value violates unique constraint "t6_pkey"
+DETAIL:  Key (c1)=(3) already exists.
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+SELECT * FROM "S 1"."T 6";
+ c1 
+----
+  3
+(1 row)
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ERROR:  null value in column "c1" of relation "T 5" violates not-null constraint
+DETAIL:  Failing row contains (null).
+CONTEXT:  remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1)
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+(4 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+(6 rows)
+
+RELEASE SAVEPOINT S1;
+ERROR:  RELEASE SAVEPOINT can only be used in transaction blocks
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+(8 rows)
+
+SET foreign_twophase_commit TO 'required';
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+ c1 
+----
+  1
+  2
+  2
+  3
+  5
+  5
+  8
+  8
+  9
+  9
+(10 rows)
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
  count 
 -------
-   822
+     0
 (1 row)
 
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
-ROLLBACK;
-WARNING:  there is no transaction in progress
diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf
new file mode 100644
index 0000000000..3fdbf93cdb
--- /dev/null
+++ b/contrib/postgres_fdw/fdwxact.conf
@@ -0,0 +1,3 @@
+max_prepared_transactions = 3
+max_prepared_foreign_transactions = 3
+max_foreign_transaction_resolvers = 2
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..75bbb48ebb 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -14,6 +14,7 @@
 
 #include <limits.h>
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -504,7 +505,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
 
-
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
  * to my callback routines.
@@ -558,6 +558,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -1434,7 +1439,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user, false, true);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -2372,7 +2377,8 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user, false, true);
+	MarkConnectionModified(user);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2746,7 +2752,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user, false, true);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3566,7 +3572,9 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user, true, true);
+	MarkConnectionModified(user);
+
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4441,7 +4449,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user, false, true);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4527,7 +4535,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user, false, true);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4755,7 +4763,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping, false, true);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..f922d5795f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -129,14 +130,19 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt,
+							 bool start_transaction);
 extern void ReleaseConnection(PGconn *conn);
+extern void MarkConnectionModified(UserMapping *user);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
@@ -203,6 +209,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
 									bool is_subquery,
 									List **retrieved_attrs, List **params_list);
 extern const char *get_jointype_name(JoinType jointype);
+extern bool server_uses_twophase_commit(ForeignServer *server);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..1ef66123df 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -15,6 +15,10 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
@@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" (
 	c3 text,
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 5" (
+       c1 int NOT NULL
+);
+
+CREATE TABLE "S 1"."T 6" (
+       c1 int NOT NULL,
+       CONSTRAINT t6_pkey PRIMARY KEY (c1)
+);
 
 -- Disable autovacuum for these tables to avoid unexpected effects of that
 ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
@@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
 ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
 
 -- ===================================================================
 -- create foreign tables
@@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 (
 	c3 text
 ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
 
+CREATE FOREIGN TABLE ft7_2pc (
+       c1 int NOT NULL
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+CREATE FOREIGN TABLE ft8_2pc (
+       c1 int NOT NULL
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5');
+
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -2598,7 +2621,7 @@ ALTER USER MAPPING FOR regress_nosuper SERVER loopback_nopw OPTIONS (ADD passwor
 SET ROLE regress_nosuper;
 
 -- Should finally work now
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 
 -- unpriv user also cannot set sslcert / sslkey on the user mapping
 -- first set password_required so we see the right error messages
@@ -2612,7 +2635,7 @@ DROP USER MAPPING FOR CURRENT_USER SERVER loopback_nopw;
 
 -- This will fail again as it'll resolve the user mapping for public, which
 -- lacks password_required=false
-SELECT * FROM ft1_nopw LIMIT 1;
+SELECT * FROM ft1_nopw ORDER BY 1 LIMIT 1;
 
 RESET ROLE;
 
@@ -2628,9 +2651,98 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
+-- ===================================================================
+-- test distributed atomic commit across foreign servers
+-- ===================================================================
+
+-- Enable atomic commit
+SET foreign_twophase_commit TO 'required';
+
+-- Modify single foreign server and then commit and rollback.
 BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
+INSERT INTO ft7_2pc VALUES(1);
+COMMIT;
+SELECT * FROM ft7_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(1);
 ROLLBACK;
+SELECT * FROM ft7_2pc;
+
+-- Modify two servers then commit and rollback. This requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+COMMIT;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(2);
+INSERT INTO ft8_2pc VALUES(2);
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+-- Modify both local data and 2PC-capable server then commit and rollback.
+-- This also requires to use 2PC.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(3);
+INSERT INTO "S 1"."T 6" VALUES (3);
+ROLLBACK;
+SELECT * FROM ft7_2pc;
+SELECT * FROM "S 1"."T 6";
+
+-- Modify foreign server and raise an error. No data changed.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(4);
+INSERT INTO ft8_2pc VALUES(NULL); -- violation
+ROLLBACK;
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES (5);
+INSERT INTO ft8_2pc VALUES (5);
+SAVEPOINT S1;
+INSERT INTO ft7_2pc VALUES (6);
+INSERT INTO ft8_2pc VALUES (6);
+ROLLBACK TO S1;
+COMMIT;
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+RELEASE SAVEPOINT S1;
+
+-- When set to 'disabled', we can commit it
+SET foreign_twophase_commit TO 'disabled';
+BEGIN;
+INSERT INTO ft7_2pc VALUES(8);
+INSERT INTO ft8_2pc VALUES(8);
+COMMIT; -- success
+SELECT * FROM ft7_2pc;
+SELECT * FROM ft8_2pc;
+
+SET foreign_twophase_commit TO 'required';
+
+-- Commit and rollback foreign transactions that are part of
+-- prepare transaction.
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index eab2cc9378..8783f2077c 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -521,9 +521,13 @@ OPTIONS (ADD password_required 'false');
   </para>
 
   <para>
-   Note that it is currently not supported by
-   <filename>postgres_fdw</filename> to prepare the remote transaction for
-   two-phase commit.
+   <filename>postgres_fdw</filename> supports preparing the remote transaction
+   for two-phase commit.  Also, if the two-phase commit protocol is required to
+   commit the distributed transaction, <filename>postgres_fdw</filename> commits
+   the remote transaction using the two-phase commit protocol
+   (see <xref linkend="atomic-commit"/>).  The remote server therefore needs to
+   set <xref linkend="guc-max-prepared-transactions"/> to a nonzero value so that
+   it can prepare the remote transaction.
   </para>
  </sect2>
 
-- 
2.23.0
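
For readers skimming the postgres_fdw part of this patch: pgfdw_end_prepared_xact() above builds a "COMMIT PREPARED" or "ROLLBACK PREPARED" command from the foreign transaction ID, and treats an undefined-object error (SQLSTATE 42704) from the remote server as acceptable, since the prepared transaction may already have been resolved. The following is only a standalone sketch of those two decisions, written without libpq so it can be compiled on its own. The helper names build_end_prepared_command() and end_prepared_error_is_ignorable() are hypothetical and do not appear in the patch; the six-bit packing mirrors how PostgreSQL's MAKE_SQLSTATE encodes a five-character code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Six bits per character, the same packing scheme MAKE_SQLSTATE uses. */
#define SIXBIT(ch) (((ch) - '0') & 0x3F)
#define PACK_SQLSTATE(s) \
	(SIXBIT((s)[0]) + (SIXBIT((s)[1]) << 6) + (SIXBIT((s)[2]) << 12) + \
	 (SIXBIT((s)[3]) << 18) + (SIXBIT((s)[4]) << 24))

/*
 * Hypothetical helper: format the command that ends a prepared transaction.
 * Returns false if the command did not fit into the buffer.
 */
static bool
build_end_prepared_command(char *buf, size_t buflen,
						   const char *fdwxact_id, bool is_commit)
{
	int			n;

	assert(buf != NULL && fdwxact_id != NULL);

	n = snprintf(buf, buflen, "%s PREPARED '%s'",
				 is_commit ? "COMMIT" : "ROLLBACK", fdwxact_id);

	return n >= 0 && (size_t) n < buflen;
}

/*
 * Hypothetical helper: decide whether a failure can be ignored.  Only
 * SQLSTATE 42704 (undefined_object) qualifies, meaning the prepared
 * transaction no longer exists on the remote side.  A missing SQLSTATE is
 * treated as a real failure, as the patch does by mapping it to a
 * connection-failure error.
 */
static bool
end_prepared_error_is_ignorable(const char *sqlstate)
{
	if (sqlstate == NULL || strlen(sqlstate) != 5)
		return false;

	return PACK_SQLSTATE(sqlstate) == PACK_SQLSTATE("42704");
}
```

A caller would format the command, send it, and on failure pass the reported SQLSTATE string to end_prepared_error_is_ignorable() before deciding whether to raise an error. The sketch does not quote or escape the identifier, which matches the patch as posted.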

v25-0001-Recreate-RemoveForeignServerById.patch (application/octet-stream)
From 14e8caeb2045f472f334d6a16a7080b6389b92d0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 12 Jun 2020 11:49:02 +0900
Subject: [PATCH v25 1/5] Recreate RemoveForeignServerById()

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/catalog/dependency.c   |  5 ++++-
 src/backend/commands/foreigncmds.c | 22 ++++++++++++++++++++++
 src/include/commands/defrem.h      |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index f515e2c308..82dbc988a3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1476,6 +1476,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_FOREIGN_SERVER:
+			RemoveForeignServerById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1490,7 +1494,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSDICT:
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
-		case OCLASS_FOREIGN_SERVER:
 		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index de31ddd1f3..c002a61794 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -1060,6 +1060,28 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop foreign server by OID
+ */
+void
+RemoveForeignServerById(Oid srvId)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(ForeignServerRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(srvId));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Common routine to check permission for user-mapping-related DDL
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index c26a102b17..89db18b7bc 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -128,6 +128,7 @@ extern ObjectAddress CreateForeignDataWrapper(CreateFdwStmt *stmt);
 extern ObjectAddress AlterForeignDataWrapper(AlterFdwStmt *stmt);
 extern ObjectAddress CreateForeignServer(CreateForeignServerStmt *stmt);
 extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
+extern void RemoveForeignServerById(Oid srvId);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
-- 
2.23.0

v25-0003-Documentation-update.patch (application/octet-stream)
From 7e8d5244fa83b2416ea66fd77e230adbc07ca423 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v25 3/5] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 ++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 162 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 254 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  91 ++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 836 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 1232b24e74..31e89bef87 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9246,6 +9246,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -11094,6 +11099,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in doubt.
+       A foreign transaction can have this status when the user has cancelled
+       the statement or the server crashes during transaction commit.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7a7177c550..6717a13159 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9160,6 +9160,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Setting</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether distributed transaction commits ensure that all
+         involved changes on foreign servers are committed or not. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used to
+         commit or roll back distributed transactions. When set to
+         <literal>required</literal>, distributed transactions strictly require
+         that all written servers can use the two-phase commit protocol.  That is,
+         the distributed transaction cannot commit if even one server does not
+         support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, distributed transaction commit will
+         wait for all involved foreign transactions to be committed before the
+         command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When <literal>disabled</literal>, database consistency can be at
+          risk if one or more foreign servers crash while committing
+          a distributed transaction.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminate foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning a resolver stays connected to
+         its database until stopped manually by <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..c83f8e9ee9
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,162 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the others were rolled back. This could leave
+   data in an inconsistent state in terms of the federated database.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all the changes on foreign servers are either committed or rolled back using
+   the transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers atomically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit sequence of a distributed transaction proceeds
+    through the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers including the local server itself and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the prepare on all foreign servers is
+       successful, proceed to the next step.  If there is any failure in the
+       prepare phase, the server will roll back all the transactions on both
+       local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally. The server commits the transaction locally.  If any failure happens
+       in this step, the server switches to rollback and rolls back all transactions
+       on both local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    Each commit of a distributed transaction will wait until confirmation is
+    received that all prepared transactions are committed or rolled back. The
+    guarantee we offer is that the application will not receive explicit
+    acknowledgement of the successful commit of a distributed transaction
+    until all foreign transactions are resolved on the foreign servers.
+   </para>
+
+   <para>
+    When synchronous replication is also used, the distributed transaction
+    will wait for synchronous replication first, and then wait for foreign
+    transaction resolution.  This is necessary because the fate of local
+    transaction commit needs to be consistent among the primary and replicas.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back using the two-phase commit protocol. However, foreign transactions
+    become <firstterm>in-doubt</firstterm> in two cases:
+
+    <itemizedlist>
+     <listitem>
+      <para>The local node crashed while either preparing or resolving a foreign
+       transaction.</para>
+     </listitem>
+     <listitem>
+      <para>The user canceled the query.</para>
+     </listitem>
+    </itemizedlist>
+
+    You can check in-doubt transactions in the <xref linkend="view-pg-foreign-xacts"/>
+    view. These foreign transactions are resolved by a foreign transaction resolver
+    process or by executing the <function>pg_resolve_foreign_xact</function> function
+    manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving both foreign transactions that are prepared by
+    online transactions and in-doubt transactions. They commit or roll back
+    prepared transactions on all foreign servers involved with the distributed
+    transaction if the local node received agreement messages from all
+    foreign servers during the first step of two-phase commit protocol.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolutions
+    on the database to which it is connected. On failure during resolution, it
+    retries the resolution at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped without an immediate shutdown. You can call the
+     <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be set to
+    <literal>required</literal>.  Additionally, <varname>max_worker_processes</varname> may need to be adjusted
+    to accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 74793035d7..13ff3f3575 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1423,6 +1423,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare the foreign transaction. If an FDW wants its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally provide
+     the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase, or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flags</literal> has
+    the flag <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     Note that in all cases except when invoked through the <function>pg_resolve_foreign_xact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flags</literal> has the flag
+    <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     there is a failure while preparing the foreign transaction. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except when invoked through the <function>pg_resolve_foreign_xact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    with its length stored in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal,
+    less than <symbol>NAMEDATALEN</symbol> bytes long, and must not be the same
+    as any other concurrent prepared transaction ID. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Note that
+      you therefore cannot use functions that use the system catalog cache,
+      such as Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1902,4 +2013,147 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the workings of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name, the OIDs of
+    the server and user, and the user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-registration">
+    <title>Foreign Transaction Registration and Unregistration</title>
+    <para>
+     Foreign transactions need to be registered with the
+     <productname>PostgreSQL</productname> global transaction manager.
+     Registration and unregistration are done by calling
+     <function>FdwXactRegisterXact</function> and
+     <function>FdwXactUnregisterXact</function>, respectively.
+     The FDW can pass a boolean <literal>modified</literal> along with the
+     OIDs of the server and user to <function>FdwXactRegisterXact</function>,
+     indicating that writes are going to happen on the foreign server.  Such foreign
+     servers are taken into account when deciding whether the two-phase commit
+     protocol is required.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <function>CommitForeignTransaction</function>
+     and <function>RollbackForeignTransaction</function> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <function>CommitForeignTransaction</function> function
+     in the pre-commit phase, and it calls the
+     <function>RollbackForeignTransaction</function> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback Distributed Transaction</title>
+    <para>
+     In addition to simply committing and rolling back foreign transactions as described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to atomically commit and roll back among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be using different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     When switching to rollback due to a failure, it calls
+     <function>RollbackForeignTransaction</function> with
+     <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions which are not
+     closed yet, and calls <function>RollbackForeignTransaction</function> without
+     that flag for foreign transactions which are already prepared.  For foreign
+     transactions which are being prepared, it does both because it cannot be sure that
+     the preparation has been completed on the foreign server. Therefore,
+     <function>RollbackForeignTransaction</function> needs to tolerate the undefined
+     object error.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, the <literal>CommitForeignTransaction</literal> function and the
+     <literal>RollbackForeignTransaction</literal> function should commit and
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If resolution
+     fails, the resolver process exits with an error message. The foreign
+     transaction launcher launches the resolver process again after the
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0070..65fd76f174 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9a4ac5a1ea..d8ab4eddb0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26187,6 +26187,97 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction that is currently
+        being processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+        It is useful for removing a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful to call before
+    dropping a database to which a foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 304c49f07b..3745927dbb 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1049,6 +1049,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1274,6 +1286,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1551,6 +1575,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1868,6 +1897,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index c41ce9499b..5ef1f4a329 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -170,6 +170,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3234adb639..83f30c5045 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.23.0
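
As a rough illustration of the pg_fdwxact directory documented above: the following patch (fdwxact.c) names each state file after the database OID, xid, foreign server OID and user OID, each printed as 8 hexadecimal digits and joined by underscores. The sketch below is standalone C rather than backend code. The helper fdwxact_file_path() and the sample values are hypothetical, and plain uint32_t stands in for Oid and TransactionId.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Directory name and file name length, as defined in the patch's fdwxact.c. */
#define FDWXACTS_DIR "pg_fdwxact"
#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)

/*
 * Hypothetical helper: build "pg_fdwxact/DBID_XID_SERVERID_USERID", with each
 * field printed as 8 uppercase hex digits.  Returns the number of characters
 * written (excluding the terminating NUL), or -1 if the buffer is too small.
 */
static int
fdwxact_file_path(char *path, size_t pathlen, uint32_t dbid, uint32_t xid,
                  uint32_t serverid, uint32_t userid)
{
    int n;

    assert(path != NULL);
    n = snprintf(path, pathlen, FDWXACTS_DIR "/%08X_%08X_%08X_%08X",
                 (unsigned int) dbid, (unsigned int) xid,
                 (unsigned int) serverid, (unsigned int) userid);
    if (n < 0 || (size_t) n >= pathlen)
        return -1;
    return n;
}
```

With these assumptions, database 16384, xid 600, server 16400 and user 10 would map to pg_fdwxact/00004000_00000258_00004010_0000000A.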

Attachment: v25-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (application/octet-stream)
From fe531faa3151ae1b20760c35a047164e80f13af1 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:16:02 +0900
Subject: [PATCH v25 2/5] Support atomic commit among multiple foreign servers.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/Makefile                   |    2 +-
 src/backend/access/fdwxact/Makefile           |   17 +
 src/backend/access/fdwxact/README             |  109 +
 src/backend/access/fdwxact/fdwxact.c          | 2755 +++++++++++++++++
 src/backend/access/fdwxact/launcher.c         |  558 ++++
 src/backend/access/fdwxact/resolver.c         |  454 +++
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   77 +-
 src/backend/access/transam/xact.c             |   54 +-
 src/backend/access/transam/xlog.c             |   34 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/foreigncmds.c            |   23 +
 src/backend/foreign/foreign.c                 |   55 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/pgstat.c               |   18 +
 src/backend/postmaster/postmaster.c           |   15 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/replication/syncrep.c             |   15 +-
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   57 +-
 src/backend/storage/lmgr/lwlocknames.txt      |    3 +
 src/backend/storage/lmgr/proc.c               |    8 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/misc/guc.c                  |   79 +
 src/backend/utils/misc/postgresql.conf.sample |   16 +
 src/backend/utils/probes.d                    |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |  170 +
 src/include/access/fdwxact_launcher.h         |   28 +
 src/include/access/fdwxact_resolver.h         |   23 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/resolver_internal.h        |   63 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xact.h                     |    7 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   22 +
 src/include/foreign/fdwapi.h                  |   12 +
 src/include/foreign/foreign.h                 |    1 +
 src/include/pgstat.h                          |    6 +
 src/include/replication/syncrep.h             |    2 +-
 src/include/storage/proc.h                    |   12 +
 src/include/storage/procarray.h               |    3 +
 src/include/utils/guc_tables.h                |    2 +
 src/test/regress/expected/rules.out           |    7 +
 52 files changed, 4836 insertions(+), 35 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/README
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact.h
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h
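
The README added by this patch states that two-phase commit is required only if the transaction modified two or more servers, counting the local node. A minimal sketch of that rule alone follows. twophase_commit_needed() is a hypothetical name, not the patch's checkForeignTwophaseCommitRequired(), and the sketch ignores the foreign_twophase_commit setting and whether each FDW provides a prepare callback.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: count the servers written by the transaction,
 * treating the local node as one of them, and require two-phase commit
 * once two or more of them were modified.
 */
static bool
twophase_commit_needed(int nmodified_foreign_servers, bool local_modified)
{
    int nwriters;

    assert(nmodified_foreign_servers >= 0);
    nwriters = nmodified_foreign_servers + (local_modified ? 1 : 0);
    return nwriters >= 2;
}
```

Under this rule a transaction that wrote to a single foreign server and nothing locally can be committed in one phase, while one foreign write plus a local write already needs two-phase commit.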

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..49480dd039 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+			  table tablesample transam fdwxact
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..0207a66fb4
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o resolver.o launcher.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README
new file mode 100644
index 0000000000..462f42180a
--- /dev/null
+++ b/src/backend/access/fdwxact/README
@@ -0,0 +1,109 @@
+src/backend/access/fdwxact/README
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back a transaction on
+either all foreign servers or none of them. This ensures that the data is always
+left in a consistent state in terms of a federated database.
+
+
+Commit Sequence of Global Transactions
+---------------------------------------
+
+We employ the two-phase commit protocol to commit atomically on all foreign
+servers. The sequence of distributed transaction commit consists of the
+following four steps:
+
+1. Foreign Server Registration
+The FDW needs to register each foreign transaction as a distributed transaction
+participant before commit by calling FdwXactRegisterXact(), which adds it to the
+list FdwXactParticipants maintained by PostgreSQL's global transaction manager (GTM).
+The registered foreign transactions are tracked until the end of transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We write a WAL record indicating that the foreign server is involved with the
+current transaction before doing PREPARE on each foreign transaction.
+Thus in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+Two-phase commit is required only if the transaction modified two or more
+servers including the local node. Otherwise, we can commit them at this step
+by calling the CommitForeignTransaction() API, and no further operation is needed.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If we fail on any of them we switch to
+rollback; at that point some participants might be prepared whereas others
+are not. The prepared foreign transactions need to be resolved manually
+using pg_resolve_foreign_xact(), and the others end their transactions
+in one phase by calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction, but
+this resolution step (commit or rollback) is done by the foreign transaction
+resolver process. The backend process inserts itself into the wait queue, and
+then wakes up the resolver process (or requests to launch a new one if necessary).
+The resolver process dequeues the waiter and fetches the distributed transaction
+information that the backend is waiting for. Once all foreign transactions are
+committed or rolled back, the resolver process wakes up the waiter.
+
+
+Foreign Data Wrapper Callbacks for Transaction Management
+-----------------------------------------------------------
+
+The core GTM manages the status of individual foreign transactions and calls
+transaction management callback functions according to that status. The
+callback functions PrepareForeignTransaction, CommitForeignTransaction and
+RollbackForeignTransaction are responsible for PREPARE, COMMIT and ROLLBACK of
+the transaction on the foreign server, respectively.
+FdwXactRslvState->flags could contain FDWXACT_FLAG_ONEPHASE, meaning the FDW can
+commit or roll back the foreign transaction in one phase. On failure while
+processing a foreign transaction, the FDW needs to raise an error. However, it
+needs to tolerate an ERRCODE_UNDEFINED_OBJECT error while committing or rolling
+back a foreign transaction, because there is a race condition: the coordinator
+could crash between completing the resolution and writing the WAL record that
+removes the FdwXact entry.
+
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_PREPARING
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared, and it changes to
+FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING before the foreign
+transaction is committed or aborted, respectively, by the FDW callback functions.
+The FdwXact entry is removed, with WAL logging, once the foreign transaction is
+resolved.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. The initial
+status is FDWXACT_STATUS_PREPARED(*1). Because the foreign transaction was
+being processed, we cannot know its exact status, so we regard it as PREPARED
+for safety.
+
+The foreign transaction status transition is illustrated by the following graph
+describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                     PREPARING                      |----+
+ +----------------------------------------------------+    |
+                          |                                |
+                          v                                |
+ +----------------------------------------------------+    |
+ |                    PREPARED(*1)                    |    | (*2)
+ +----------------------------------------------------+    |
+           |                               |               |
+           v                               v               |
+ +--------------------+          +--------------------+    |
+ |   COMMITTING(*1)   |          |    ABORTING(*1)    |<---+
+ +--------------------+          +--------------------+
+
+(*1) Recovered FdwXact entries start with PREPARED
+(*2) Path taken when an error occurs during preparing
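
The status graph at the end of the README can also be read as a small transition table. The sketch below is a standalone illustration under that reading, not code from the patch: the SketchFdwXactStatus enum and sketch_transition_allowed() are hypothetical names standing in for the FDWXACT_STATUS_* values, and the sketch covers only the arrows drawn in the graph.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the FDWXACT_STATUS_* values in the graph. */
typedef enum SketchFdwXactStatus
{
    SKETCH_PREPARING,
    SKETCH_PREPARED,
    SKETCH_COMMITTING,
    SKETCH_ABORTING
} SketchFdwXactStatus;

/* Return true if the graph has an arrow from 'from' to 'to'. */
static bool
sketch_transition_allowed(SketchFdwXactStatus from, SketchFdwXactStatus to)
{
    assert((int) to >= 0 && (int) to <= (int) SKETCH_ABORTING);

    switch (from)
    {
        case SKETCH_PREPARING:
            /* Normal path, or the (*2) path when preparing fails. */
            return to == SKETCH_PREPARED || to == SKETCH_ABORTING;
        case SKETCH_PREPARED:
            /* Resolution goes either way once the transaction is prepared. */
            return to == SKETCH_COMMITTING || to == SKETCH_ABORTING;
        case SKETCH_COMMITTING:
        case SKETCH_ABORTING:
            /* No outgoing arrows: the entry is removed once resolved. */
            return false;
    }
    return false;
}
```

A recovered entry, which per (*1) starts as PREPARED, can therefore only move on to COMMITTING or ABORTING.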
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..b0fc913d5d
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,2755 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * To achieve commit among all foreign servers atomically, we employ the
+ * two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). The basic strategy is that we prepare all of the remote
+ * transactions before committing locally and commit them after committing
+ * locally.
+ *
+ * Two-phase commit protocol is used when the transaction modified two or
+ * more servers including the local node.  If two-phase commit protocol
+ * is not required all foreign transactions are committed at pre-commit
+ * phase.
+ *
+ * The FDW needs to register the foreign transaction by FdwXactRegisterXact()
+ * to add it to the group for global commit.  The registered foreign
+ * transactions are identified by OIDs of server and user.
+ *
+ * During pre-commit of local transaction, we prepare the transaction on
+ * all foreign servers.  And after committing or rolling back locally,
+ * we notify the resolver process and tell it to commit or rollback those
+ * transactions. If we ask to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly to finish, and so that
+ * we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request.  We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed.  Before waiting for foreign transactions to be resolved, the
+ * backend enqueues itself with the timestamp at which it expects to be
+ * processed.  On failure, it enqueues itself again with a new timestamp
+ * (last timestamp + foreign_xact_resolution_interval).
+ *
+ * If a server crash occurs or the user cancels the wait, the prepared foreign
+ * transactions are left without a holder.  Such foreign transactions are
+ * resolved automatically by the resolver process.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of a foreign
+ * transaction follows a locking model based on the following linked concepts:
+ *
+ * * All FdwXact fields except for status are protected by FdwXactLock. The
+ *   status is protected by its mutex.
+ * * A process that is going to process a foreign transaction needs to set
+ *   locking_backend of the FdwXact entry to lock the entry, which prevents the
+ *   entry from being updated and removed by concurrent processes.
+ * * A process that starts a distributed transaction holds the FdwXact entry
+ *   by setting locking_backend, and sets its PGPROC in the proc field.
+ * * 'locking_backend' can be overwritten by a foreign transaction resolver
+ *   even when the FdwXact entry is already held by someone.
+ * * On the other hand, 'proc' remains until the end of the transaction.
+ * * Foreign transaction resolvers can resolve a foreign transaction whose local
+ *   transaction is not being processed (i.e., proc is NULL), is not prepared
+ *   (TwoPhaseExists() is false), and which is locked by the resolver itself.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *   with entries marked with fdwxact->inredo and fdwxact->ondisk.  FdwXact file
+ *   data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *   We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *   have fdwxact->inredo set and are behind the redo_horizon.  We save
+ *   them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *   fdwxact->ondisk is true, the corresponding entry from the disk is
+ *   additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *   fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_xlog.h"
+#include "access/resolver_internal.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
+#include "catalog/pg_type.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pg_trace.h"
+#include "pgstat.h"
+#include "storage/fd.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/pmsignal.h"
+#include "storage/shmem.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
+/* Check whether the FdwXactParticipant supports transaction callbacks and two-phase commit */
+#define ServerSupportTransactionCallack(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define SeverSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.  This struct
+ * is created at the beginning of execution for each foreign server and
+ * is used until the end of transaction where we cannot look at syscaches.
+ * Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
+	/* true if modified the data on the server */
+	bool		modified;
+
+	/* Callbacks for foreign transaction */
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A member of
+ * participants may not support transaction callbacks: commit, rollback and
+ * prepare.  If a member of participants doesn't support any transaction
+ * callbacks, i.e. ServerSupportTransactionCallack() returns false,
+ * we don't end its transaction.
+ *
+ * FdwXactParticipants_tmp is used to update FdwXactParticipants atomically
+ * when executing COMMIT/ROLLBACK PREPARED command.  In COMMIT PREPARED case,
+ * we don't want to rollback foreign transactions even if an error occurs,
+ * because the local prepared transaction never turns into a rollback in that
+ * case.  However, preparing FdwXactParticipants might raise an error
+ * because it calls palloc() internally.  So we prepare FdwXactParticipants in
+ * two phases.  In the first phase, PrepareFdwXactParticipants(), we collect
+ * all foreign transactions associated with the local prepared transaction
+ * and keep them in FdwXactParticipants_tmp.  Even if an error occurs during
+ * that, we don't rollback them.  In the second phase, SetFdwXactParticipants(),
+ * we replace FdwXactParticipants with FdwXactParticipants_tmp and hold them.
+ *
+ * FdwXactLocalXid is the local transaction id associated with FdwXactParticipants.
+ */
+static List *FdwXactParticipants = NIL;
+static List *FdwXactParticipants_tmp = NIL;
+static TransactionId FdwXactLocalXid = InvalidTransactionId;
+
+/*
+ * True if the current transaction needs to be committed together with
+ * foreign servers.
+ */
+static bool ForeignTwophaseCommitIsRequired = false;
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * Name of foreign prepared transaction file is the database oid, xid, foreign
+ * server oid and user oid, each as 8 hex digits, separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure
+ * uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
+
+/* GUC parameters */
+int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
+
+/* Keep track of whether the process exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit);
+static bool checkForeignTwophaseCommitRequired(bool local_modified);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, Oid umid, char *fdwxact_id);
+static void FdwXactPrepareForeignTransactions(bool prepare_all);
+static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactComputeRequiredXmin(void);
+static void FdwXactCancelWait(void);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool give_warnings);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							 Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool giveWarning);
+static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid,
+								  Oid serverid, Oid userid,
+								  XLogRecPtr insert_start_lsn,
+								  bool from_disk);
+static TransactionId FdwXactGetTransactionFate(TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static void remove_fdwxact(FdwXact fdwxact);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
+
+/*
+ * Register the given foreign transaction as a participant of the
+ * transaction. The foreign transaction is identified by the given
+ * server id and user id.
+ */
+void
+FdwXactRegisterXact(Oid serverid, Oid userid, bool modified)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* The foreign server is already registered, return */
+			fdw_part->modified |= modified;
+			return;
+		}
+	}
+
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Participant's information is also used at the end of a transaction,
+	 * where system caches are not available. Save it in
+	 * TopTransactionContext so that it lives until the end of the
+	 * transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Restore the previous memory context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Remove the given foreign server from FdwXactParticipants */
+void
+FdwXactUnregisterXact(Oid serverid, Oid userid)
+{
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			FdwXactParticipants = foreach_delete_current(FdwXactParticipants, lc);
+			break;
+		}
+	}
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->fdwxact = NULL;
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
+
+	return fdw_part;
+}
+
+/*
+ * Prepare all foreign transactions if foreign two-phase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit: when it is
+ * 'required', we strictly require the FDWs of all foreign servers to support
+ * the two-phase commit protocol and ask them to prepare their foreign
+ * transactions; when it is 'disabled', we ask all foreign servers to commit
+ * their foreign transactions in one phase. If we fail to commit any of them,
+ * we change to aborting.
+ *
+ * Note that non-modified foreign servers can always be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXact(void)
+{
+	TransactionId xid;
+	ListCell   *lc;
+	bool		local_modified;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/*
+	 * Check if the current transaction did writes.  We need to include the
+	 * local node among the distributed transaction participants and regard
+	 * it as modified if the current transaction has performed WAL logging
+	 * and has been assigned an xid.  The transaction can end up not writing
+	 * any WAL, even if it has an xid, if it only wrote to temporary and/or
+	 * unlogged tables.  It can end up having written WAL without an xid if
+	 * it did HOT pruning.
+	 */
+	xid = GetTopTransactionIdIfAny();
+	local_modified = (TransactionIdIsValid(xid) && (XactLastRecEnd != 0));
+
+	/*
+	 * Check if we need to use foreign twophase commit. Note that we don't
+	 * support foreign twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired(local_modified))
+	{
+		/*
+		 * We need to use two-phase commit.  Assign a transaction id to the
+		 * current transaction if it has none yet, then prepare foreign
+		 * transactions on foreign servers that support two-phase commit.
+		 * Note that we keep FdwXactParticipants until end of transaction.
+		 */
+		FdwXactLocalXid = xid;
+		if (!TransactionIdIsValid(FdwXactLocalXid))
+			FdwXactLocalXid = GetTopTransactionId();
+
+		FdwXactPrepareForeignTransactions(false);
+		ForeignTwophaseCommitIsRequired = true;
+	}
+	else
+	{
+		/*
+		 * Two-phase commit is not required. Commit foreign transactions in
+		 * the participant list.
+		 */
+		foreach(lc, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+			Assert(!fdw_part->fdwxact);
+
+			/* Commit the foreign transaction in one-phase */
+			if (ServerSupportTransactionCallack(fdw_part))
+				FdwXactParticipantEndTransaction(fdw_part, true);
+		}
+
+		/* All participants' transactions should be completed at this time */
+		ForgetAllFdwXactParticipants();
+	}
+}
+
+/*
+ * Return true if the current transaction modifies data on two or more servers
+ * among FdwXactParticipants and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+{
+	ListCell   *lc;
+	bool		have_notwophase = false;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			have_notwophase = true;
+
+		nserverswritten++;
+	}
+
+	/* Did we modify the local non-temporary data? */
+	if (local_modified)
+		nserverswritten++;
+
+	/*
+	 * Two-phase commit is not required if fewer than two servers performed
+	 * writes.
+	 */
+	if (nserverswritten < 2)
+		return false;
+
+	Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED);
+
+	/* Two-phase commit is required. Check parameters */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	if (have_notwophase)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+				 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+	return true;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
+{
+	FdwXactRslvState state;
+
+	Assert(fdw_part->commit_foreign_xact_fn);
+	Assert(fdw_part->rollback_foreign_xact_fn);
+
+	state.xid = FdwXactLocalXid;
+	state.server = fdw_part->server;
+	state.usermapping = fdw_part->usermapping;
+	state.fdwxact_id = NULL;
+	state.flags = FDWXACT_FLAG_ONEPHASE;
+
+	if (commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions. Before inserting
+ * an FdwXact entry we call the GetPrepareId callback, if provided, to get a
+ * transaction identifier from the FDW. If prepare_all is true, we prepare all
+ * foreign transactions regardless of whether writes happened on the server.
+ *
+ * We can still change to rollback here on failure. If any error occurs, we
+ * roll back the non-prepared foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(bool prepare_all)
+{
+	ListCell   *lc;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(TransactionIdIsValid(FdwXactLocalXid));
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState state;
+		FdwXact		fdwxact;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Skip if the server's FDW doesn't support two-phase commit */
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			continue;
+
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, FdwXactLocalXid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to disk and WAL-logs it (thereby relaying it to
+		 * standbys). Thus, if we lose connectivity to the foreign server or
+		 * crash ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed in between these
+		 * two steps, we would lose track of the prepared transaction on the
+		 * foreign server and could not resolve it after crash recovery.
+		 * Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(FdwXactLocalXid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the moment this
+		 * backend receives the acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 */
+		state.xid = FdwXactLocalXid;
+		state.server = fdw_part->server;
+		state.usermapping = fdw_part->usermapping;
+		state.fdwxact_id = pstrdup(fdw_part->fdwxact_id);
+		fdw_part->prepare_foreign_xact_fn(&state);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * Create a new foreign transaction entry before an FDW prepares, commits or
+ * rolls back the foreign transaction. The function writes the entry to WAL;
+ * it is persisted under the pg_fdwxact directory at the next checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location so it can be read at checkpoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* The WAL record is flushed, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there
+	 * are none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions (currently %d).",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->proc = MyProc;
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset the entry's state */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->proc = NULL;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier. If the given
+ * foreign server's FDW provides the GetPrepareId callback, we return the
+ * identifier obtained from it. Otherwise we generate a unique identifier of
+ * the form "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * The returned string is used to identify the foreign transaction. The
+ * identifier must not be the same as that of any other concurrent prepared
+ * transaction.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like a UUID, which gives unique ids with high probability, but that may be
+ * expensive here, and the UUID extension that provides the function to
+ * generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	   *id;
+	int			id_len = 0;
+
+	/*
+	 * If the FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * Note that it's possible that the transaction aborts after we have prepared
+ * some of the participants. In that case we change to rollback and roll back
+ * all foreign transactions.
+ */
+void
+AtPrepare_FdwXact(void)
+{
+	ListCell   *lc;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is \'disabled\'")));
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit because we prepare on
+	 * them regardless of whether they were modified.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!SeverSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
+	}
+
+	/* Set the local transaction id */
+	FdwXactLocalXid = GetTopTransactionId();
+
+	/* Prepare transactions on participating foreign servers */
+	FdwXactPrepareForeignTransactions(true);
+
+	/*
+	 * We keep the prepared foreign transaction participants so that we can
+	 * roll them back in case of failure.
+	 */
+}
+
+/*
+ * After PREPARE TRANSACTION, we forget all participants.
+ */
+void
+PostPrepare_FdwXact(void)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Collect all foreign transactions associated with the given xid if it's a
+ * prepared transaction.  Return true if COMMIT PREPARED or ROLLBACK PREPARED
+ * needs to wait for all foreign transactions to be resolved.  The collected
+ * foreign transactions are kept in FdwXactParticipants_tmp. The caller must
+ * call SetFdwXactParticipants() later if this function returns true.
+ */
+bool
+PrepareFdwXactParticipants(TransactionId xid)
+{
+	MemoryContext old_ctx;
+
+	Assert(FdwXactParticipants_tmp == NIL);
+
+	if (!TwoPhaseExists(xid))
+		return false;
+
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXactParticipant *fdw_part;
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwRoutine *routine;
+
+		if (!fdwxact->valid || fdwxact->local_xid != xid)
+			continue;
+
+		routine = GetFdwRoutineByServerId(fdwxact->serverid);
+		fdw_part = create_fdwxact_participant(fdwxact->serverid, fdwxact->userid,
+											  routine);
+		fdw_part->modified = true;
+		fdw_part->fdwxact = fdwxact;
+
+		/* Add to the participants list */
+		FdwXactParticipants_tmp = lappend(FdwXactParticipants_tmp, fdw_part);
+	}
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_ctx);
+
+	/*
+	 * We cannot proceed to commit this prepared transaction when
+	 * foreign_twophase_commit is disabled.
+	 */
+	if (FdwXactParticipants_tmp != NIL &&
+		!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a prepared foreign transaction commit when foreign_twophase_commit is \'disabled\'")));
+
+	/* Return true if we collect at least one foreign transaction */
+	return (FdwXactParticipants_tmp != NIL);
+}
+
+/*
+ * Make the collected foreign transactions the participants of this transaction
+ * and hold them.  Must be called after PrepareFdwXactParticipants().
+ */
+void
+SetFdwXactParticipants(TransactionId xid)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants_tmp != NIL);
+	Assert(FdwXactParticipants == NIL);
+
+	FdwXactLocalXid = xid;
+	FdwXactParticipants = FdwXactParticipants_tmp;
+	FdwXactParticipants_tmp = NIL;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(SeverSupportTwophaseCommit(fdw_part));
+		Assert(fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED);
+		Assert(fdw_part->fdwxact->locking_backend == InvalidBackendId);
+		Assert(!fdw_part->fdwxact->proc);
+
+		/* Hold the fdwxact entry */
+		fdw_part->fdwxact->locking_backend = MyBackendId;
+		fdw_part->fdwxact->proc = MyProc;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Wait for all foreign transactions of this backend to be resolved.
+ *
+ * Initially backends start in state FDWXACT_NOT_WAITING and then change
+ * that state to FDWXACT_WAITING before adding ourselves to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * This backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitForResolution(TransactionId wait_xid, bool commit)
+{
+	ListCell   *lc;
+	char	   *new_status = NULL;
+	const char *old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(wait_xid == FdwXactLocalXid);
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/*
+	 * Quick exit if either atomic commit is not requested or we don't have
+	 * any participants.
+	 */
+	if (!IsForeignTwophaseCommitRequested() || FdwXactParticipants == NIL)
+		return;
+
+	/* Set foreign transaction status */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->fdwxact)
+			continue;
+
+		Assert(fdw_part->fdwxact->locking_backend == MyBackendId);
+		Assert(fdw_part->fdwxact->proc == MyProc);
+
+		SpinLockAcquire(&(fdw_part->fdwxact->mutex));
+		fdw_part->fdwxact->status = commit
+			? FDWXACT_STATUS_COMMITTING
+			: FDWXACT_STATUS_ABORTING;
+		SpinLockRelease(&(fdw_part->fdwxact->mutex));
+	}
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if none is running yet, or wake it up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction resolution.
+	 */
+	if (update_process_title)
+	{
+		int			len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status);
+		new_status[len] = '\0'; /* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once resolver changes the state to FDWXACT_WAIT_COMPLETE,
+		 * it will never update it again, so we can't be seeing a stale value
+		 * in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+		{
+			ForgetAllFdwXactParticipants();
+			break;
+		}
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The
+		 * latter would lead the client to believe that the distributed
+		 * transaction aborted, which is not true: it's already committed
+		 * locally. The former is no good either: the client has requested
+		 * committing a distributed transaction, and is entitled to assume
+		 * that an acknowledged commit is also committed on all foreign
+		 * servers, which might not be true. So in this case we issue a
+		 * WARNING (which some clients may be able to interpret) and shut off
+		 * further output. We do NOT reset ProcDiePending, so that the process
+		 * will die after the commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign transaction resolver can pick them up and try to
+		 * resolve them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Return one backend that is connected to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+				 TransactionId *waitXid_p)
+{
+	PGPROC	   *proc;
+	bool		found = false;
+
+	Assert(LWLockHeldByMe(FdwXactResolutionLock));
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	/* Initialize variables */
+	*nextResolutionTs_p = -1;
+	*waitXid_p = InvalidTransactionId;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+		{
+			if (proc->fdwXactNextResolutionTs <= now)
+			{
+				/* Found a waiting process */
+				found = true;
+				*waitXid_p = proc->fdwXactWaitXid;
+			}
+			else
+				/* Found a waiting process supposed to be processed later */
+				*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+
+			break;
+		}
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return found ? proc : NULL;
+}
+
+/*
+ * Return true if there is at least one backend in the wait queue that is
+ * connected to the given database. The caller must hold FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC	   *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * In the abort case, this function ends foreign transaction participants and
+ * possibly rolls back their prepared foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool is_commit)
+{
+	ListCell   *lc;
+
+	if (!is_commit)
+	{
+		bool need_wait = false;
+
+		foreach(lc, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = lfirst(lc);
+			FdwXact		fdwxact = fdw_part->fdwxact;
+			int			status;
+
+			if (!fdwxact)
+			{
+				/*
+				 * Rollback the foreign transaction if its foreign server
+				 * supports transaction callbacks.
+				 */
+				if (ServerSupportTransactionCallack(fdw_part))
+					FdwXactParticipantEndTransaction(fdw_part, false);
+
+				continue;
+			}
+
+			/*
+			 * Abort the foreign transaction.  For participants whose status
+			 * is FDWXACT_STATUS_PREPARING, we close the transaction in
+			 * one-phase. In addition, since we are not sure that the
+			 * preparation has been completed on the foreign server, we also
+			 * attempt to roll back the prepared foreign transaction.  Note
+			 * that it is the FDW's responsibility to tolerate an
+			 * OBJECT_NOT_FOUND error in the abort case.
+			 */
+			SpinLockAcquire(&(fdwxact->mutex));
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&(fdwxact->mutex));
+
+			if (status == FDWXACT_STATUS_PREPARING)
+				FdwXactParticipantEndTransaction(fdw_part, false);
+
+			need_wait = true;
+		}
+
+		/*
+		 * Wait for all prepared or possibly-prepared foreign transactions
+		 * to be resolved.
+		 */
+		if (need_wait)
+		{
+			Assert(TransactionIdIsValid(FdwXactLocalXid));
+			FdwXactWaitForResolution(FdwXactLocalXid, false);
+		}
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Unlock foreign transaction participants and clear the FdwXactParticipants
+ * list.  If we leave any foreign transactions unresolved, update the oldest
+ * xmin of unresolved transactions so that the local transaction ids of such
+ * foreign transactions are not truncated.
+ */
+void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell   *cell;
+	int			nlefts = 0;
+
+	if (FdwXactParticipants == NIL)
+	{
+		Assert(FdwXactParticipants_tmp == NIL);
+		Assert(!ForeignTwophaseCommitIsRequired);
+		return;
+	}
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if no FdwXact entry has been registered yet */
+		if (!fdwxact)
+			continue;
+
+		/*
+		 * Unlock the foreign transaction entries.  Note that there is a race
+		 * condition: if the resolver process removes an FdwXact entry and
+		 * another backend reuses it before we get here, the entry in
+		 * FdwXactParticipants no longer belongs to us.  So we need to check
+		 * that the entries are still associated with the transaction.  We
+		 * cannot use locking_backend to check because the entry might
+		 * already be held by the resolver process.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->valid && fdwxact->local_xid == FdwXactLocalXid)
+		{
+			if (fdwxact->locking_backend == MyBackendId)
+				fdwxact->locking_backend = InvalidBackendId;
+
+			fdwxact->proc = NULL;
+			nlefts++;
+		}
+		LWLockRelease(FdwXactLock);
+	}
+
+	/*
+	 * If we left any FdwXact entries, update the oldest local transaction of
+	 * unresolved distributed transactions and hand the entries over to the
+	 * foreign transaction resolver.
+	 */
+	if (nlefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions", nlefts);
+		FdwXactComputeRequiredXmin();
+		FdwXactLaunchOrWakeupResolver();
+	}
+
+	list_free(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+	FdwXactParticipants_tmp = NIL;
+	FdwXactLocalXid = InvalidTransactionId;
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Resolve foreign transactions at the given indexes. If 'waiter' is not NULL,
+ * we release the waiter after resolving all of the given foreign transactions.
+ * On failure, we re-enqueue the waiting backend after advancing its next
+ * resolution time.
+ *
+ * The caller must hold the given foreign transactions in advance to prevent
+ * concurrent update.
+ */
+void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		PG_TRY();
+		{
+			FdwXactResolveOneFdwXact(fdwxact);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. Re-insert the waiter into the queue at its
+			 * new next resolution time if the waiter is still waiting.
+			 */
+			if (waiter)
+			{
+				LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+				if (waiter->fdwXactState == FDWXACT_WAITING)
+				{
+					SHMQueueDelete(&(waiter->fdwXactLinks));
+					pg_write_barrier();
+					waiter->fdwXactNextResolutionTs =
+						TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+													foreign_xact_resolution_retry_interval);
+					FdwXactQueueInsert(waiter);
+				}
+				LWLockRelease(FdwXactResolutionLock);
+			}
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+
+	if (!waiter)
+		return;
+
+	/*
+	 * We have resolved all foreign transactions.  Remove waiter from shmem queue,
+	 * if not detached yet. The waiter could already be detached if the user
+	 * cancelled the wait before resolution.
+	 */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/*
+		 * Wake up the waiter only when we have set state and removed from
+		 * queue
+		 */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had been already detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx != -1);
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ *
+ * XXX: we can exclude FdwXact entries whose status is already committing
+ * or aborting.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Return whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactGetTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on the
+	 * foreign server. So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and foreign transaction is about
+	 * to be committed or aborted.  Raise an error anyway since we cannot
+	 * determine the fate of this foreign transaction according to the local
+	 * transaction whose fate is also not determined.
+	 */
+	else
+		elog(ERROR,
+			 "cannot resolve the foreign transaction associated with in-progress transaction");
+
+	pg_unreachable();
+}
+
+/* Commit or rollback one prepared foreign transaction */
+static void
+FdwXactResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactRslvState state;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	/* The FdwXact entry must be held by me */
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->locking_backend == MyBackendId);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactGetTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare resolution state to pass to API */
+	state.xid = fdwxact->local_xid;
+	state.server = server;
+	state.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	state.fdwxact_id = fdwxact->fdwxact_id;
+	state.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&state);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&state);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/*
+ * Return the index of the first FdwXact entry matching the given arguments,
+ * or -1 if there is none.  Only arguments holding a valid value for their
+ * datatype are used as search conditions.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	bool		found = false;
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+		found = true;
+		break;
+	}
+
+	return found ? i : -1;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The status of
+	 * the transaction is set as prepared, since we do not know the exact
+	 * status right now. Resolver will set it later based on the status of
+	 * local transaction which prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that prepared
+	 * this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove
+ * FdwXact file if a foreign transaction was saved via an earlier checkpoint.
+ * We might not find the FdwXact entry when crash recovery starts from a
+ * point after the entry was added but before it was removed.
+ */
+void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	/* Report before removal, while the entry's fields are still intact */
+	elog(DEBUG2, "removing fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START();
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be an FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE();
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+		XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid CRC and the expected identifiers), return the
+ * palloc'd contents of the file, issuing an error when finding corrupted data.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents are the expected data */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted FdwXact state files, fail immediately.  Keeping broken
+ * entries around and letting replay continue causes harm on the system, and
+ * a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextXid = ShmemVariableCache->nextXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+restoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->proc = NULL;
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "prepared (commit)";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "prepared (abort)";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = BoolGetDatum(fdwxact->proc == NULL);
+		values[5] =
+			PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+													 strlen(fdwxact->fdwxact_id)));
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx == -1)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist")));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	if (fdwxact->locking_backend != InvalidBackendId || fdwxact->proc)
+	{
+		/* the entry is being processed by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	if (TwoPhaseExists(fdwxact->local_xid))
+	{
+		/*
+		 * the entry's local transaction is prepared. Since we cannot know the
+		 * fate of the local transaction, we cannot resolve this foreign
+		 * transaction.
+		 */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction entry whose local transaction is prepared"),
+				 errhint("Do COMMIT PREPARED or ROLLBACK PREPARED")));
+	}
+
+	/* Hold the entry */
+	FdwXactCtl->fdwxacts[idx]->locking_backend = MyBackendId;
+
+	LWLockRelease(FdwXactLock);
+
+	PG_TRY();
+	{
+		FdwXactResolveFdwXacts(&idx, 1, NULL);
+	}
+	PG_CATCH();
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactCtl->fdwxacts[idx]->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it. This gives a way to forget about such a transaction when
+ * the foreign server where it was prepared is no longer available, or when
+ * the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx == -1)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction on server %u does not exist",
+						serverid)));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	if (fdwxact->locking_backend != InvalidBackendId || fdwxact->proc)
+	{
+		/* the entry is being held by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	PG_TRY();
+	{
+		/* Clean up entry and any files we may have left */
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdwxact(fdwxact);
+	}
+	PG_CATCH();
+	{
+		if (fdwxact->valid)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+			fdwxact->locking_backend = InvalidBackendId;
+		}
+		LWLockRelease(FdwXactLock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
+}
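A note on the state-file naming that restoreFdwXactData() above relies on: it accepts only directory entries of a fixed length made of upper-case hex digits and underscores, then reads four 8-digit fields (dbid, local xid, serverid, userid) with sscanf. The standalone sketch below illustrates that convention outside the backend. It is not part of the patch: the `sketch_` names and `SKETCH_NAME_LEN` are hypothetical, and the upper-case `%08X` output format is an assumption inferred from the strspn() character set, since FdwXactFilePath is not shown in this excerpt. Unlike the patch, the sketch also checks the sscanf return value, so a name made only of underscores is rejected.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: four 8-digit hex fields joined by three underscores. */
#define SKETCH_NAME_LEN (4 * 8 + 3)

/* Hypothetical helper: build a state-file name from its four identifiers. */
static void
sketch_format_name(char *dst, size_t dstlen, uint32_t dbid, uint32_t xid,
                   uint32_t serverid, uint32_t userid)
{
    assert(dst != NULL && dstlen > SKETCH_NAME_LEN);
    snprintf(dst, dstlen, "%08X_%08X_%08X_%08X",
             (unsigned int) dbid, (unsigned int) xid,
             (unsigned int) serverid, (unsigned int) userid);
}

/*
 * Hypothetical helper: return 1 and fill the outputs if 'name' has the
 * expected shape, otherwise return 0 and leave the outputs untouched.
 */
static int
sketch_parse_name(const char *name, uint32_t *dbid, uint32_t *xid,
                  uint32_t *serverid, uint32_t *userid)
{
    unsigned int a, b, c, d;

    /* Same length and character-set filter as the directory scan. */
    if (strlen(name) != SKETCH_NAME_LEN ||
        strspn(name, "0123456789ABCDEF_") != SKETCH_NAME_LEN)
        return 0;

    /* All four fields must convert; otherwise treat the name as junk. */
    if (sscanf(name, "%08x_%08x_%08x_%08x", &a, &b, &c, &d) != 4)
        return 0;

    *dbid = a;
    *xid = b;
    *serverid = c;
    *userid = d;
    return 1;
}
```

Checking that sscanf converted all four fields is a cheap guard that a similar directory scan might adopt, because the length and character-set tests alone do not guarantee four well-formed fields.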
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..c9b2ee6661
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,558 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+		FdwXactRslvCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit launch retries to once per
+		 * foreign_xact_resolution_retry_interval, but always launch when a
+		 * backend has requested it.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * started. This usually means the resolver crashed, so retry
+			 * after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+
+	/*
+	 * Look for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. Since the resolver will soon
+		 * find the FdwXact entry we inserted, we don't need to do anything.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch; free the slot (MyFdwXactResolver is unset here) */
+		LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+		resolver->in_use = false;
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on each database that
+ * has at least one FdwXact entry but no resolver running on it.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *resolver_dbs;	/* DBs resolver's running on */
+	HTAB	   *fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * A resolver process resolves the foreign transactions that are
+		 * waiting for resolution or are not being processed by anyone.
+		 * But we don't need to launch a resolver for foreign transactions
+		 * whose local transaction is prepared.
+		 */
+		if ((!fdwxact->proc && !TwoPhaseExists(fdwxact->local_xid)) ||
+			(fdwxact->proc && fdwxact->proc->fdwXactState == FDWXACT_WAITING))
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no FdwXact entry, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolvers are running and launch new one on them */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
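To make the launch decision in fdwxact_relaunch_resolvers() above easier to review: it collects the databases that have at least one pending FdwXact entry, collects the databases that already have a resolver, and launches a resolver for each database in the first set that is missing from the second. The standalone sketch below models only that set difference, using plain arrays instead of the HTAB and shared-memory structures in the patch. The function names are hypothetical and nothing here touches backend state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if 'oid' appears in set[0..n), else 0. */
static int
sketch_contains(const uint32_t *set, size_t n, uint32_t oid)
{
    for (size_t i = 0; i < n; i++)
    {
        if (set[i] == oid)
            return 1;
    }
    return 0;
}

/*
 * Hypothetical model of the launch decision: store in out[] every distinct
 * database from pending[] that has no resolver in running[], and return how
 * many were stored. Databases beyond 'outcap' are silently skipped.
 */
static size_t
sketch_dbs_needing_resolver(const uint32_t *pending, size_t npending,
                            const uint32_t *running, size_t nrunning,
                            uint32_t *out, size_t outcap)
{
    size_t nout = 0;

    assert(out != NULL || outcap == 0);

    for (size_t i = 0; i < npending; i++)
    {
        uint32_t db = pending[i];

        /* A resolver is already working on this database. */
        if (sketch_contains(running, nrunning, db))
            continue;

        /* Several entries may share a database; report it once. */
        if (sketch_contains(out, nout, db))
            continue;

        if (nout < outcap)
            out[nout++] = db;
    }

    return nout;
}
```

A nonzero return value corresponds to the case where the patch's function reports that it launched something, which is why that function's `launched` flag needs a defined value on the path where every database already has a resolver.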
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..d7c1a9a638
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,454 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. The foreign
+ * transaction launcher starts one resolver process for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database for which a backend process is waiting for resolution.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on
+ * timeout, but only if no foreign transactions on the database are waiting
+ * to be resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int	foreign_xact_resolution_retry_interval;
+int	foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+static void hold_fdwxacts(PGPROC *waiter);
+static void hold_indoubt_fdwxacts(void);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts has indexes of FdwXact which the resolver marked
+ * as in-processing. We clear that flag from those entries on failure.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/* true while processing online foreign transactions */
+static bool processing_online = false;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	/* Release the held foreign transaction entries */
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+	}
+
+	/*
+	 * If the resolver exits during processing online transactions,
+	 * there might be other waiting online transactions. So request to
+	 * re-launch.
+	 */
+	if (processing_online)
+		FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TransactionId waitXid = InvalidTransactionId;
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiter until either the queue gets empty or the queue has
+		 * only waiters that have a future resolution timestamp.
+		 *
+		 * Set processing_online so that we can request to relaunch on failure.
+		 */
+		processing_online = true;
+		for (;;)
+		{
+			PGPROC	   *waiter;
+
+			CHECK_FOR_INTERRUPTS();
+
+			LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+
+			/* Get the waiter from the queue */
+			waiter = FdwXactGetWaiter(now, &resolutionTs, &waitXid);
+
+			if (!waiter)
+			{
+				/* Not found, break */
+				LWLockRelease(FdwXactResolutionLock);
+				break;
+			}
+
+			/* Hold the waiter's foreign transactions */
+			hold_fdwxacts(waiter);
+			Assert(nheld > 0);
+
+			LWLockRelease(FdwXactResolutionLock);
+
+			/*
+			 * Resolve the waiter's foreign transactions and release the
+			 * waiter.
+			 */
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld, waiter);
+			CommitTransactionCommand();
+
+			last_resolution_time = now;
+		}
+		processing_online = false;
+
+		/* Hold indoubt foreign transactions */
+		hold_indoubt_fdwxacts();
+
+		if (nheld > 0)
+		{
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld, NULL);
+			CommitTransactionCommand();
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved for a backend
+ * within foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		/* There is no waiting backend */
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until the slot is detached.  This
+		 * is necessary to prevent a race condition in which a waiter enqueues
+		 * itself after the FdwXactWaiterExists check.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but does not exit because the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle.  We can sleep until
+ * the timeout or nextResolutionTs, whichever comes first.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Take foreign transactions whose local transaction is already finished.
+ */
+static void
+hold_indoubt_fdwxacts(void)
+{
+	nheld = 0;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		/* Take entry if valid but not processed by anyone */
+		if (fdwxact->valid && fdwxact->dbid == MyDatabaseId &&
+			fdwxact->locking_backend == InvalidBackendId &&
+			!fdwxact->proc &&
+			!TwoPhaseExists(fdwxact->local_xid))
+		{
+			held_fdwxacts[nheld++] = i;
+
+			/* Take over the entry */
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Lock foreign transactions associated with the given waiter's transaction
+ * as in-processing.  The caller must hold FdwXactResolutionLock so that
+ * the waiter doesn't change its state.
+ */
+static void
+hold_fdwxacts(PGPROC *waiter)
+{
+	Assert(LWLockHeldByMe(FdwXactResolutionLock));
+
+	nheld = 0;
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid && fdwxact->local_xid == waiter->fdwXactWaitXid)
+		{
+			Assert(fdwxact->proc->fdwXactState == FDWXACT_WAITING);
+			Assert(fdwxact->dbid == waiter->databaseId);
+
+			held_fdwxacts[nheld++] = i;
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..4b3e67eb49 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..200cf9d067 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index ef4f9981e3..80126439b1 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -845,6 +846,34 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction made by the given XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+		if (gxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
@@ -2188,6 +2217,14 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	XLogRecPtr	recptr;
 	TimestampTz committs = GetCurrentTimestamp();
 	bool		replorigin;
+	bool		need_fdwxact_commit;
+	bool		canceled = false;
+
+	/*
+	 * Prepare the foreign transactions involved in this prepared transaction,
+	 * if any exist.
+	 */
+	need_fdwxact_commit = PrepareFdwXactParticipants(xid);
 
 	/*
 	 * Are we using the replication origins feature?  Or, in other words, are
@@ -2252,12 +2289,24 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	END_CRIT_SECTION();
 
 	/*
-	 * Wait for synchronous replication, if required.
+	 * Wait for both synchronous replication and foreign transaction
+	 * resolution, if required.
 	 *
 	 * Note that at this stage we have marked clog, but still show as running
 	 * in the procarray and continue to hold locks.
 	 */
-	SyncRepWaitForLSN(recptr, true);
+	canceled = SyncRepWaitForLSN(XactLastRecEnd, true);
+
+	if (need_fdwxact_commit)
+	{
+		/* Collect foreign transaction participants */
+		SetFdwXactParticipants(xid);
+
+		if (!canceled)
+			FdwXactWaitForResolution(xid, true);
+
+		ForgetAllFdwXactParticipants();
+	}
 }
 
 /*
@@ -2277,6 +2326,14 @@ RecordTransactionAbortPrepared(TransactionId xid,
 							   const char *gid)
 {
 	XLogRecPtr	recptr;
+	bool		need_fdwxact_commit;
+	bool		canceled = false;
+
+	/*
+	 * Prepare the foreign transactions involved in this prepared transaction,
+	 * if any exist.
+	 */
+	need_fdwxact_commit = PrepareFdwXactParticipants(xid);
 
 	/*
 	 * Catch the scenario where we aborted partway through
@@ -2311,12 +2368,24 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	END_CRIT_SECTION();
 
 	/*
-	 * Wait for synchronous replication, if required.
+	 * Wait for both synchronous replication and foreign transaction
+	 * resolution, if required.
 	 *
 	 * Note that at this stage we have marked clog, but still show as running
 	 * in the procarray and continue to hold locks.
 	 */
-	SyncRepWaitForLSN(recptr, false);
+	canceled = SyncRepWaitForLSN(XactLastRecEnd, false);
+
+	if (need_fdwxact_commit)
+	{
+		/* Collect foreign transaction participants */
+		SetFdwXactParticipants(xid);
+
+		if (!canceled)
+			FdwXactWaitForResolution(xid, false);
+
+		ForgetAllFdwXactParticipants();
+	}
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..072dd2e439 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1236,6 +1237,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_commit_globally;
 
 	/*
 	 * Log pending invalidations for logical decoding of in-progress
@@ -1254,6 +1256,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_commit_globally = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1292,12 +1295,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_commit_globally)
 			goto cleanup;
 	}
 	else
@@ -1444,16 +1448,37 @@ RecordTransactionCommit(void)
 	latestXid = TransactionIdLatest(xid, nchildren, children);
 
 	/*
-	 * Wait for synchronous replication, if required. Similar to the decision
-	 * above about using committing asynchronously we only want to wait if
-	 * this backend assigned an xid and wrote WAL.  No need to wait if an xid
-	 * was assigned due to temporary/unlogged tables or due to HOT pruning.
+	 * Wait for both synchronous replication and prepared foreign transactions
+	 * to be committed, if required.  We must wait for synchronous replication
+	 * first because we need to make sure that the fate of the current
+	 * transaction is consistent between the primary and sync replicas before
+	 * resolving foreign transactions.  Otherwise, we will end up violating
+	 * atomic commit if a fail-over happens after some of the foreign
+	 * transactions are committed.
 	 *
 	 * Note that at this stage we have marked clog, but still show as running
 	 * in the procarray and continue to hold locks.
 	 */
-	if (wrote_xlog && markXidCommitted)
-		SyncRepWaitForLSN(XactLastRecEnd, true);
+	if (markXidCommitted)
+	{
+		bool canceled = false;
+
+		/*
+		 * Similar to the decision above about committing asynchronously, we
+		 * only want to wait if this backend assigned an xid and wrote WAL.
+		 * No need to wait if an xid was assigned due to temporary/unlogged
+		 * tables or due to HOT pruning.
+		 */
+		if (wrote_xlog)
+			canceled = SyncRepWaitForLSN(XactLastRecEnd, true);
+
+		/*
+		 * We only want to wait if we prepared foreign transactions in this
+		 * transaction and have not received a query cancel.
+		 */
+		if (!canceled && need_commit_globally)
+			FdwXactWaitForResolution(xid, true);
+	}
 
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
@@ -2114,6 +2139,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXact();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
@@ -2281,6 +2309,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXact(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2368,6 +2397,9 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Prepare foreign transactions */
+	AtPrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2559,6 +2591,9 @@ PrepareTransaction(void)
 	 */
 	PostPrepare_Twophase();
 
+	/* Release held FdwXact entries */
+	PostPrepare_FdwXact();
+
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
@@ -2781,6 +2816,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_FdwXact(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 09c01ed4ae..63334d556d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4599,6 +4600,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6286,6 +6288,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_wal_senders",
 									 max_wal_senders,
 									 ControlFile->max_wal_senders);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
@@ -6836,14 +6841,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	restoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7045,7 +7051,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7558,6 +7567,7 @@ StartupXLOG(void)
 	 * as potential problems are detected before any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7888,6 +7898,9 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
+	/* Load all foreign transaction entries from disk to memory */
+	RecoverFdwXacts();
+
 	/*
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
@@ -9184,6 +9197,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9726,8 +9740,10 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
-		track_commit_timestamp != ControlFile->track_commit_timestamp)
+		track_commit_timestamp != ControlFile->track_commit_timestamp ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts)
 	{
 		/*
 		 * The change in number of backend slots doesn't need to be WAL-logged
@@ -9745,6 +9761,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9763,6 +9780,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9970,6 +9988,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10173,6 +10192,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ba5a23ac25..4d5e847739 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+       SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index c002a61794..12602c02b0 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1076,6 +1078,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * If there is a foreign prepared transaction with this foreign server,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1396,6 +1410,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * If there is a foreign prepared transaction with this user mapping,
+	 * dropping it might result in a dangling prepared transaction.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 61e48ca3f8..8f411c0559 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok)
 	return GetForeignServer(serverid);
 }
 
+/*
+ * GetUserMappingByOid - look up the user mapping by user mapping oid.
+ *
+ * If userid of the mapping is invalid, we set it to current userid.
+ */
+UserMapping *
+GetUserMappingByOid(Oid umid)
+{
+	Datum		datum;
+	HeapTuple   tp;
+	UserMapping	*um;
+	bool		isnull;
+	Form_pg_user_mapping tableform;
+
+	tp = SearchSysCache1(USERMAPPINGOID,
+						 ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("user mapping not found for %u", umid)));
+
+	tableform = (Form_pg_user_mapping) GETSTRUCT(tp);
+	um = (UserMapping *) palloc(sizeof(UserMapping));
+	um->umid = umid;
+	um->userid = OidIsValid(tableform->umuser) ?
+		tableform->umuser : GetUserId();
+	um->serverid = tableform->umserver;
+
+	/* Extract the umoptions */
+	datum = SysCacheGetAttr(USERMAPPINGOID,
+							tp,
+							Anum_pg_user_mapping_umoptions,
+							&isnull);
+	if (isnull)
+		um->options = NIL;
+	else
+		um->options = untransformRelOptions(datum);
+
+	ReleaseSysCache(tp);
+
+	return um;
+}
 
 /*
  * GetUserMapping - look up the user mapping.
@@ -328,6 +371,18 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction && !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction && routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
+	if (routine->PrepareForeignTransaction &&
+		!routine->CommitForeignTransaction &&
+		!routine->RollbackForeignTransaction)
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index beb5e85434..2258424e81 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -12,6 +12,8 @@
 
 #include "postgres.h"
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 73ce944fb1..00b6838a98 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3663,6 +3663,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3773,6 +3779,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 		case WAIT_EVENT_HASH_BATCH_ALLOCATE:
 			event_name = "HashBatchAllocate";
 			break;
@@ -4102,6 +4111,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_TWOPHASE_FILE_WRITE:
 			event_name = "TwophaseFileWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ:
 			event_name = "WALSenderTimelineHistoryRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 42223c0f61..be66064325 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,8 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -924,6 +926,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -988,12 +994,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index f21f61d5e1..67413d6630 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -167,6 +167,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index df1e341c76..5b8713d4f1 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -143,13 +143,17 @@ static bool SyncRepQueueIsOrderedByLSN(int mode);
  * represents a commit record.  If it doesn't, then we wait only for the WAL
  * to be flushed if synchronous_commit is set to the higher level of
  * remote_apply, because only commit records provide apply feedback.
+ *
+ * This function returns true if the wait is canceled due to an
+ * interruption.
  */
-void
+bool
 SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 {
 	char	   *new_status = NULL;
 	const char *old_status;
 	int			mode;
+	bool		canceled = false;
 
 	/*
 	 * This should be called while holding interrupts during a transaction
@@ -168,7 +172,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	 * Fast exit if user has not requested sync replication.
 	 */
 	if (!SyncRepRequested())
-		return;
+		return false;
 
 	Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
 	Assert(WalSndCtl != NULL);
@@ -188,7 +192,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		lsn <= WalSndCtl->lsn[mode])
 	{
 		LWLockRelease(SyncRepLock);
-		return;
+		return false;
 	}
 
 	/*
@@ -258,6 +262,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 					 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
 			whereToSendOutput = DestNone;
 			SyncRepCancelWait();
+			canceled = true;
 			break;
 		}
 
@@ -274,6 +279,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 					(errmsg("canceling wait for synchronous replication due to user request"),
 					 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
 			SyncRepCancelWait();
+			canceled = true;
 			break;
 		}
 
@@ -291,6 +297,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		if (rc & WL_POSTMASTER_DEATH)
 		{
 			ProcDiePending = true;
+			canceled = true;
 			whereToSendOutput = DestNone;
 			SyncRepCancelWait();
 			break;
@@ -316,6 +323,8 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		set_ps_display(new_status);
 		pfree(new_status);
 	}
+
+	return canceled;
 }
 
 /*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..271fd35884 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -149,6 +151,8 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -267,6 +271,8 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 60b7a5db8e..5182cc4e87 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -96,6 +96,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allProcs[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -184,11 +186,13 @@ typedef struct ComputeXidHorizonsResult
 	FullTransactionId latest_completed;
 
 	/*
-	 * The same for procArray->replication_slot_xmin and.
-	 * procArray->replication_slot_catalog_xmin.
+	 * The same for procArray->replication_slot_xmin,
+	 * procArray->replication_slot_catalog_xmin, and
+	 * procArray->fdwxact_unresolved_xmin.
 	 */
 	TransactionId slot_xmin;
 	TransactionId slot_catalog_xmin;
+	TransactionId fdwxact_unresolved_xmin;
 
 	/*
 	 * Oldest xid that any backend might still consider running. This needs to
@@ -207,8 +211,9 @@ typedef struct ComputeXidHorizonsResult
 	 * Oldest xid for which deleted tuples need to be retained in shared
 	 * tables.
 	 *
-	 * This includes the effects of replications lots. If that's not desired,
-	 * look at shared_oldest_nonremovable_raw;
+	 * This includes the effects of replication slots and unresolved
+	 * distributed transactions. If that's not desired, look at
+	 * shared_oldest_nonremovable_raw;
 	 */
 	TransactionId shared_oldest_nonremovable;
 
@@ -407,6 +412,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 		ShmemVariableCache->xactCompletionCount = 1;
 	}
 
@@ -1677,6 +1683,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	 */
 	h->slot_xmin = procArray->replication_slot_xmin;
 	h->slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	h->fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	for (int index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1795,6 +1802,13 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	h->data_oldest_nonremovable =
 		TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);
 
+	/*
+	 * Check whether there are unresolved distributed transactions requiring
+	 * an older xmin.
+	 */
+	h->shared_oldest_nonremovable =
+		TransactionIdOlder(h->data_oldest_nonremovable, h->fdwxact_unresolved_xmin);
+
 	/*
 	 * The only difference between catalog / data horizons is that the slot's
 	 * catalog xmin is applied to the catalog one (so catalogs can be accessed
@@ -1850,6 +1864,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	Assert(!TransactionIdIsValid(h->slot_catalog_xmin) ||
 		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
 										 h->slot_catalog_xmin));
+	Assert(!TransactionIdIsValid(h->fdwxact_unresolved_xmin) ||
+		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+										 h->fdwxact_unresolved_xmin));
 
 	/* update approximate horizons with the computed horizons */
 	GlobalVisUpdateApply(h);
@@ -3741,6 +3758,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits on future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions still needed for
+ * resolving distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
+
 /*
  * XidCacheRemoveRunningXids
  *
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..a6d40446ce 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,6 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+FdwXactLock							48
+FdwXactResolverLock					49
+FdwXactResolutionLock				50
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index aa9fbd8054..82fb73fdbd 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -417,6 +418,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* Initialize fields for fdw xact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -817,6 +822,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c9424f167c..f6da103fbd 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3054,6 +3056,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index de87ad6ef7..0c5587d781 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -30,6 +30,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -430,6 +431,24 @@ static const struct config_enum_entry synchronous_commit_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", "try" are documented, we accept all the likely
  * variants of "on" and "off".
@@ -759,6 +778,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2458,6 +2481,52 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	/*
+	 * See also CheckRequiredParameterValues() if this parameter changes
+	 */
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve a foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -4609,6 +4678,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..97236040a3 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -127,6 +127,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -348,6 +350,20 @@
 #max_sync_workers_per_subscription = 2	# taken from max_logical_replication_workers
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
+
 #------------------------------------------------------------------------------
 # QUERY TUNING
 #------------------------------------------------------------------------------
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index a0b0458108..8701c5f005 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -81,6 +81,8 @@ provider postgresql {
 	probe multixact__checkpoint__done(bool);
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
+	probe fdwxact__checkpoint__start();
+	probe fdwxact__checkpoint__done();
 
 	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
 	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 786672b1b6..bc0c12b3b8 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -208,6 +208,7 @@ static const char *const subdirs[] = {
 	"pg_snapshots",
 	"pg_subtrans",
 	"pg_twophase",
+	"pg_fdwxact",
 	"pg_multixact",
 	"pg_multixact/members",
 	"pg_multixact/offsets",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 3e00ac0f70..53bc3d82d7 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index cb6ef19182..1712b794c3 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..818e2d6b3e
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,170 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/fdwxact_xlog.h"
+#include "access/xlogreader.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "nodes/execnodes.h"
+#include "storage/backendid.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* fdwXactState */
+#define	FDWXACT_NOT_WAITING		0
+#define	FDWXACT_WAITING			1
+#define	FDWXACT_WAIT_COMPLETE	2
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is aborted */
+} FdwXactStatus;
+
+typedef struct FdwXactData *FdwXact;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+
+	/*
+	 * A backend process that executed the distributed transaction. The owner
+	 * and a process locking this entry can be different during transaction
+	 * resolution as the resolver takes over the entry.
+	 */
+	PGPROC		*proc;			/* process that executed the distributed tx. */
+
+	/* Information relevant to the foreign transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset of inserting this entry
+									 * start */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been completed and written to
+								 * file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing an FdwXact entry requires holding FdwXactLock in
+ * exclusive mode, and iterating over fdwxacts requires it in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	TransactionId xid;
+
+	/* Foreign transaction information */
+	char	   *fdwxact_id;
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
+
+/* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
+extern void FdwXactRegisterXact(Oid serverid, Oid userid, bool modified);
+extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
+extern void ForgetAllFdwXactParticipants(void);
+extern void FdwXactReleaseWaiter(PGPROC *waiter);
+extern void FdwXactWaitForResolution(TransactionId wait_xid, bool commit);
+extern void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter);
+extern PGPROC *FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+								TransactionId *waitXid_p);
+extern bool FdwXactWaiterExists(Oid dbid);
+extern bool PrepareFdwXactParticipants(TransactionId xid);
+extern void SetFdwXactParticipants(TransactionId xid);
+extern void ClearFdwXactParticipants(void);
+extern void PreCommit_FdwXact(void);
+extern void AtEOXact_FdwXact(bool is_commit);
+extern void AtPrepare_FdwXact(void);
+extern void PostPrepare_FdwXact(void);
+extern void FdwXactCleanupAtProcExit(void);
+extern void restoreFdwXactData(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
+extern void RecoverFdwXacts(void);
+extern bool FdwXactExists(Oid dboid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+
+#endif							/* FDWXACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..688b43b8d0
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..c935471936
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates whether this slot is in use or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 6c15df7e70..986bc73566 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index df1b43a932..6d3db4fcd3 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -106,6 +106,13 @@ extern int	MyXactFlags;
  */
 #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK	(1U << 1)
 
+/*
+ * XACT_FLAGS_FDWNOPREPARE - set when this transaction wrote data to a
+ * foreign table whose foreign server isn't capable of two-phase
+ * commit.
+ */
+#define XACT_FLAGS_FDWNOPREPARE					(1U << 2)
+
 /*
  *	start- and end-of-transaction callbacks for dynamically loaded modules
  */
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..e1b09a70d2 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -236,6 +236,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 06bed90c5e..ed6372d2e6 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 27989971db..c1347851d4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5993,6 +5993,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,state,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6111,6 +6129,10 @@
 { oid => '2851', descr => 'wal filename, given a wal location',
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
 
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..8d046cc4e4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -12,6 +12,7 @@
 #ifndef FDWAPI_H
 #define FDWAPI_H
 
+#include "access/fdwxact.h"
 #include "access/parallel.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
@@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -236,6 +242,12 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for transaction management */
+	PrepareForeignTransaction_function PrepareForeignTransaction;
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
+	GetPrepareId_function GetPrepareId;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5e0cf533fb..5596ee591c 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -69,6 +69,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid,
 											   bits16 flags);
 extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok);
 extern UserMapping *GetUserMapping(Oid userid, Oid serverid);
+extern UserMapping *GetUserMappingByOid(Oid umid);
 extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
 extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid,
 														 bits16 flags);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1387201382..83bfc9345b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -806,6 +806,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -853,6 +855,7 @@ typedef enum
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
 	WAIT_EVENT_EXECUTE_GATHER,
+	WAIT_EVENT_FDWXACT_RESOLUTION,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
 	WAIT_EVENT_HASH_BATCH_LOAD,
@@ -970,6 +973,9 @@ typedef enum
 	WAIT_EVENT_TWOPHASE_FILE_READ,
 	WAIT_EVENT_TWOPHASE_FILE_SYNC,
 	WAIT_EVENT_TWOPHASE_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
 	WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
 	WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 9d286b66c6..cffab9c721 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -82,7 +82,7 @@ extern char *syncrep_parse_error_msg;
 extern char *SyncRepStandbyNames;
 
 /* called by user backend */
-extern void SyncRepWaitForLSN(XLogRecPtr lsn, bool commit);
+extern bool SyncRepWaitForLSN(XLogRecPtr lsn, bool commit);
 
 /* called at backend exit */
 extern void SyncRepCleanupAtProcExit(void);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 9c9a50ae45..06c9f4095f 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/xlogdefs.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
@@ -188,6 +189,17 @@ struct PGPROC
 	int			syncRepState;	/* wait state for sync rep */
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
+	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int				fdwXactState;	/* wait state for foreign transaction
+									 * resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
 	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index ea8a876ca4..2296344dc0 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -92,4 +92,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
 
+
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 04431d0eb2..a00ca73355 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2a18dc423e..9e55fbeec8 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1352,6 +1352,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.state,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(xid, serverid, userid, state, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.23.0

#114Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#111)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020-07-17 15:55, Masahiko Sawada wrote:

On Fri, 17 Jul 2020 at 11:06, Masahiro Ikeda
<ikedamsh(at)oss(dot)nttdata(dot)com>
wrote:

On 2020-07-16 13:16, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda
<ikedamsh(at)oss(dot)nttdata(dot)com>
wrote:

I've attached the latest version patches. I've incorporated the
review
comments I got so far and improved locking strategy.

I want to ask a question about streaming replication with 2PC.
Are you going to support 2PC with streaming replication?

I tried streaming replication using the v23 patches.
I confirmed that 2PC works with streaming replication,
where there are primary and standby coordinators.

But, in my understanding, the WAL of "PREPARE" and
"COMMIT/ABORT PREPARED" can't be replicated to the standby server
synchronously.

If this is right, unresolved transactions can occur.

For example,

1. PREPARE is done
2. the primary crashes before the WAL related to PREPARE is
replicated to the standby server
3. the standby server is promoted // but can't execute "ABORT PREPARED"

In the above case, the remote server has an unresolved transaction.
Can we solve this problem by supporting synchronous replication?

But, I think some users use async replication for performance.
Do we need to document the limitation or find another solution?

IIUC with synchronous replication, we can guarantee that WAL records
are written on both the primary and the replicas when the client gets an
acknowledgment of commit. We don't replicate each WAL record
generated during a transaction one by one in sync. In the case you
described, the client will get an error due to the server crash.
Therefore I think the user cannot expect that the WAL records generated
so far have been replicated. The same issue could also happen when the
user executes PREPARE TRANSACTION and the server crashes.

Thanks! I didn't notice that the behavior is the same when a user
executes PREPARE TRANSACTION.

IIUC there is a difference between (1) PREPARE TRANSACTION
and (2) 2PC.
The difference is whether the client can know when the server crashed
and what its global transaction id is.

If (1) PREPARE TRANSACTION fails, it's OK for the client to execute the
same command again,
because if the remote server is already prepared the command will be
ignored.

But, if (2) 2PC fails with a coordinator crash, the client can't
know what operations should be done.

If the old coordinator already executed PREPARE, there are some
transactions which should be ABORT PREPARED.
But if the PREPARE WAL is not sent to the standby, the new
coordinator can't execute ABORT PREPARED.
And the client can't know which remote servers have PREPARED
transactions which should be ABORTED either.

Even if the client can know that, only the old coordinator knows its
global transaction id.
Only the database administrator can analyze the old coordinator's log
and then execute the appropriate commands manually, right?

I think that's right. In the case of a coordinator crash, the user
can look for orphaned foreign prepared transactions by checking the
'identifier' column of pg_foreign_xacts on the new standby server and
the prepared transactions on the remote servers.

I think there is a case where we can't find an orphaned foreign
prepared transaction in the pg_foreign_xacts view on the new standby
server.
That would confuse users and database administrators.

If the primary coordinator crashes after preparing a foreign transaction,
but before sending the XLOG_FDWXACT_INSERT records to the standby server,
the standby server can't restore the transaction status and the
pg_foreign_xacts view doesn't show the prepared foreign transactions.

Sending XLOG_FDWXACT_INSERT records asynchronously leads to this problem.

If the primary replicates XLOG_FDWXACT_INSERT to the standby
asynchronously,
some prepared transactions may remain unresolved forever.

Since I think resolving this inconsistency manually is a hard operation,
we need to support synchronous XLOG_FDWXACT_INSERT replication.

I understand that there is a large impact on performance,
but users can control the consistency/durability vs performance
trade-off with the synchronous_commit parameter.
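
The failure window described above can be sketched with a toy model in Python. This is only an illustration, not the patch's code: `Primary`, `Standby`, `RemoteServer`, the `sync` flag and the identifier string are hypothetical, and the order "log the entry, then prepare remotely" is an assumption made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteServer:
    """Toy foreign server: remembers the ids of prepared transactions."""
    prepared: set = field(default_factory=set)


@dataclass
class Standby:
    """Toy standby: knows only the foreign xacts whose WAL record arrived."""
    foreign_xacts: set = field(default_factory=set)


@dataclass
class Primary:
    standby: Standby
    sync: bool  # True models waiting for the standby before preparing remotely
    foreign_xacts: set = field(default_factory=set)
    unsent_wal: list = field(default_factory=list)

    def prepare_foreign(self, remote: RemoteServer, gid: str) -> None:
        # Stand-in for writing an XLOG_FDWXACT_INSERT record locally.
        self.foreign_xacts.add(gid)
        self.unsent_wal.append(gid)
        if self.sync:
            self.flush_to_standby()
        # Stand-in for PREPARE TRANSACTION on the foreign server.
        remote.prepared.add(gid)

    def flush_to_standby(self) -> None:
        self.standby.foreign_xacts.update(self.unsent_wal)
        self.unsent_wal.clear()


def orphans_after_failover(sync: bool) -> set:
    """Prepare one foreign xact, crash the primary, promote the standby."""
    standby, remote = Standby(), RemoteServer()
    primary = Primary(standby=standby, sync=sync)
    primary.prepare_foreign(remote, "fx_1000_16384_10")
    # Crash: whatever is still in primary.unsent_wal is lost with the primary.
    return remote.prepared - standby.foreign_xacts


print(orphans_after_failover(sync=False))  # {'fx_1000_16384_10'}
print(orphans_after_failover(sync=True))   # set()
```

In the asynchronous case the promoted standby has no entry for the transaction that the remote server still holds in prepared state, which is the situation where pg_foreign_xacts shows nothing.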

What do you think?

Thank you for letting me know. I've attached the latest version patch
set.

Thanks for updating.
But, the latest patches failed to be applied to the master branch.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#115Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Masahiro Ikeda (#114)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 28 Aug 2020 at 17:50, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

I think there is a case where we can't find an orphaned foreign
prepared transaction in the pg_foreign_xacts view on the new standby
server.
That would confuse users and database administrators.

If the primary coordinator crashes after preparing a foreign transaction,
but before sending the XLOG_FDWXACT_INSERT records to the standby server,
the standby server can't restore the transaction status and the
pg_foreign_xacts view doesn't show the prepared foreign transactions.

Sending XLOG_FDWXACT_INSERT records asynchronously leads to this problem.

If the primary replicates XLOG_FDWXACT_INSERT to the standby
asynchronously,
some prepared transactions may remain unresolved forever.

Since I think resolving this inconsistency manually is a hard operation,
we need to support synchronous XLOG_FDWXACT_INSERT replication.

I understand that there is a large impact on performance,
but users can control the consistency/durability vs performance
trade-off with the synchronous_commit parameter.

What do you think?

I think the user can check such prepared transactions by looking for
transactions that exist in the foreign server's pg_prepared_xacts but
not in the coordinator server's pg_foreign_xacts, no? To make checking
such prepared transactions easy, perhaps we could include a
timestamp in the prepared transaction id. But I'm concerned about
duplication of transaction ids due to clock skew.
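
As a rough illustration of that check, the comparison is a set difference between the ids prepared on the foreign server and the identifiers the coordinator knows about. The sketch below is Python over plain lists; the function name, the `fx_` prefix filter and the sample ids are all hypothetical, and in practice the two lists would come from querying pg_prepared_xacts on the foreign server and pg_foreign_xacts on the coordinator.

```python
def find_orphaned_prepared(remote_gids, coordinator_identifiers, prefix="fx_"):
    """Return ids prepared on the foreign server that the coordinator does not know.

    The prefix filter (an assumption) skips prepared transactions that were
    created by something other than the coordinator, e.g. an application
    issuing PREPARE TRANSACTION itself.
    """
    known = set(coordinator_identifiers)
    return sorted(
        gid for gid in remote_gids
        if gid.startswith(prefix) and gid not in known
    )


remote = ["fx_1000_16384_10", "fx_1001_16384_10", "app_batch_7"]
coordinator = ["fx_1000_16384_10"]
print(find_orphaned_prepared(remote, coordinator))  # ['fx_1001_16384_10']
```

Whether an id found this way should be committed or rolled back still has to be decided by the administrator; the comparison only narrows down the candidates.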

If there is a way to identify such unresolved foreign transactions and
it's not cumbersome, given that the problem you're concerned about is
unlikely to occur often, I guess a certain number of users would be able
to accept it as a restriction. So I'd recommend not dealing with this
problem in the first version of the patch; we will be able to improve this
feature to deal with this problem as an additional feature. Thoughts?

Thank you for letting me know. I've attached the latest version patch
set.

Thanks for updating.
But, the latest patches failed to be applied to the master branch.

I'll submit the updated version patch.

Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#116Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#115)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020-09-03 23:08, Masahiko Sawada wrote:

On Fri, 28 Aug 2020 at 17:50, Masahiro Ikeda <ikedamsh@oss.nttdata.com>
wrote:

I think there is a case where we can't find an orphaned foreign
prepared transaction in the pg_foreign_xacts view on the new standby
server.
That would confuse users and database administrators.

If the primary coordinator crashes after preparing a foreign transaction,
but before sending the XLOG_FDWXACT_INSERT records to the standby server,
the standby server can't restore the transaction status and the
pg_foreign_xacts view doesn't show the prepared foreign transactions.

Sending XLOG_FDWXACT_INSERT records asynchronously leads to this problem.

If the primary replicates XLOG_FDWXACT_INSERT to the standby
asynchronously,
some prepared transactions may remain unresolved forever.

Since I think resolving this inconsistency manually is a hard operation,
we need to support synchronous XLOG_FDWXACT_INSERT replication.

I understand that there is a large impact on performance,
but users can control the consistency/durability vs performance
trade-off with the synchronous_commit parameter.

What do you think?

I think the user can check such prepared transactions by looking for
transactions that exist in the foreign server's pg_prepared_xacts but
not in the coordinator server's pg_foreign_xacts, no? To make checking
such prepared transactions easy, perhaps we could include a
timestamp in the prepared transaction id. But I'm concerned about
duplication of transaction ids due to clock skew.

Thanks for letting me know.
I agree that we can check pg_prepared_xacts and pg_foreign_xacts.

We have to manually abort the transactions which exist in pg_prepared_xacts
but don't exist in pg_foreign_xacts, don't we?
So users have to use a foreign database which can show
prepared transaction status, like pg_foreign_xacts.

When would duplication of transaction ids happen?
I'm sorry, but I couldn't understand the point about clock skew.

IIUC, since the prepared id may contain the coordinator's xid, there is no
clock skew issue
and we can determine the transaction_id uniquely.
If the FDW implements the GetPrepareId_function API and generates
a transaction_id without the coordinator's xid, your concern will emerge.
But I can't think of a case where a transaction_id would be generated
without the coordinator's xid.
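
To illustrate the two positions, here is a small Python sketch comparing an id built from the coordinator's xid with one built from a wall-clock timestamp. Both id formats, the function names and the sample values are hypothetical; the patch's actual identifier format is not reproduced here. The sketch also ignores xid wraparound and assumes a single coordinator.

```python
def gid_from_xid(xid, serverid, userid):
    """Hypothetical id format built from the coordinator's xid."""
    return f"fx_{xid}_{serverid}_{userid}"


def gid_from_clock(now_seconds, serverid, userid):
    """Hypothetical id format built from a wall-clock timestamp instead."""
    return f"fx_{now_seconds}_{serverid}_{userid}"


# Two distinct transactions on the same foreign server and user mapping.
xids = [1000, 1001]
# Assumed clock readings: the clock is stepped back between the two
# transactions, so both observe the same second.
clock_readings = [1599120000, 1599120000]

xid_ids = [gid_from_xid(x, 16384, 10) for x in xids]
clock_ids = [gid_from_clock(t, 16384, 10) for t in clock_readings]

print(len(set(xid_ids)))    # 2: the xid keeps the ids distinct
print(len(set(clock_ids)))  # 1: the timestamp-only ids collide
```

An id that carries both the xid and a timestamp would stay unique for the same reason the xid-only id does, with the timestamp serving only as a hint for whoever inspects it.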

If there is a way to identify such unresolved foreign transactions and
it's not cumbersome, given that the problem you're concerned about is
unlikely to occur often, I guess a certain number of users would be able
to accept it as a restriction. So I'd recommend not dealing with this
problem in the first version of the patch; we will be able to improve this
feature to deal with this problem as an additional feature. Thoughts?

I agree. Thanks for your comments.

Thank you for letting me know. I've attached the latest version patch
set.

Thanks for updating.
But, the latest patches failed to be applied to the master branch.

I'll submit the updated version patch.

Thanks.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#117Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#113)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Aug 21, 2020 at 03:25:29PM +0900, Masahiko Sawada wrote:

Thank you for letting me know. I've attached the latest version patch set.

This needs a rebase. Patch 0002 is conflicting with some of the
recent changes done in syncrep.c and procarray.c, at least.
--
Michael

#118Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#113)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020/08/21 15:25, Masahiko Sawada wrote:

On Fri, 21 Aug 2020 at 00:36, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/27 15:59, Masahiko Sawada wrote:

On Thu, 23 Jul 2020 at 22:51, Muhammad Usama <m.usama@gmail.com> wrote:

On Wed, Jul 22, 2020 at 12:42 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/16 14:47, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/14 9:08, Masahiro Ikeda wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!

+1
I'm interested in these patches and now studying them. While checking
the behaviors of the patched PostgreSQL, I got three comments.

Thank you for testing this patch!

1. We can access the foreign table even during recovery on HEAD.
But in the patched version, when I did that, I got the following error.
Is this intentional?

ERROR: cannot assign TransactionIds during recovery

No, it should be fixed. I'm going to fix this by not collecting
participants for atomic commit during recovery.

Thanks for trying to fix the issues!

I'd like to report one more issue. When I started a new transaction
on the local server, executed INSERT on the remote server via
postgres_fdw and then quit psql, I got the following assertion failure.

TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160
1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313
2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20
3 postgres 0x000000010d313fe3 shmem_exit + 179
4 postgres 0x000000010d313e7a proc_exit_prepare + 122
5 postgres 0x000000010d313da3 proc_exit + 19
6 postgres 0x000000010d35112f PostgresMain + 3711
7 postgres 0x000000010d27bb3a BackendRun + 570
8 postgres 0x000000010d27af6b BackendStartup + 475
9 postgres 0x000000010d279ed1 ServerLoop + 593
10 postgres 0x000000010d277940 PostmasterMain + 6016
11 postgres 0x000000010d1597b9 main + 761
12 libdyld.dylib 0x00007fff7161e3d5 start + 1
13 ??? 0x0000000000000003 0x0 + 3

Thank you for reporting the issue!

I've attached the latest version patch that incorporated all comments
I got so far. I've removed the patch adding the 'prefer' mode of
foreign_twophase_commit to keep the patch set simple.

I have started to review the patchset. Just a quick comment.

Patch v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch
contains changes (adding fdwxact includes) for
src/backend/executor/nodeForeignscan.c, src/backend/executor/nodeModifyTable.c
and src/backend/executor/execPartition.c files that don't seem to be
required with the latest version.

Thanks for your comment.

Right. I've removed these changes on the local branch.

The latest patches fail to apply to the master branch. Could you rebase them?

Thank you for letting me know. I've attached the latest version patch set.

Thanks for updating the patch!

IMO it's not easy to commit this 2PC patch at once because it's still large
and complicated. So I'm thinking it's better to separate the feature into
several parts and commit them gradually. What about separating
the feature into the following parts?

#1
Originally the server just executed the xact callback that each FDW registered
when the transaction was committed. The patch changes this so that
the server manages the FDW participants in the transaction and triggers
them to execute COMMIT or ROLLBACK. IMO this change can be applied
without the 2PC feature. Thoughts?

Even if we commit this patch and add a new interface for FDWs, we would
need to keep the old interface for FDWs that provide only the old interface.

#2
Originally, when there was FDW access in the transaction,
PREPARE TRANSACTION on that transaction failed with an error. The patch
allows PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED
even when FDW access occurs in the transaction. IMO this change can be
applied without the *automatic* 2PC feature (i.e., PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED are automatically executed for each FDW
inside the "top" COMMIT command). Thoughts?

I'm not sure yet whether automatic resolution of "unresolved" prepared
transactions by the resolver process is necessary for this change or not.
If it's not necessary, it's better to exclude the resolver process from this
change, at this stage, to make the patch simpler.

#3
Finally IMO we can provide the patch supporting "automatic" 2PC for each FDW,
based on the #1 and #2 patches.
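
The difference between part #1 (the server drives COMMIT or ROLLBACK on each participant) and part #3 (automatic two-phase commit) can be sketched as follows. This is a toy Python model, not the patch's implementation: `Participant`, `commit_all` and the `use_2pc` flag are hypothetical names, and the methods only loosely echo the prepare/commit/rollback callbacks added to FdwRoutine.

```python
class Participant:
    """Stand-in for one foreign server taking part in the transaction."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "active"

    def prepare(self):
        if not self.can_prepare:
            raise RuntimeError(f"{self.name}: PREPARE failed")
        self.state = "prepared"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def commit_all(participants, use_2pc):
    """Commit every participant; return True on success, False on rollback."""
    if not use_2pc:
        # Part #1 only: the server asks each participant to commit in turn.
        for p in participants:
            p.commit()
        return True
    try:
        # Part #3, phase one: every participant must prepare successfully.
        for p in participants:
            p.prepare()
    except RuntimeError:
        # Any failure in phase one rolls back all participants.
        for p in participants:
            p.rollback()
        return False
    # Phase two: all participants are prepared, so commit them.
    for p in participants:
        p.commit()
    return True


a, b = Participant("server_a"), Participant("server_b", can_prepare=False)
print(commit_all([a, b], use_2pc=True), a.state, b.state)  # False aborted aborted
```

The sketch leaves out what the resolver process would cover: a crash between the two phases, where some participants stay in the prepared state until someone resolves them.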

What's your opinion about this?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#119Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Fujii Masao (#118)
Re: Transactions involving multiple postgres foreign servers, take 2

Also I'd like to report some typos in the patch.

+#define ServerSupportTransactionCallack(fdw_part) \

"Callack" in this macro name should be "Callback"?

+#define SeverSupportTwophaseCommit(fdw_part) \

"Sever" in this macro name should be "Server"?

+ proname => 'pg_stop_foreing_xact_resolver', provolatile => 'v', prorettype => 'bool',

"foreing" should be "foreign"?

+ * FdwXact entry we call get_preparedid callback to get a transaction

"get_preparedid" should be "get_prepareid"?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#120Amit Kapila
amit.kapila16@gmail.com
In reply to: Fujii Masao (#118)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, Sep 7, 2020 at 2:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

IMO it's not easy to commit this 2PC patch at once because it's still large
and complicated. So I'm thinking it's better to separate the feature into
several parts and commit them gradually.

Hmm, I don't see that we have a consensus on the design and/or
interfaces of this patch, and without that, proceeding to commit
doesn't seem advisable. Here are a few points which I remember offhand
that require more work.
1. There is a competing design proposed and being discussed in another
thread [1] for this purpose. I think both approaches have pros and
cons but there doesn't seem to be any conclusion yet on which one is
better.
2. In this thread, we have discussed trying to integrate this patch
with some other FDWs (say MySQL, mongodb, etc.) to ensure that the
APIs we are exposing are general enough that other FDWs can use them
to implement 2PC. I have seen some speculation about this but no
concrete work on it has been done.
3. In another thread [1], we have seen that the patch being discussed
in this thread might need to be re-designed if we have to use some other
design for global visibility than what is proposed in that thread. I
think it is quite likely that this can happen, considering no one has been
able to come up with a solution to the major design problems spotted in that
patch yet.

It appears to me that even though these points were raised before in
some form, we are just trying to bypass them to commit whatever we have
in the current patch, which I find quite surprising.

[1]: /messages/by-id/07b2c899-4ed0-4c87-1327-23c750311248@postgrespro.ru

--
With Regards,
Amit Kapila.

#121Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Amit Kapila (#120)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020/09/08 10:34, Amit Kapila wrote:

On Mon, Sep 7, 2020 at 2:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

IMO it's not easy to commit this 2PC patch at once because it's still large
and complicated. So I'm thinking it's better to separate the feature into
several parts and commit them gradually.

Hmm, I don't see that we have a consensus on the design and or
interfaces of this patch and without that proceeding for commit
doesn't seem advisable. Here are a few points which I remember offhand
that require more work.

Thanks!

1. There is a competing design proposed and being discussed in another
thread [1] for this purpose. I think both the approaches have pros and
cons but there doesn't seem to be any conclusion yet on which one is
better.

I was thinking that [1] was discussing the global snapshot feature for
"atomic visibility" rather than a solution like 2PC for "atomic commit".
But if another approach for "atomic commit" was also proposed at [1],
that's good. I will check that.
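
The distinction between the two properties can be shown with a toy Python model: two-phase commit ensures both nodes end up committed, but a reader that runs between the two COMMIT PREPARED steps can still see a state in which only one of them is visible. `Node` and `transfer_with_concurrent_reader` are hypothetical names used only for this sketch.

```python
class Node:
    """Toy server holding one account balance."""

    def __init__(self, balance):
        self.committed = balance  # value visible to readers
        self.pending = None       # value held by a prepared transaction

    def prepare(self, new_balance):
        self.pending = new_balance

    def commit_prepared(self):
        self.committed, self.pending = self.pending, None


def transfer_with_concurrent_reader(amount):
    """Move `amount` from node a to node b with 2PC; return the totals a reader sees."""
    a, b = Node(100), Node(0)
    # Phase one: both nodes prepare.
    a.prepare(a.committed - amount)
    b.prepare(b.committed + amount)
    # Phase two, first half: only node a has committed so far.
    a.commit_prepared()
    seen_mid = a.committed + b.committed    # reader runs between the two commits
    # Phase two, second half.
    b.commit_prepared()
    seen_after = a.committed + b.committed
    return seen_mid, seen_after


print(transfer_with_concurrent_reader(30))  # (70, 100)
```

The commit is atomic in the sense that both nodes eventually commit, yet the intermediate total of 70 shows that visibility is not atomic without some additional mechanism such as a global snapshot.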

2. In this thread, we have discussed to try integrating this patch
with some other FDWs (say MySQL, mongodb, etc.) to ensure that the
APIs we are exposing are general enough that other FDWs can use them
to implement 2PC. I could see some speculations about the same but no
concrete work on the same has been done.

Yes, you're right.

3. In another thread [1], we have seen that the patch being discussed
in this thread might need to be re-designed if we have to use some other
design for global visibility than what is proposed in that thread. I
think it is quite likely that this can happen, considering no one has been
able to come up with a solution to the major design problems spotted in that
patch yet.

Are you implying that the global-visibility patch should come first, before the "2PC" patch?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#122Amit Kapila
amit.kapila16@gmail.com
In reply to: Fujii Masao (#121)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Sep 8, 2020 at 8:05 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/09/08 10:34, Amit Kapila wrote:

On Mon, Sep 7, 2020 at 2:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

IMO it's not easy to commit this 2PC patch at once because it's still large
and complicated. So I'm thinking it's better to separate the feature into
several parts and commit them gradually.

Hmm, I don't see that we have a consensus on the design and/or
interfaces of this patch and without that proceeding for commit
doesn't seem advisable. Here are a few points which I remember offhand
that require more work.

Thanks!

1. There is a competing design proposed and being discussed in another
thread [1] for this purpose. I think both the approaches have pros and
cons but there doesn't seem to be any conclusion yet on which one is
better.

I was thinking that [1] was discussing global snapshot feature for
"atomic visibility" rather than the solution like 2PC for "atomic commit".
But if another approach for "atomic commit" was also proposed at [1],
that's good. I will check that.

Okay, that makes sense.

2. In this thread, we have discussed to try integrating this patch
with some other FDWs (say MySQL, mongodb, etc.) to ensure that the
APIs we are exposing are general enough that other FDWs can use them
to implement 2PC. I could see some speculations about the same but no
concrete work on the same has been done.

Yes, you're right.

3. In another thread [1], we have seen that the patch being discussed
in this thread might need to be re-designed if we have to use some other
design for global-visibility than what is proposed in that thread. I
think it is quite likely that can happen considering no one is able to
come up with the solution to major design problems spotted in that
patch yet.

Are you implying that the global-visibility patch should come before the "2PC" patch?

I intend to say that the global-visibility work can impact this in a
major way and we have analyzed that to some extent during a discussion
on the other thread. So, I think without having a complete
design/solution that addresses both the 2PC and global-visibility, it
is not apparent what is the right way to proceed. It seems to me that
rather than working on individual (or smaller) parts one needs to come
up with a bigger picture (or overall design) and then once we have
figured that out correctly, it would be easier to decide which parts
can go first.

--
With Regards,
Amit Kapila.

#123tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Amit Kapila (#122)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Amit Kapila <amit.kapila16@gmail.com>

I intend to say that the global-visibility work can impact this in a
major way and we have analyzed that to some extent during a discussion
on the other thread. So, I think without having a complete
design/solution that addresses both the 2PC and global-visibility, it
is not apparent what is the right way to proceed. It seems to me that
rather than working on individual (or smaller) parts one needs to come
up with a bigger picture (or overall design) and then once we have
figured that out correctly, it would be easier to decide which parts
can go first.

I'm really sorry I've been getting late and late and late x10 to publish the revised scale-out design wiki to discuss the big picture! I don't know why I'm taking this long; I feel as if I were captive in a time prison (yes, nobody is holding me captive; I'm just late). Please wait a few days.

But to proceed with the development, let me comment on the atomic commit and global visibility.

* We have to hear from Andrey about their check on the possibility that Clock-SI could be Microsoft's patent and if we can avoid it.

* I have a feeling that we can adopt the algorithm used by Spanner, CockroachDB, and YugabyteDB. That is, 2PC for multi-node atomic commit, Paxos or Raft for replica synchronization (in the process of commit) to make 2PC more highly available, and the timestamp-based global visibility. However, the timestamp-based approach makes the database instance shut down when the node's clock is distant from the other nodes.

* Or, maybe we can use the following Commitment ordering, which doesn't require the timestamp or any other information to be transferred among the cluster nodes. However, it seems to have to track the order of read and write operations among concurrent transactions to ensure the correct commit order, so I'm not sure about the performance. The MVCO paper seems to present the information we need, but I haven't understood it well yet (it's difficult). Could anybody kindly interpret this?

Commitment ordering (CO) - yoavraz2
https://sites.google.com/site/yoavraz2/the_principle_of_co

As for Sawada-san's 2PC patch, which I find interesting purely as an FDW enhancement, I raised the following issues to be addressed:

1. Make FDW API implementable by other FDWs than postgres_fdw (this is what Amit-san kindly pointed out.) I think oracle_fdw and jdbc_fdw would be good examples to consider, while MySQL may not be good because it exposes the XA feature as SQL statements, not C functions as defined in the XA specification.

2. 2PC processing is queued and serialized in one background worker. That severely limits transaction throughput. Each backend should perform 2PC.

3. postgres_fdw cannot detect remote updates when the UDF executed on a remote node updates data.

Regards
Takayuki Tsunakawa

#124Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Fujii Masao (#118)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, 7 Sep 2020 at 17:59, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/08/21 15:25, Masahiko Sawada wrote:

On Fri, 21 Aug 2020 at 00:36, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/27 15:59, Masahiko Sawada wrote:

On Thu, 23 Jul 2020 at 22:51, Muhammad Usama <m.usama@gmail.com> wrote:

On Wed, Jul 22, 2020 at 12:42 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/16 14:47, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/07/14 9:08, Masahiro Ikeda wrote:

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.

Thanks for updating the patch!

+1
I'm interested in these patches and now studying them. While checking
the behaviors of the patched PostgreSQL, I got three comments.

Thank you for testing this patch!

1. We can access a foreign table even during recovery on HEAD.
But in the patched version, when I did that, I got the following error.
Is this intentional?

ERROR: cannot assign TransactionIds during recovery

No, it should be fixed. I'm going to fix this by not collecting
participants for atomic commit during recovery.

Thanks for trying to fix the issues!

I'd like to report one more issue. When I started new transaction
in the local server, executed INSERT in the remote server via
postgres_fdw and then quit psql, I got the following assertion failure.

TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160
1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313
2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20
3 postgres 0x000000010d313fe3 shmem_exit + 179
4 postgres 0x000000010d313e7a proc_exit_prepare + 122
5 postgres 0x000000010d313da3 proc_exit + 19
6 postgres 0x000000010d35112f PostgresMain + 3711
7 postgres 0x000000010d27bb3a BackendRun + 570
8 postgres 0x000000010d27af6b BackendStartup + 475
9 postgres 0x000000010d279ed1 ServerLoop + 593
10 postgres 0x000000010d277940 PostmasterMain + 6016
11 postgres 0x000000010d1597b9 main + 761
12 libdyld.dylib 0x00007fff7161e3d5 start + 1
13 ??? 0x0000000000000003 0x0 + 3

Thank you for reporting the issue!

I've attached the latest version patch that incorporated all comments
I got so far. I've removed the patch adding the 'prefer' mode of
foreign_twophase_commit to keep the patch set simple.

I have started to review the patchset. Just a quick comment.

Patch v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch
contains changes (adding fdwxact includes) for
src/backend/executor/nodeForeignscan.c, src/backend/executor/nodeModifyTable.c
and src/backend/executor/execPartition.c files that don't seem to be
required with the latest version.

Thanks for your comment.

Right. I've removed these changes on the local branch.

The latest patches failed to be applied to the master branch. Could you rebase the patches?

Thank you for letting me know. I've attached the latest version patch set.

Thanks for updating the patch!

IMO it's not easy to commit this 2PC patch at once because it's still large
and complicated. So I'm thinking it's better to separate the feature into
several parts and commit them gradually. What about separating
the feature into the following parts?

#1
Originally the server just executed xact callback that each FDW registered
when the transaction was committed. The patch changes this so that
the server manages the participants of FDW in the transaction and triggers
them to execute COMMIT or ROLLBACK. IMO this change can be applied
without 2PC feature. Thought?

Even if we commit this patch and add new interface for FDW, we would
need to keep the old interface, for the FDW providing only old interface.

#2
Originally when there was the FDW access in the transaction,
PREPARE TRANSACTION on that transaction failed with an error. The patch
allows PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED
even when FDW access occurs in the transaction. IMO this change can be
applied without *automatic* 2PC feature (i.e., PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED are automatically executed for each FDW
inside "top" COMMIT command). Thought?

I'm not sure yet whether automatic resolution of "unresolved" prepared
transactions by the resolver process is necessary for this change or not.
If it's not necessary, it's better to exclude the resolver process from this
change, at this stage, to make the patch simpler.

#3
Finally IMO we can provide the patch supporting "automatic" 2PC for each FDW,
based on the #1 and #2 patches.

What's your opinion about this?

Regardless of which 2PC implementation approach is selected,
splitting the patch into small logical patches is a good idea, and the
above suggestion makes sense to me.

Regarding #2, I guess that we would need resolver and launcher
processes even if we would support only manual PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED commands:

On COMMIT PREPARED command, I think we should commit the local
prepared transaction first and then commit the foreign prepared transactions.
Otherwise, it violates atomic commit principles when the local node
fails to commit a foreign prepared transaction and the user switches
to ROLLBACK PREPARED. OTOH once we have committed locally, we cannot change
to rollback. And attempting to commit foreign prepared transactions
could lead to an error due to a connection error, OOM caused by palloc, etc.
Therefore we discussed using background processes, the resolver and
launcher, to take charge of committing foreign prepared
transactions so that the process that executed COMMIT PREPARED will
never error out after the local commit. So I think patch #2 will also
include the patch adding the resolver and launcher processes. And in
patch #3 we will change the code to support automatic 2PC as you
suggested.
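
The ordering described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not code from the patch: ForeignServer, commit_prepared, and run_resolver are hypothetical names, and the resolver is modeled as a plain retry loop rather than a background worker.

```python
from collections import deque


class ForeignServer:
    """Hypothetical stand-in for a remote node holding a prepared transaction."""

    def __init__(self, name, fail_times=0):
        self.name = name
        self.fail_times = fail_times  # number of attempts that fail first
        self.committed = False

    def commit_prepared(self):
        if self.fail_times > 0:
            self.fail_times -= 1
            raise ConnectionError(f"cannot reach {self.name}")
        self.committed = True


def commit_prepared(local_state, participants, resolver_queue):
    """Backend side: commit locally first, then hand off the foreign commits."""
    local_state["committed"] = True      # after this point we cannot roll back
    resolver_queue.extend(participants)  # the backend makes no remote call itself
    return "COMMIT PREPARED"


def run_resolver(resolver_queue, max_rounds=10):
    """Resolver side: retry until each foreign prepared transaction is committed."""
    for _ in range(max_rounds):
        if not resolver_queue:
            break
        server = resolver_queue.popleft()
        try:
            server.commit_prepared()
        except ConnectionError:
            resolver_queue.append(server)  # still in doubt, retry later


local = {"committed": False}
servers = [ForeignServer("s1"), ForeignServer("s2", fail_times=2)]
queue = deque()
result = commit_prepared(local, servers, queue)
run_resolver(queue)
```

In this model a connection failure on a foreign server never surfaces as an error in the process that ran COMMIT PREPARED; it only delays resolution, which is the property the resolver and launcher are meant to provide.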

In addition, the part of the automatic resolution of in-doubt
transactions can also be a separate patch, which will be the #4 patch.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#125Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Fujii Masao (#118)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, Sep 7, 2020 at 2:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

#2
Originally when there was the FDW access in the transaction,
PREPARE TRANSACTION on that transaction failed with an error. The patch
allows PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED
even when FDW access occurs in the transaction. IMO this change can be
applied without *automatic* 2PC feature (i.e., PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED are automatically executed for each FDW
inside "top" COMMIT command). Thought?

I'm not sure yet whether automatic resolution of "unresolved" prepared
transactions by the resolver process is necessary for this change or not.
If it's not necessary, it's better to exclude the resolver process from this
change, at this stage, to make the patch simpler.

I agree with this. However, in case of explicit prepare, if we are not
going to try automatic resolution, it might be better to provide a way
to pass the information about transactions prepared on the foreign
servers if they can not be resolved at the time of commit so that the
user can take it up to resolve those him/herself. This was an idea
that Tom had suggested at the very beginning of the first take.

--
Best Wishes,
Ashutosh Bapat

#126Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Amit Kapila (#122)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020/09/08 12:03, Amit Kapila wrote:

On Tue, Sep 8, 2020 at 8:05 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/09/08 10:34, Amit Kapila wrote:

On Mon, Sep 7, 2020 at 2:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

IMO it's not easy to commit this 2PC patch at once because it's still large
and complicated. So I'm thinking it's better to separate the feature into
several parts and commit them gradually.

Hmm, I don't see that we have a consensus on the design and/or
interfaces of this patch and without that proceeding for commit
doesn't seem advisable. Here are a few points which I remember offhand
that require more work.

Thanks!

1. There is a competing design proposed and being discussed in another
thread [1] for this purpose. I think both the approaches have pros and
cons but there doesn't seem to be any conclusion yet on which one is
better.

I was thinking that [1] was discussing global snapshot feature for
"atomic visibility" rather than the solution like 2PC for "atomic commit".
But if another approach for "atomic commit" was also proposed at [1],
that's good. I will check that.

Okay, that makes sense.

I read Alexey's 2PC patch (0001-Add-postgres_fdw.use_twophase-GUC-to-use-2PC.patch)
proposed at [1]. As Alexey said in that thread, there are two big differences
between his patch and Sawada-san's: 1) whether there is a resolver process
for foreign transactions, and 2) whether the 2PC logic is implemented only
inside postgres_fdw or in both the FDW and PostgreSQL core.

I think that 2) is the first decision point. Alexey's 2PC patch is very simple
and all the 2PC logic is implemented only inside postgres_fdw. But this
means that 2PC is not usable if multiple types of FDW (e.g., postgres_fdw
and mysql_fdw) participate in the transaction. This may be ok if we implement
the 2PC feature only for PostgreSQL sharding using postgres_fdw. But if we
implement 2PC as an improvement to FDW independently of PostgreSQL
sharding, I think that it's necessary to support other FDWs. And this is our
direction, isn't it?

Sawada-san's patch supports that case by implementing some components
for that in PostgreSQL core as well. For example, with the patch, all the remote
transactions that participate in the transaction are managed by PostgreSQL
core instead of the postgres_fdw layer.

Therefore, at least regarding the difference 2), I think that Sawada-san's
approach is better. Thought?

[1]: /messages/by-id/3ef7877bfed0582019eab3d462a43275@postgrespro.ru

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#127tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Fujii Masao (#126)
RE: Transactions involving multiple postgres foreign servers, take 2

Alexey-san, Sawada-san,
cc: Fujii-san,

From: Fujii Masao <masao.fujii@oss.nttdata.com>

But if we
implement 2PC as the improvement on FDW independently from PostgreSQL
sharding, I think that it's necessary to support other FDW. And this is our
direction, isn't it?

I understand it the same way as Fujii-san. 2PC for FDWs is useful in itself, so I think we should pursue a tidy FDW interface and good performance within the FDW framework. "Tidy" means that many other FDWs should be able to implement it. I guess XA/JTA is the only material we can use to judge whether the FDW interface is good.

Sawada-san's patch supports that case by implementing some components
for that also in PostgreSQL core. For example, with the patch, all the remote
transactions that participate at the transaction are managed by PostgreSQL
core instead of postgres_fdw layer.

Therefore, at least regarding the difference 2), I think that Sawada-san's
approach is better. Thought?

I think so. Sawada-san's patch needs to address the design issues I posed before we dig into the code for a thorough review, though.

BTW, is there something Sawada-san can take from Alexey-san's patch? I'm concerned about the performance for practical use. Do you two have differences in these points, for instance? The first two items are often cited to evaluate the algorithm's performance, as you know.

* The number of round trips to remote nodes.
* The number of disk I/Os on each node and all nodes in total (WAL, two-phase file, pg_subtrans file, CLOG?).
* Are prepare and commit executed in parallel on remote nodes? (serious DBMSs do so)
* Is there any serialization point in the processing? (Sawada-san's has one)

I'm sorry to repeat myself, but I don't think we can compromise on 2PC performance. Of course, we recommend that users design a schema that co-locates the data each transaction accesses to avoid 2PC, but that's not always possible (e.g., when secondary indexes are used).

Plus, as the following quote from the TPC-C specification shows, TPC-C requires 15% of (Payment?) transactions to do 2PC. (I learned this from Microsoft's, CockroachDB's, or Citus Data's site.)

--------------------------------------------------
Independent of the mode of selection, the customer resident
warehouse is the home warehouse 85% of the time and is a randomly selected remote warehouse 15% of the time.
This can be implemented by generating two random numbers x and y within [1 .. 100];

. If x <= 85 a customer is selected from the selected district number (C_D_ID = D_ID) and the home warehouse
number (C_W_ID = W_ID). The customer is paying through his/her own warehouse.

. If x > 85 a customer is selected from a random district number (C_D_ID is randomly selected within [1 .. 10]),
and a random remote warehouse number (C_W_ID is randomly selected within the range of active
warehouses (see Clause 4.2.2), and C_W_ID ≠ W_ID). The customer is paying through a warehouse and a
district other than his/her own.
--------------------------------------------------
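
As a rough illustration of the selection rule quoted above, the following sketch draws x in [1 .. 100] and picks the home warehouse when x <= 85 and a random remote one otherwise. The function name and parameters are assumptions made for this example, it needs at least two active warehouses, and whether a remote pick actually requires 2PC depends on how warehouses are distributed across nodes.

```python
import random


def pick_customer_warehouse(home_w_id, home_d_id, active_warehouses, rng):
    """Return (C_W_ID, C_D_ID, is_remote) following the quoted 85/15 rule."""
    x = rng.randint(1, 100)
    if x <= 85:
        # The customer pays through his/her own warehouse and district.
        return home_w_id, home_d_id, False
    # Otherwise pick a warehouse other than the home one and a random district.
    remote = [w for w in active_warehouses if w != home_w_id]
    return rng.choice(remote), rng.randint(1, 10), True


rng = random.Random(0)
warehouses = list(range(1, 11))
samples = [pick_customer_warehouse(1, 3, warehouses, rng) for _ in range(10_000)]
remote_share = sum(is_remote for _, _, is_remote in samples) / len(samples)
```

Over many samples remote_share should settle near 0.15, which is the fraction of transactions that may span two nodes in a sharded setup.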

Regards
Takayuki Tsunakawa

#128Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: tsunakawa.takay@fujitsu.com (#127)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020/09/10 10:13, tsunakawa.takay@fujitsu.com wrote:

Alexey-san, Sawada-san,
cc: Fujii-san,

From: Fujii Masao <masao.fujii@oss.nttdata.com>

But if we
implement 2PC as the improvement on FDW independently from PostgreSQL
sharding, I think that it's necessary to support other FDW. And this is our
direction, isn't it?

I understand it the same way as Fujii-san. 2PC for FDWs is useful in itself, so I think we should pursue a tidy FDW interface and good performance within the FDW framework. "Tidy" means that many other FDWs should be able to implement it. I guess XA/JTA is the only material we can use to judge whether the FDW interface is good.

Originally start(), commit() and rollback() are supported as FDW interfaces. With his patch, prepare() is supported. What other interfaces need to be supported per XA/JTA?

As far as Sawada-san and I discussed upthread, to support MySQL, another type of start() would be necessary to issue the "XA START id" command. end() might also be necessary to issue "XA END id", but that command can be issued via prepare() together with "XA PREPARE id".
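
To make the mapping concrete, here is a minimal sketch of which MySQL XA statements each callback could send. The Python function names are hypothetical and are not the FDW API from the patch. The XA START, XA END, XA PREPARE, XA COMMIT, and XA ROLLBACK statements are MySQL's, and folding XA END into the prepare step follows the suggestion above.

```python
def quote_xid(xid):
    """Quote a transaction id as a SQL string literal (doubling single quotes)."""
    return "'" + xid.replace("'", "''") + "'"


def start_statements(xid):
    # Hypothetical extra start() variant for servers that need an explicit XA START.
    return [f"XA START {quote_xid(xid)}"]


def prepare_statements(xid):
    # XA END is issued together with XA PREPARE, so no separate end() is needed.
    return [f"XA END {quote_xid(xid)}", f"XA PREPARE {quote_xid(xid)}"]


def commit_statements(xid):
    return [f"XA COMMIT {quote_xid(xid)}"]


def rollback_statements(xid):
    return [f"XA ROLLBACK {quote_xid(xid)}"]


xid = "fx_1_100"  # made-up identifier for illustration only
committed_path = (
    start_statements(xid) + prepare_statements(xid) + commit_statements(xid)
)
```

With this split, the only callback beyond start(), prepare(), commit(), and rollback() is the alternative form of start().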

I'm not familiar with XA/JTA or the XA transaction interfaces of other major DBMSs, so I'd like to know what additional interfaces are necessary.

Sawada-san's patch supports that case by implementing some components
for that also in PostgreSQL core. For example, with the patch, all the remote
transactions that participate at the transaction are managed by PostgreSQL
core instead of postgres_fdw layer.

Therefore, at least regarding the difference 2), I think that Sawada-san's
approach is better. Thought?

I think so. Sawada-san's patch needs to address the design issues I posed before digging into the code for thorough review, though.

BTW, is there something Sawada-san can take from Alexey-san's patch? I'm concerned about the performance for practical use. Do you two have differences in these points, for instance?

IMO Sawada-san's version of 2PC is less performant, but that's because
his patch provides more functionality. For example, with his patch,
WAL is written to automatically complete the unresolved foreign transactions
in the case of failure. OTOH, Alexey's patch introduces no new WAL for 2PC.
Of course, generating more WAL would cause more overhead.
But if we need the automatic resolution feature, it's inevitable to introduce
new WAL whichever patch we choose.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#129Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#123)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 8 Sep 2020 at 13:00, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Amit Kapila <amit.kapila16@gmail.com>

I intend to say that the global-visibility work can impact this in a
major way and we have analyzed that to some extent during a discussion
on the other thread. So, I think without having a complete
design/solution that addresses both the 2PC and global-visibility, it
is not apparent what is the right way to proceed. It seems to me that
rather than working on individual (or smaller) parts one needs to come
up with a bigger picture (or overall design) and then once we have
figured that out correctly, it would be easier to decide which parts
can go first.

I'm really sorry I've been getting late and late and late x10 to publish the revised scale-out design wiki to discuss the big picture! I don't know why I'm taking this long; I feel as if I were captive in a time prison (yes, nobody is holding me captive; I'm just late). Please wait a few days.

But to proceed with the development, let me comment on the atomic commit and global visibility.

* We have to hear from Andrey about their check on the possibility that Clock-SI could be Microsoft's patent and if we can avoid it.

* I have a feeling that we can adopt the algorithm used by Spanner, CockroachDB, and YugabyteDB. That is, 2PC for multi-node atomic commit, Paxos or Raft for replica synchronization (in the process of commit) to make 2PC more highly available, and the timestamp-based global visibility. However, the timestamp-based approach makes the database instance shut down when the node's clock is distant from the other nodes.

* Or, maybe we can use the following Commitment ordering, which doesn't require the timestamp or any other information to be transferred among the cluster nodes. However, it seems to have to track the order of read and write operations among concurrent transactions to ensure the correct commit order, so I'm not sure about the performance. The MVCO paper seems to present the information we need, but I haven't understood it well yet (it's difficult). Could anybody kindly interpret this?

Commitment ordering (CO) - yoavraz2
https://sites.google.com/site/yoavraz2/the_principle_of_co

As for the Sawada-san's 2PC patch, which I find interesting purely as FDW enhancement, I raised the following issues to be addressed:

1. Make FDW API implementable by other FDWs than postgres_fdw (this is what Amit-san kindly pointed out.) I think oracle_fdw and jdbc_fdw would be good examples to consider, while MySQL may not be good because it exposes the XA feature as SQL statements, not C functions as defined in the XA specification.

I agree that we need to verify new FDW APIs will be suitable for other
FDWs than postgres_fdw as well.

2. 2PC processing is queued and serialized in one background worker. That severely subdues transaction throughput. Each backend should perform 2PC.

I'm not sure it's safe for each backend to perform PREPARE and COMMIT
PREPARED, since the current design is meant to avoid an inconsistency
between the actual transaction result and the result the user sees.
But in the future, I think we can have multiple background workers per
database for better performance.

3. postgres_fdw cannot detect remote updates when the UDF executed on a remote node updates data.

I assume that you mean pushing the UDF down to a foreign server.
If so, I think we can do this by improving postgres_fdw. In the
current patch, registering and unregistering a foreign server to a
group of 2PC and marking a foreign server as updated are the FDW's
responsibility. So perhaps if we had a way to tell postgres_fdw that the
UDF might update the data on the foreign server, postgres_fdw could
mark the foreign server as updated if the UDF is shippable.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#130Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#129)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2020/09/11 0:37, Masahiko Sawada wrote:

On Tue, 8 Sep 2020 at 13:00, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Amit Kapila <amit.kapila16@gmail.com>

I intend to say that the global-visibility work can impact this in a
major way and we have analyzed that to some extent during a discussion
on the other thread. So, I think without having a complete
design/solution that addresses both the 2PC and global-visibility, it
is not apparent what is the right way to proceed. It seems to me that
rather than working on individual (or smaller) parts one needs to come
up with a bigger picture (or overall design) and then once we have
figured that out correctly, it would be easier to decide which parts
can go first.

I'm really sorry I've been getting late and late and late x10 to publish the revised scale-out design wiki to discuss the big picture! I don't know why I'm taking this long; I feel as if I were captive in a time prison (yes, nobody is holding me captive; I'm just late). Please wait a few days.

But to proceed with the development, let me comment on the atomic commit and global visibility.

* We have to hear from Andrey about their check on the possibility that Clock-SI could be Microsoft's patent and if we can avoid it.

* I have a feeling that we can adopt the algorithm used by Spanner, CockroachDB, and YugabyteDB. That is, 2PC for multi-node atomic commit, Paxos or Raft for replica synchronization (in the process of commit) to make 2PC more highly available, and the timestamp-based global visibility. However, the timestamp-based approach makes the database instance shut down when the node's clock is distant from the other nodes.

* Or, maybe we can use the following Commitment ordering, which doesn't require the timestamp or any other information to be transferred among the cluster nodes. However, it seems to have to track the order of read and write operations among concurrent transactions to ensure the correct commit order, so I'm not sure about the performance. The MVCO paper seems to present the information we need, but I haven't understood it well yet (it's difficult). Could anybody kindly interpret this?

Commitment ordering (CO) - yoavraz2
https://sites.google.com/site/yoavraz2/the_principle_of_co

As for the Sawada-san's 2PC patch, which I find interesting purely as FDW enhancement, I raised the following issues to be addressed:

1. Make FDW API implementable by other FDWs than postgres_fdw (this is what Amit-san kindly pointed out.) I think oracle_fdw and jdbc_fdw would be good examples to consider, while MySQL may not be good because it exposes the XA feature as SQL statements, not C functions as defined in the XA specification.

I agree that we need to verify new FDW APIs will be suitable for other
FDWs than postgres_fdw as well.

2. 2PC processing is queued and serialized in one background worker. That severely subdues transaction throughput. Each backend should perform 2PC.

Not sure it's safe that each backend perform PREPARE and COMMIT
PREPARED since the current design is for not leading an inconsistency
between the actual transaction result and the result the user sees.

Can I check my understanding about why the resolver process is necessary?

Firstly, you think that issuing the COMMIT PREPARED command to the foreign server can cause an error, for example because of a connection error, OOM, etc. On the other hand, only waiting for another process to issue the command is less likely to cause an error. Right?

If an error occurs in the backend process after the commit record is WAL-logged, the error would be reported to the client, which may misunderstand that the transaction failed even though the commit record was already flushed. So you think that each backend should not issue the COMMIT PREPARED command, to avoid that inconsistency. Instead, it's better to make another process, the resolver, issue the command and just make each backend wait for it to complete. Right?

Also, using the resolver process has another merit: when there are unresolved foreign transactions but the corresponding backend exits, the resolver can try to resolve them. If something like this automatic resolution is necessary, a process like the resolver would be necessary. Right?

On the contrary, if we don't need such automatic resolution (i.e., unresolved foreign transactions always need to be resolved manually) and we can prevent the code that issues the COMMIT PREPARED command from causing an error (not sure if that's possible, though...), we probably don't need the resolver process. Right?

But in the future, I think we can have multiple background workers per
database for better performance.

Yes, that's an idea.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#131tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Fujii Masao (#128)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Fujii Masao <masao.fujii@oss.nttdata.com>

Originally start(), commit() and rollback() are supported as FDW interfaces.
As far as I and Sawada-san discussed this upthread, to support MySQL,
another type of start() would be necessary to issue "XA START id" command.
end() might be also necessary to issue "XA END id", but that command can be
issued via prepare() together with "XA PREPARE id".

Yeah, I think we can call xa_end and xa_prepare in the FDW's prepare function.

The issue is when to call xa_start, which requires XID as an argument. We don't want to call it in transactions that access only one node...?

With his patch, prepare() is supported. What other interfaces need to be
supported per XA/JTA?

I'm not familiar with XA/JTA and XA transaction interfaces on other major
DBMS. So I'd like to know what other interfaces are necessary additionally?

I think xa_start, xa_end, xa_prepare, xa_commit, xa_rollback, and xa_recover are sufficient. The XA specification is here:

https://pubs.opengroup.org/onlinepubs/009680699/toc.pdf

You can see the function reference in Chapter 5 and the concepts in Chapter 3. Chapter 6 probably shows the state transitions (function call sequences).

IMO Sawada-san's version of 2PC is less performant, but it's because his
patch provides more functionality. For example, with his patch, WAL is written
to automatically complete the unresolved foreign transactions in the case of
failure. OTOH, Alexey patch introduces no new WAL for 2PC.
Of course, generating more WAL would cause more overhead.
But if we need automatic resolution feature, it's inevitable to introduce new
WAL whichever the patch we choose.

Please do not get me wrong. I know Sawada-san is trying to ensure durability. I just wanted to know what each patch does, at what cost in terms of disk and network I/O, and whether one patch can take something from the other for less cost. I'm simply guessing (without having read the code yet) that each transaction basically does:

- two round trips (prepare, commit) to each remote node
- two WAL writes (prepare, commit) on the local node and each remote node
- one write for two-phase state file on each remote node
- one write to record participants on the local node
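
The guess above can be turned into a rough per-transaction count. The following is only a sketch: the function name `two_pc_cost` and the counting rules are assumptions taken literally from the four bullets, not measurements of either patch.

```python
def two_pc_cost(n_remote: int) -> dict:
    """Rough I/O count for one distributed transaction touching n_remote nodes.

    Follows the four guessed bullets literally; the real patches may differ.
    """
    return {
        # prepare + commit round trip to each remote node
        "network_round_trips": 2 * n_remote,
        # prepare + commit WAL record on the local node and on each remote node
        "wal_writes": 2 * (1 + n_remote),
        # one two-phase state file write on each remote node
        "state_file_writes": n_remote,
        # one local write recording the participants
        "participant_record_writes": 1,
    }


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, two_pc_cost(n))
```

Under these assumptions every term except the participant record grows linearly with the number of remote nodes, which is why the round trips are the natural target for parallelism.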

It felt hard to reason about the algorithm's efficiency from the source code. As you may have seen, DBMS textbooks and papers describe disk and network I/Os to evaluate algorithms. I thought such information would be useful before going deeper into the source code. Maybe such things can eventually be written in Sawada-san's wiki page below or in a README.

Atomic Commit of Distributed Transactions
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

Regards
Takayuki Tsunakawa

#132tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#129)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

On Tue, 8 Sep 2020 at 13:00, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

2. 2PC processing is queued and serialized in one background worker. That
severely subdues transaction throughput. Each backend should perform
2PC.

Not sure it's safe that each backend perform PREPARE and COMMIT
PREPARED since the current design is for not leading an inconsistency
between the actual transaction result and the result the user sees.

As Fujii-san is asking, I also would like to know what situation you think is not safe. Are you worried that the FDW's commit function might call ereport(ERROR | FATAL | PANIC)? If so, can't we stipulate that the FDW implementor should ensure that the commit function always returns control to the caller?

But in the future, I think we can have multiple background workers per
database for better performance.

Does the database in "per database" mean the local database (that applications connect to), or the remote database accessed via FDW?

I'm wondering how the FDW and background worker(s) can realize parallel prepare and parallel commit. That is, the coordinator transaction performs:

1. Issues prepare to all participant nodes, but doesn't wait for each reply.
2. Waits for replies from all participants.
3. Issues commit to all participant nodes, but doesn't wait for each reply.
4. Waits for replies from all participants.

If we just consider PostgreSQL and don't think about FDW, we can use libpq async functions -- PQsendQuery, PQconsumeInput, and PQgetResult. pgbench uses them so that one thread can issue SQL statements on multiple connections in parallel.
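
The four steps above can be sketched without any real connections. This is an illustration only: `send` is a hypothetical stand-in that simulates a remote reply with a delay, and nothing here uses libpq or the FDW API.

```python
import asyncio


async def send(node: str, command: str, delay: float) -> tuple:
    """Stand-in for one asynchronous request/reply to a participant."""
    await asyncio.sleep(delay)  # simulated network and remote processing time
    return (node, command, "OK")


async def parallel_2pc(nodes: dict) -> list:
    """nodes maps a participant name to its simulated latency in seconds."""
    log = []
    for command in ("PREPARE", "COMMIT"):
        # Steps 1 and 3: issue the command to every participant without waiting.
        tasks = [asyncio.create_task(send(n, command, d)) for n, d in nodes.items()]
        # Steps 2 and 4: wait for the replies from all participants.
        replies = await asyncio.gather(*tasks)
        if any(status != "OK" for _, _, status in replies):
            raise RuntimeError(f"{command} failed: {replies}")
        log.extend(replies)
    return log


if __name__ == "__main__":
    print(asyncio.run(parallel_2pc({"node1": 0.02, "node2": 0.01})))
```

Each phase then takes roughly as long as the slowest participant rather than the sum over all participants.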

But when we consider the FDW interface, plus other DBMSs, how can we achieve the parallelism?

3. postgres_fdw cannot detect remote updates when the UDF executed on a
remote node updates data.

I assume that you mean pushing the UDF down to a foreign server.
If so, I think we can do this by improving postgres_fdw. In the current patch,
registering and unregistering a foreign server to a group of 2PC and marking a
foreign server as updated are the FDW's responsibility. So perhaps if we had a way to
tell postgres_fdw that the UDF might update the data on the foreign server,
postgres_fdw could mark the foreign server as updated if the UDF is shippable.

Maybe we can assume that VOLATILE functions update data. That may be an overreaction, though.

Another idea is to add a new value to the ReadyForQuery message in the FE/BE protocol. Say, 'U' if in a transaction block that updated data. Here we consider "updated" as having allocated an XID.

52.7. Message Formats
https://www.postgresql.org/docs/devel/protocol-message-formats.html
--------------------------------------------------
ReadyForQuery (B)

Byte1
Current backend transaction status indicator. Possible values are 'I' if idle (not in a transaction block); 'T' if in a transaction block; or 'E' if in a failed transaction block (queries will be rejected until block is ended).
--------------------------------------------------
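
To make the proposal concrete, here is a sketch of how a client might decode the message. The layout (Byte1 'Z', Int32 length 5, Byte1 status) is the documented ReadyForQuery format; the 'U' entry is the hypothetical value proposed above and does not exist in the protocol.

```python
import struct

# 'I', 'T' and 'E' are the documented values.
# 'U' is the hypothetical value proposed in this message.
STATUS = {
    b"I": "idle",
    b"T": "in transaction block",
    b"E": "in failed transaction block",
    b"U": "in transaction block that updated data (proposed)",
}


def parse_ready_for_query(msg: bytes) -> str:
    """Decode ReadyForQuery: Byte1('Z'), Int32 length (5), Byte1 status."""
    tag, length, status = struct.unpack("!cic", msg)
    if tag != b"Z" or length != 5:
        raise ValueError("not a ReadyForQuery message")
    if status not in STATUS:
        raise ValueError(f"unknown transaction status {status!r}")
    return STATUS[status]


if __name__ == "__main__":
    print(parse_ready_for_query(b"Z\x00\x00\x00\x05T"))
    print(parse_ready_for_query(b"Z\x00\x00\x00\x05U"))
```

An FDW that saw the proposed 'U' status could mark the foreign server as modified without knowing what the pushed-down function did.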

Regards
Takayuki Tsunakawa

#133Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Fujii Masao (#130)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 11 Sep 2020 at 11:58, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/09/11 0:37, Masahiko Sawada wrote:

On Tue, 8 Sep 2020 at 13:00, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Amit Kapila <amit.kapila16@gmail.com>

I intend to say that the global-visibility work can impact this in a
major way and we have analyzed that to some extent during a discussion
on the other thread. So, I think without having a complete
design/solution that addresses both the 2PC and global-visibility, it
is not apparent what is the right way to proceed. It seems to me that
rather than working on individual (or smaller) parts one needs to come
up with a bigger picture (or overall design) and then once we have
figured that out correctly, it would be easier to decide which parts
can go first.

I'm really sorry I've been getting late and late and latex10 to publish the revised scale-out design wiki to discuss the big picture! I don't know why I'm taking this long time; I feel I were captive in a time prison (yes, nobody is holding me captive; I'm just late.) Please wait a few days.

But to proceed with the development, let me comment on the atomic commit and global visibility.

* We have to hear from Andrey about their check on the possibility that Clock-SI could be Microsoft's patent and if we can avoid it.

* I have a feeling that we can adopt the algorithm used by Spanner, CockroachDB, and YugabyteDB. That is, 2PC for multi-node atomic commit, Paxos or Raft for replica synchronization (in the process of commit) to make 2PC more highly available, and the timestamp-based global visibility. However, the timestamp-based approach makes the database instance shut down when the node's clock is distant from the other nodes.

* Or, maybe we can use the following Commitment ordering, which doesn't require timestamps or any other information to be transferred among the cluster nodes. However, this seems to have to track the order of read and write operations among concurrent transactions to ensure the correct commit order, so I'm not sure about the performance. The MVCO paper seems to present the information we need, but I haven't understood it well yet (it's difficult.) Could anybody kindly interpret this?

Commitment ordering (CO) - yoavraz2
https://sites.google.com/site/yoavraz2/the_principle_of_co

As for the Sawada-san's 2PC patch, which I find interesting purely as FDW enhancement, I raised the following issues to be addressed:

1. Make FDW API implementable by other FDWs than postgres_fdw (this is what Amit-san kindly pointed out.) I think oracle_fdw and jdbc_fdw would be good examples to consider, while MySQL may not be good because it exposes the XA feature as SQL statements, not C functions as defined in the XA specification.

I agree that we need to verify new FDW APIs will be suitable for other
FDWs than postgres_fdw as well.

2. 2PC processing is queued and serialized in one background worker. That severely subdues transaction throughput. Each backend should perform 2PC.

Not sure it's safe that each backend perform PREPARE and COMMIT
PREPARED since the current design is for not leading an inconsistency
between the actual transaction result and the result the user sees.

Can I check my understanding about why the resolver process is necessary?

Firstly, you think that issuing the COMMIT PREPARED command to the foreign server can cause an error, for example, because of a connection error, OOM, etc. On the other hand, merely waiting for another process to issue the command is less likely to cause an error. Right?

If an error occurs in the backend process after the commit record is WAL-logged, the error would be reported to the client, which may wrongly conclude that the transaction failed even though the commit record was already flushed. So you think that each backend should not issue the COMMIT PREPARED command, to avoid that inconsistency. Instead, it's better to make another process, the resolver, issue the command and just make each backend wait for that to complete. Right?

Also, using the resolver process has another merit: when there are unresolved foreign transactions but the corresponding backend has exited, the resolver can try to resolve them. If something like this automatic resolution is necessary, a process like the resolver would be necessary. Right?

To the contrary, if we don't need such automatic resolution (i.e., unresolved foreign transactions always need to be resolved manually) and we can prevent the code that issues the COMMIT PREPARED command from raising an error (not sure if that's possible, though...), we probably don't need the resolver process. Right?

Yes, I'm on the same page about all the above explanations.

The resolver process has two functionalities: resolving foreign
transactions automatically when the user issues COMMIT (the case you
described in the second paragraph), and resolving foreign transactions
when the corresponding backend no longer exists or when the server
crashes in the middle of 2PC (described in the third
paragraph).

Considering the design without the resolver process, I think we can
easily replace the latter with the manual resolution. OTOH, it's not
easy for the former. I have no idea about better design for now,
although, as you described, if we could ensure that the process
doesn't raise an error during resolving foreign transactions after
committing the local transaction we would not need the resolver
process.

Or the second idea would be that the backend commits only the local
transaction then returns the acknowledgment of COMMIT to the user
without resolving foreign transactions. Then the user manually
resolves the foreign transactions by, for example, using the SQL
function pg_resolve_foreign_xact() within a separate transaction. That
way, even if an error occurred during resolving foreign transactions
(e.g., executing COMMIT PREPARED), it’s okay as the user is already
aware of the local transaction having been committed and can retry to
resolve the unresolved foreign transaction. So we won't need the
resolver process while avoiding such inconsistency.

But a drawback would be that the transaction commit doesn't ensure
that all foreign transactions are completed. The subsequent
transactions would need to check if the previous distributed
transaction is completed to see its results. I’m not sure it’s a good
design in terms of usability.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#134Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#132)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 11 Sep 2020 at 18:24, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

On Tue, 8 Sep 2020 at 13:00, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

2. 2PC processing is queued and serialized in one background worker. That
severely subdues transaction throughput. Each backend should perform
2PC.

Not sure it's safe that each backend perform PREPARE and COMMIT
PREPARED since the current design is for not leading an inconsistency
between the actual transaction result and the result the user sees.

As Fujii-san is asking, I also would like to know what situation you think is not safe. Are you worried that the FDW's commit function might call ereport(ERROR | FATAL | PANIC)?

Yes.

If so, can't we stipulate that the FDW implementor should ensure that the commit function always returns control to the caller?

How can the FDW implementor ensure that? Since even palloc could call
ereport(ERROR), I guess it's hard to require that of all FDW
implementors.

But in the future, I think we can have multiple background workers per
database for better performance.

Does the database in "per database" mean the local database (that applications connect to), or the remote database accessed via FDW?

I meant the local database. In the current patch, we launch the
resolver process per local database. My idea is to allow launching
multiple resolver processes for one local database as long as the
number of workers doesn't exceed the limit.

I'm wondering how the FDW and background worker(s) can realize parallel prepare and parallel commit. That is, the coordinator transaction performs:

1. Issues prepare to all participant nodes, but doesn't wait for each reply.
2. Waits for replies from all participants.
3. Issues commit to all participant nodes, but doesn't wait for each reply.
4. Waits for replies from all participants.

If we just consider PostgreSQL and don't think about FDW, we can use libpq async functions -- PQsendQuery, PQconsumeInput, and PQgetResult. pgbench uses them so that one thread can issue SQL statements on multiple connections in parallel.

But when we consider the FDW interface, plus other DBMSs, how can we achieve the parallelism?

It's still a rough idea but I think we can use the TMASYNC flag and
xa_complete explained in the XA specification. The core transaction
manager calls the prepare, commit, and rollback APIs with the flag, requiring
them to execute the operation asynchronously and to return a handle (e.g.,
a socket obtained by PQsocket in the postgres_fdw case) to the transaction
manager. Then the transaction manager keeps polling the handle
until it becomes readable and tests for completion using
xa_complete() with no wait, until all foreign servers return OK on
the xa_complete check.
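
The polling loop described here might look roughly like the sketch below. It is not the patch's API: `start_async` and `complete` are hypothetical names playing the roles of an asynchronous prepare/commit call and a no-wait xa_complete(), and a local socket pair stands in for the connection to each foreign server.

```python
import selectors
import socket


def start_async(command: str) -> socket.socket:
    """Hypothetical async call: start the operation, return a pollable handle.

    The "remote" end of the socket pair replies immediately.
    """
    local, remote = socket.socketpair()
    remote.sendall(f"{command} OK".encode())
    remote.close()
    return local


def complete(handle: socket.socket) -> str:
    """Hypothetical completion check, in the role of xa_complete()."""
    reply = handle.recv(64).decode()
    handle.close()
    return reply


def run_on_all(servers: list, command: str) -> dict:
    """Start the command everywhere, then poll until every server has replied."""
    sel = selectors.DefaultSelector()
    for name in servers:
        sel.register(start_async(command), selectors.EVENT_READ, data=name)
    results = {}
    while len(results) < len(servers):
        # Only handles reported readable are tested for completion.
        for key, _ in sel.select(timeout=1.0):
            sel.unregister(key.fileobj)
            results[key.data] = complete(key.fileobj)
    sel.close()
    return results


if __name__ == "__main__":
    print(run_on_all(["server1", "server2"], "PREPARE"))
```

An FDW that cannot return a pollable handle would simply complete the operation inside the start call, which is the synchronous case.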

3. postgres_fdw cannot detect remote updates when the UDF executed on a
remote node updates data.

I assume that you mean pushing the UDF down to a foreign server.
If so, I think we can do this by improving postgres_fdw. In the current patch,
registering and unregistering a foreign server to a group of 2PC and marking a
foreign server as updated are the FDW's responsibility. So perhaps if we had a way to
tell postgres_fdw that the UDF might update the data on the foreign server,
postgres_fdw could mark the foreign server as updated if the UDF is shippable.

Maybe we can assume that VOLATILE functions update data. That may be an overreaction, though.

Sorry I don't understand that. The volatile functions are not pushed
down to the foreign servers in the first place, no?

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#135Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Masahiko Sawada (#133)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Sep 11, 2020 at 4:37 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

Considering the design without the resolver process, I think we can
easily replace the latter with the manual resolution. OTOH, it's not
easy for the former. I have no idea about better design for now,
although, as you described, if we could ensure that the process
doesn't raise an error during resolving foreign transactions after
committing the local transaction we would not need the resolver
process.

My initial patch used the same backend to resolve foreign
transactions. But in that case even though the user receives COMMIT
completed, the backend doesn't accept the next query while it is busy
resolving the foreign transactions. That might be a usability issue again if
attempting to resolve all foreign transactions takes noticeable time.
If we go this route, we should try to resolve as many foreign
transactions as possible ignoring any errors while doing so and
somehow let user know which transactions couldn't be resolved. User
can then take responsibility for resolving those.

Or the second idea would be that the backend commits only the local
transaction then returns the acknowledgment of COMMIT to the user
without resolving foreign transactions. Then the user manually
resolves the foreign transactions by, for example, using the SQL
function pg_resolve_foreign_xact() within a separate transaction. That
way, even if an error occurred during resolving foreign transactions
(e.g., executing COMMIT PREPARED), it’s okay as the user is already
aware of the local transaction having been committed and can retry to
resolve the unresolved foreign transaction. So we won't need the
resolver process while avoiding such inconsistency.

But a drawback would be that the transaction commit doesn't ensure
that all foreign transactions are completed. The subsequent
transactions would need to check if the previous distributed
transaction is completed to see its results. I’m not sure it’s a good
design in terms of usability.

I agree, this won't be acceptable.

In either case, I think a solution where the local server takes
responsibility to resolve foreign transactions will be better even in
the first cut.

--
Best Wishes,
Ashutosh Bapat

#136tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#134)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

If so, can't we stipulate that the FDW implementor should ensure that the
commit function always returns control to the caller?

How can the FDW implementor ensure that? Since even palloc could call
ereport(ERROR), I guess it's hard to require that of all FDW
implementors.

I think what the FDW commit routine will do is just call xa_commit(), or PQexec("COMMIT PREPARED") in postgres_fdw.

It's still a rough idea but I think we can use the TMASYNC flag and
xa_complete explained in the XA specification. The core transaction
manager calls the prepare, commit, and rollback APIs with the flag, requiring
them to execute the operation asynchronously and to return a handle (e.g.,
a socket obtained by PQsocket in the postgres_fdw case) to the transaction
manager. Then the transaction manager keeps polling the handle
until it becomes readable and tests for completion using
xa_complete() with no wait, until all foreign servers return OK on
the xa_complete check.

Unfortunately, even Oracle and Db2 haven't supported XA asynchronous execution for years. Our DBMS Symfoware doesn't, either. I don't expect other DBMSs to support it.

Hmm, I'm afraid this may be one of the FDW's intractable walls for a serious scale-out DBMS. If we define asynchronous FDW routines for 2PC, postgres_fdw would be able to implement them by using libpq asynchronous functions. But other DBMSs can't ...

Maybe we can assume that VOLATILE functions update data. That may be
an overreaction, though.

Sorry I don't understand that. The volatile functions are not pushed
down to the foreign servers in the first place, no?

Ah, you're right. Then, the choices are twofold: (1) trust users that their functions don't update data, or trust the user's claim (specification) about it, and (2) get notification through the FE/BE protocol that the remote transaction may have updated data.

Regards
Takayuki Tsunakawa

#137tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#133)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

The resolver process has two functionalities: resolving foreign
transactions automatically when the user issues COMMIT (the case you
described in the second paragraph), and resolving foreign transactions
when the corresponding backend no longer exists or when the server
crashes in the middle of 2PC (described in the third
paragraph).

Considering the design without the resolver process, I think we can
easily replace the latter with the manual resolution. OTOH, it's not
easy for the former. I have no idea about better design for now,
although, as you described, if we could ensure that the process
doesn't raise an error during resolving foreign transactions after
committing the local transaction we would not need the resolver
process.

Yeah, the resolver background process -- someone independent of client sessions -- is necessary, because the client session disappears sometime. When the server that hosts the 2PC coordinator crashes, there are no client sessions. Our DBMS Symfoware also runs background threads that take care of resolution of in-doubt transactions due to a server or network failure.

Then, how does the resolver get involved in 2PC to enable parallel 2PC? Two ideas quickly come to mind:

(1) Each client backend issues prepare and commit to multiple remote nodes asynchronously.
If the communication fails during commit, the client backend leaves the commit notification task to the resolver.
That is, the resolver lends a hand during failure recovery, and doesn't interfere with the transaction processing during normal operation.

(2) The resolver takes some responsibility in 2PC processing during normal operation.
(send prepare and/or commit to remote nodes and get the results.)
To avoid serial execution per transaction, the resolver bundles multiple requests, sends them in bulk, and waits for multiple replies at once.
This allows the coordinator to do its own prepare processing in parallel with those of participants.
However, in Postgres, this requires context switches between the client backend and the resolver.

Our Symfoware takes (2). However, it doesn't suffer from the context switch, because the server is multi-threaded and further implements or uses more lightweight entities than the thread.
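
Idea (1) above can be sketched as follows. Everything here is hypothetical: `commit_prepared` only simulates a reachable or unreachable server, and the queue stands in for whatever shared state the backend and the resolver would actually use.

```python
from collections import deque


def commit_prepared(server, reachable):
    """Stand-in for COMMIT PREPARED; False means the communication failed."""
    return server in reachable


def backend_commit(gid, servers, reachable, resolver_queue):
    """Idea (1): the backend commits on each server and hands failures over."""
    for server in servers:
        if not commit_prepared(server, reachable):
            resolver_queue.append((gid, server))  # left for the resolver
    return "COMMIT"  # the commit decision was made before this phase


def resolver_pass(reachable, resolver_queue):
    """One retry pass; entries that still fail stay queued for the next pass."""
    for _ in range(len(resolver_queue)):
        gid, server = resolver_queue.popleft()
        if not commit_prepared(server, reachable):
            resolver_queue.append((gid, server))


if __name__ == "__main__":
    queue = deque()
    backend_commit("gx1", ["node1", "node2"], {"node1"}, queue)
    print(list(queue))  # node2 is still pending
    resolver_pass({"node1", "node2"}, queue)
    print(list(queue))  # nothing left once node2 is reachable again
```

In this arrangement the resolver does no work at all while every participant is reachable, which is the point of idea (1).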

Or the second idea would be that the backend commits only the local
transaction then returns the acknowledgment of COMMIT to the user
without resolving foreign transactions. Then the user manually
resolves the foreign transactions by, for example, using the SQL
function pg_resolve_foreign_xact() within a separate transaction. That
way, even if an error occurred during resolving foreign transactions
(e.g., executing COMMIT PREPARED), it’s okay as the user is already
aware of the local transaction having been committed and can retry to
resolve the unresolved foreign transaction. So we won't need the
resolver process while avoiding such inconsistency.

But a drawback would be that the transaction commit doesn't ensure
that all foreign transactions are completed. The subsequent
transactions would need to check if the previous distributed
transaction is completed to see its results. I’m not sure it’s a good
design in terms of usability.

I share your worry; I don't think it's a good design either. I guess that's why Postgres-XL had to create a tool called pgxc_clean and ask the user to resolve transactions with it.

pgxc_clean
https://www.postgres-xl.org/documentation/pgxcclean.html

"pgxc_clean is a Postgres-XL utility to maintain transaction status after a crash. When a Postgres-XL node crashes and recovers or fails over, the commit status of the node may be inconsistent with other nodes. pgxc_clean checks transaction commit status and corrects them."

Regards
Takayuki Tsunakawa

#138Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#113)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Aug 21, 2020 at 03:25:29PM +0900, Masahiko Sawada wrote:

Thank you for letting me know. I've attached the latest version patch set.

A rebase is needed again as the CF bot is complaining.
--
Michael

#139Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Michael Paquier (#138)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, 17 Sep 2020 at 14:25, Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Aug 21, 2020 at 03:25:29PM +0900, Masahiko Sawada wrote:

Thank you for letting me know. I've attached the latest version patch set.

A rebase is needed again as the CF bot is complaining.

Thank you for letting me know. I'm updating the patch and splitting
into small pieces as Fujii-san suggested. I'll submit the latest patch
set early next week.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#140Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#136)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, 16 Sep 2020 at 13:20, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

If so, can't we stipulate that the FDW implementor should ensure that the
commit function always returns control to the caller?

How can the FDW implementor ensure that? Since even palloc could call
ereport(ERROR), I guess it's hard to require that of all FDW
implementors.

I think what the FDW commit routine will do is just call xa_commit(), or PQexec("COMMIT PREPARED") in postgres_fdw.

Yes, but it still seems hard to me that we require for all FDW
implementations to commit/rollback prepared transactions without the
possibility of ERROR.

It's still a rough idea but I think we can use the TMASYNC flag and
xa_complete explained in the XA specification. The core transaction
manager calls the prepare, commit, and rollback APIs with the flag, requiring
them to execute the operation asynchronously and to return a handle (e.g.,
a socket obtained by PQsocket in the postgres_fdw case) to the transaction
manager. Then the transaction manager keeps polling the handle
until it becomes readable and tests for completion using
xa_complete() with no wait, until all foreign servers return OK on
the xa_complete check.

Unfortunately, even Oracle and Db2 haven't supported XA asynchronous execution for years. Our DBMS Symfoware doesn't, either. I don't expect other DBMSs to support it.

Hmm, I'm afraid this may be one of the FDW's intractable walls for a serious scale-out DBMS. If we define asynchronous FDW routines for 2PC, postgres_fdw would be able to implement them by using libpq asynchronous functions. But other DBMSs can't ...

I think it's not necessarily that all FDW implementations need to be
able to support xa_complete(). We can support both synchronous and
asynchronous executions of prepare/commit/rollback.

Maybe we can assume that VOLATILE functions update data. That may be
an overreaction, though.

Sorry I don't understand that. The volatile functions are not pushed
down to the foreign servers in the first place, no?

Ah, you're right. Then, the choices are twofold: (1) trust users that their functions don't update data, or trust the user's claim (specification) about it, and (2) get notification through the FE/BE protocol that the remote transaction may have updated data.

I'm confused about what concerns you regarding the UDF.
If you're concerned that executing a UDF with something like 'SELECT
myfunc();' updates data on a foreign server, since the UDF should know
which foreign server it modifies data on, it should be able to register
the foreign server and mark it as modified. Or are you concerned that a
UDF in a WHERE condition is pushed down and updates data (e.g.,
‘SELECT … FROM foreign_tbl WHERE id = myfunc()’)?

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#141tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#140)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

Yes, but it still seems hard to me that we require for all FDW
implementations to commit/rollback prepared transactions without the
possibility of ERROR.

Of course we can't eliminate the possibility of error, because remote servers require network communication. What I'm saying is to just require the FDW to return an error code, as xa_commit() does, rather than throwing control away with ereport(ERROR). I don't think that's too strict.
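
The contract being asked for can be illustrated as below. The status codes and function names are hypothetical, and a Python exception is only a loose analogue of ereport(ERROR): it does not answer the earlier objection that even palloc can raise an error inside the FDW.

```python
COMMIT_OK = 0
COMMIT_FAILED = -1  # hypothetical status codes, not taken from any patch


def fdw_commit(send_commit, gid):
    """Hypothetical FDW commit routine that reports failure by return value."""
    try:
        send_commit(gid)
    except Exception:
        # Swallow the failure and hand control back to the caller.
        return COMMIT_FAILED
    return COMMIT_OK


def commit_all(participants, gid):
    """The caller keeps control and collects servers still needing resolution."""
    return [name for name, send in participants.items()
            if fdw_commit(send, gid) != COMMIT_OK]


if __name__ == "__main__":
    def ok(gid):
        pass

    def lost(gid):
        raise ConnectionError("connection lost")

    print(commit_all({"server1": ok, "server2": lost}, "gx1"))
```

Because the caller always regains control, it can finish committing on the remaining servers and record the failed ones for later resolution.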

I think it's not necessarily that all FDW implementations need to be
able to support xa_complete(). We can support both synchronous and
asynchronous executions of prepare/commit/rollback.

Yes, I think parallel prepare and commit can be an option for FDW. But I don't think it's an option for a serious scale-out DBMS. If we want to use FDW as part of PostgreSQL's scale-out infrastructure, we should design (if not implemented in the first version) how the parallelism can be realized. That design is also necessary because it could affect the FDW API.

If you're concerned that executing a UDF with something like 'SELECT
myfunc();' updates data on a foreign server, since the UDF should know
which foreign server it modifies data on, it should be able to register
the foreign server and mark it as modified. Or are you concerned that a
UDF in a WHERE condition is pushed down and updates data (e.g.,
‘SELECT … FROM foreign_tbl WHERE id = myfunc()’)?

What I had in mind is "SELECT myfunc(...) FROM mytable WHERE col = ...;" Does the UDF call get pushed down to the foreign server in this case? If not now, could it be pushed down in the future? If it could be, it's worth considering how to detect the remote update now.

Regards
Takayuki Tsunakawa

#142Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#141)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Sep 22, 2020 at 6:48 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

I think it's not necessarily that all FDW implementations need to be
able to support xa_complete(). We can support both synchronous and
asynchronous executions of prepare/commit/rollback.

Yes, I think parallel prepare and commit can be an option for FDW. But I don't think it's an option for a serious scale-out DBMS. If we want to use FDW as part of PostgreSQL's scale-out infrastructure, we should design (if not implemented in the first version) how the parallelism can be realized. That design is also necessary because it could affect the FDW API.

parallelism here has both pros and cons. If one of the servers errors
out while preparing for a transaction, there is no point in preparing
the transaction on other servers. In parallel execution we will
prepare on multiple servers before realising that one of them has
failed to do so. On the other hand preparing on multiple servers in
parallel provides a speed up.
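
The cost side of this trade-off can be counted with a small sketch. The function name and the list-of-booleans model are assumptions made for illustration; each entry says whether PREPARE would succeed on that server.

```python
def prepared_before_abort(outcomes, parallel):
    """Count servers that complete PREPARE before the failure is known.

    When some server fails, each of these must then be rolled back.
    """
    if parallel:
        # All requests are sent at once, so every success ends up prepared.
        return sum(1 for ok in outcomes if ok)
    count = 0
    for ok in outcomes:
        if not ok:
            break  # sequential execution stops at the first failure
        count += 1
    return count


if __name__ == "__main__":
    outcomes = [False, True, True]  # the first server fails to prepare
    print(prepared_before_abort(outcomes, parallel=False))
    print(prepared_before_abort(outcomes, parallel=True))
```

When no server fails, both modes prepare the same number of servers, so the extra work only appears on the failure path.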

But this can be an improvement on version 1. The current approach
doesn't render such an improvement impossible. So if that's something
hard to do, we should do that in the next version rather than
complicating this patch.

--
Best Wishes,
Ashutosh Bapat

#143tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Ashutosh Bapat (#142)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>

parallelism here has both pros and cons. If one of the servers errors
out while preparing for a transaction, there is no point in preparing
the transaction on other servers. In parallel execution we will
prepare on multiple servers before realising that one of them has
failed to do so. On the other hand preparing on multiple servers in
parallel provides a speed up.

And the pros are dominant in practice. If many transactions are erroring out (during prepare), the system is not functioning for the user. Such an application should be corrected before it is put into production.

But this can be an improvement on version 1. The current approach
doesn't render such an improvement impossible. So if that's something
hard to do, we should do that in the next version rather than
complicating this patch.

Could you share your idea on how the current approach could enable parallelism? This is an important point, because (1) the FDW may not lead us to a seriously competitive scale-out DBMS, and (2) a better FDW API and/or implementation could be considered for non-parallel interaction if we have the realization of parallelism in mind. I think that kind of consideration is the design (for the future).

Regards
Takayuki Tsunakawa

#144Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#143)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, Sep 23, 2020 at 2:13 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>

parallelism here has both pros and cons. If one of the servers errors
out while preparing for a transaction, there is no point in preparing
the transaction on other servers. In parallel execution we will
prepare on multiple servers before realising that one of them has
failed to do so. On the other hand preparing on multiple servers in
parallel provides a speed up.

And the pros are dominant in practice. If many transactions are erroring out (during prepare), the system is not functioning for the user. Such an application should be corrected before it is put into production.

But this can be an improvement on version 1. The current approach
doesn't render such an improvement impossible. So if that's something
hard to do, we should do that in the next version rather than
complicating this patch.

Could you share your idea on how the current approach could enable parallelism? This is an important point, because (1) the FDW may not lead us to a seriously competitive scale-out DBMS, and (2) a better FDW API and/or implementation could be considered for non-parallel interaction if we have the realization of parallelism in mind. I think that kind of consideration is the design (for the future).

The way I am looking at it is to put the parallelism in the resolution
worker and not in the FDW. If we use multiple resolution workers, they
can fire commit/abort on multiple foreign servers at a time.

But if we want parallelism within a single resolution worker, we will
need separate FDW APIs for firing asynchronous commit/abort of prepared
transactions and for fetching their results. But given the variety of
FDWs, not all of them will support an asynchronous API, so we have to
support a synchronous API anyway, which is what can be targeted in the
first version.

Thinking more about it, the core may support an API which accepts a
list of prepared transactions, their foreign servers and user mappings,
and lets the FDW resolve all of them either in parallel or one by one. So
parallelism is the responsibility of the FDW and not the core. But then we
lose parallelism across FDWs, which may not be a common case.

Given the complications around this, I think we should go ahead with
supporting a synchronous API first and introduce an optional
asynchronous API in the second version.
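
The shape described above (a mandatory synchronous call plus an optional asynchronous start/wait pair) could look roughly like the following. This is only a minimal sketch under stated assumptions: every name in it (ResolveRoutine, resolve_sync, resolve_start, resolve_wait, resolve_all, and the demo_* callbacks) is hypothetical and does not come from the patch set, and the callbacks are stand-ins that do no network I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; they are not taken from the patch set. */
typedef enum { RESOLVE_OK, RESOLVE_FAILED } ResolveResult;

typedef struct PreparedXact
{
	const char *server;			/* foreign server name */
	const char *prepare_id;		/* identifier used at PREPARE time */
	bool		started;		/* asynchronous request was sent */
	ResolveResult result;		/* outcome of the last attempt */
} PreparedXact;

typedef struct ResolveRoutine
{
	/* Mandatory: resolve one prepared transaction and block until done. */
	ResolveResult (*resolve_sync) (PreparedXact *xact, bool commit);
	/* Optional pair: send the request, then collect its result later. */
	bool		(*resolve_start) (PreparedXact *xact, bool commit);
	ResolveResult (*resolve_wait) (PreparedXact *xact);
} ResolveRoutine;

/* Resolve n prepared transactions; returns how many remain unresolved. */
int
resolve_all(const ResolveRoutine *r, PreparedXact *xacts, size_t n, bool commit)
{
	int			failed = 0;
	bool		async;
	size_t		i;

	assert(r != NULL && r->resolve_sync != NULL);
	async = (r->resolve_start != NULL && r->resolve_wait != NULL);

	/* With the optional API, fire every request before waiting on any. */
	if (async)
	{
		for (i = 0; i < n; i++)
			xacts[i].started = r->resolve_start(&xacts[i], commit);
	}

	for (i = 0; i < n; i++)
	{
		if (!async)
			xacts[i].result = r->resolve_sync(&xacts[i], commit);
		else if (xacts[i].started)
			xacts[i].result = r->resolve_wait(&xacts[i]);
		else
			xacts[i].result = RESOLVE_FAILED;

		if (xacts[i].result != RESOLVE_OK)
			failed++;
	}
	return failed;
}

/* Stand-in callbacks so the sketch can be exercised without a network. */
ResolveResult
demo_resolve_sync(PreparedXact *xact, bool commit)
{
	(void) commit;
	return xact->prepare_id != NULL ? RESOLVE_OK : RESOLVE_FAILED;
}

bool
demo_resolve_start(PreparedXact *xact, bool commit)
{
	(void) commit;
	return xact->prepare_id != NULL;
}

ResolveResult
demo_resolve_wait(PreparedXact *xact)
{
	return xact->started ? RESOLVE_OK : RESOLVE_FAILED;
}
```

An FDW that leaves resolve_start and resolve_wait as NULL falls back to one-by-one resolution, which matches the idea of shipping the synchronous path first and treating the asynchronous path as an optional later addition.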

--
Best Wishes,
Ashutosh Bapat

#145Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#141)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 22 Sep 2020 at 10:17, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

Yes, but it still seems hard to me that we require for all FDW
implementations to commit/rollback prepared transactions without the
possibility of ERROR.

Of course we can't eliminate the possibility of error, because remote servers require network communication. What I'm saying is to just require the FDW to return error like xa_commit(), not throwing control away with ereport(ERROR). I don't think it's too strict.

So with your idea, I think we require FDW developers to not call
ereport(ERROR) as much as possible. If they need to use a function
including palloc, lappend etc that could call ereport(ERROR), they
need to use PG_TRY() and PG_CATCH() and return the control along with
the error message to the transaction manager rather than raising an
error. Then the transaction manager will emit the error message at an
error level lower than ERROR (e.g., WARNING), and call commit/rollback
API again. Normally we do some cleanup on error, but in this case the
commit/rollback is retried without any cleanup. Is that right? I’m not
sure that’s safe, though.

I think it's not necessarily that all FDW implementations need to be
able to support xa_complete(). We can support both synchronous and
asynchronous executions of prepare/commit/rollback.

Yes, I think parallel prepare and commit can be an option for FDW. But I don't think it's an option for a serious scale-out DBMS. If we want to use FDW as part of PostgreSQL's scale-out infrastructure, we should design (if not implemented in the first version) how the parallelism can be realized. That design is also necessary because it could affect the FDW API.

If you're concerned that executing a UDF function by like 'SELECT
myfunc();' updates data on a foreign server, since the UDF should know
which foreign server it modifies data on it should be able to register
the foreign server and mark as modified. Or you’re concerned that a
UDF function in WHERE condition is pushed down and updates data (e.g.,
‘SELECT … FROM foreign_tbl WHERE id = myfunc()’)?

What I had in mind is "SELECT myfunc(...) FROM mytable WHERE col = ...;" Does the UDF call get pushed down to the foreign server in this case? If not now, could it be pushed down in the future? If it could be, it's worth considering how to detect the remote update now.

IIUC aggregation functions can be pushed down to the foreign server,
but I have no idea whether a normal UDF in the select list is pushed
down. I suspect it isn't.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#146tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#145)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

So with your idea, I think we require FDW developers to not call
ereport(ERROR) as much as possible. If they need to use a function
including palloc, lappend etc that could call ereport(ERROR), they
need to use PG_TRY() and PG_CATCH() and return the control along with
the error message to the transaction manager rather than raising an
error. Then the transaction manager will emit the error message at an
error level lower than ERROR (e.g., WARNING), and call commit/rollback
API again. Normally we do some cleanup on error, but in this case the
commit/rollback is retried without any cleanup. Is that right? I’m not
sure that’s safe, though.

Yes. It's legitimate to require the FDW commit routine to return control, because the prepare of 2PC is a promise to commit successfully. The second-phase commit should avoid doing anything that could fail. For example, if some memory is needed for commit, it should be allocated in prepare or before.
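
As a rough illustration of that contract, the sketch below does all allocation in the prepare step and has the commit step report failure through its return value, so the caller can emit a warning and retry. It is only a sketch under stated assumptions: SketchXact and the sketch_* functions are hypothetical names, the transient failure is simulated with a counter instead of a real connection, and the identifier is not quoted or escaped as a real FDW would have to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-transaction state; not the patch's FdwXactRslvState. */
typedef struct SketchXact
{
	char	   *commit_cmd;		/* built during prepare, reused at commit */
	int			transient_failures; /* simulated network errors remaining */
} SketchXact;

/* Phase 1: allowed to fail, so every allocation happens here. */
bool
sketch_prepare(SketchXact *x, const char *gid, int simulated_failures)
{
	int			len = snprintf(NULL, 0, "COMMIT PREPARED '%s'", gid);

	if (len < 0)
		return false;
	x->commit_cmd = malloc((size_t) len + 1);
	if (x->commit_cmd == NULL)
		return false;
	snprintf(x->commit_cmd, (size_t) len + 1, "COMMIT PREPARED '%s'", gid);
	x->transient_failures = simulated_failures;
	return true;
}

/* Phase 2: no allocation; failure is reported through the return value. */
bool
sketch_commit(SketchXact *x, char *errbuf, size_t errlen)
{
	if (x->transient_failures > 0)
	{
		x->transient_failures--;
		snprintf(errbuf, errlen, "could not send \"%s\"", x->commit_cmd);
		return false;
	}
	return true;
}

/*
 * Caller side: log each failure at warning level and try again.  Returns
 * the attempt number that succeeded, or -1 if max_attempts was exhausted.
 */
int
sketch_commit_with_retry(SketchXact *x, int max_attempts)
{
	char		errbuf[256];
	int			attempt;

	for (attempt = 1; attempt <= max_attempts; attempt++)
	{
		if (sketch_commit(x, errbuf, sizeof(errbuf)))
			return attempt;
		fprintf(stderr, "WARNING: %s (attempt %d)\n", errbuf, attempt);
	}
	return -1;
}

/* Release what prepare allocated once the transaction is resolved. */
void
sketch_release(SketchXact *x)
{
	free(x->commit_cmd);
	x->commit_cmd = NULL;
}
```

Because the commit step never allocates and never jumps out of the caller, retrying it needs no error cleanup in between, which is the property being asked of the FDW commit routine here.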

IIUC aggregation functions can be pushed down to the foreign server,
but I have no idea whether a normal UDF in the select list is pushed
down. I suspect it isn't.

Oh, that's the current situation. Understood. I thought the UDF call is also pushed down, as I saw Greenplum does so. (Reading the manual, Greenplum disallows data updates in the UDF when it's executed on the remote segment server.)

(Aren't we overlooking something else that updates data on the remote server while the local server is unaware?)

Regards
Takayuki Tsunakawa

#147Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Masahiko Sawada (#139)
11 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 18 Sep 2020 at 17:00, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Thu, 17 Sep 2020 at 14:25, Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Aug 21, 2020 at 03:25:29PM +0900, Masahiko Sawada wrote:

Thank you for letting me know. I've attached the latest version patch set.

A rebase is needed again as the CF bot is complaining.

Thank you for letting me know. I'm updating the patch and splitting
into small pieces as Fujii-san suggested. I'll submit the latest patch
set early next week.

I've rebased the patch set and split it into small pieces. Here are
short descriptions of each change:

v26-0001-Recreate-RemoveForeignServerById.patch

This commit recreates RemoveForeignServerById that was removed by
b1d32d3e3. This is necessary because we need to check if there is a
foreign transaction involved with the foreign server that is about to
be removed.

v26-0002-Introduce-transaction-manager-for-foreign-transa.patch

This commit adds the basic foreign transaction manager and the
CommitForeignTransaction and RollbackForeignTransaction APIs. These
APIs support only one-phase commit. With this change, an FDW is able to
control its transaction using the foreign transaction manager, not
using XactCallback.

v26-0003-postgres_fdw-supports-commit-and-rollback-APIs.patch

This commit implements both the CommitForeignTransaction and
RollbackForeignTransaction APIs in postgres_fdw. Note that since
PREPARE TRANSACTION is still not supported, there is nothing new the
user is able to do.

v26-0004-Add-PrepareForeignTransaction-API.patch

This commit adds prepared foreign transaction support including WAL
logging and recovery, and PrepareForeignTransaction API. With this
change, the user is able to do 'PREPARE TRANSACTION' and
'COMMIT/ROLLBACK PREPARED' commands on the transaction that involves
foreign servers. But note that COMMIT/ROLLBACK PREPARED ends only the
local transaction. It doesn't do anything for foreign transactions.
Therefore, the user needs to resolve foreign transactions manually by
executing the pg_resolve_foreign_xacts() SQL function which is also
introduced by this commit.

v26-0005-postgres_fdw-supports-prepare-API-and-support-co.patch

This commit implements the PrepareForeignTransaction API and makes
CommitForeignTransaction and RollbackForeignTransaction support
two-phase commit.

v26-0006-Add-GetPrepareID-API.patch

This commit adds GetPrepareID API.

v26-0007-Automatic-foreign-transaciton-resolution-on-COMM.patch

This commit adds the automatic foreign transaction resolution on
COMMIT/ROLLBACK PREPARED by using foreign transaction resolver and
launcher processes. With this change, the user is able to
commit/rollback the distributed transaction by COMMIT/ROLLBACK
PREPARED without manual resolution. The involved foreign transactions
are automatically resolved by a resolver process.

v26-0008-Automatic-foreign-transaciton-resolution-on-comm.patch

This commit adds the automatic foreign transaction resolution on
commit/rollback. With this change, when foreign_twophase_commit is
'required', the foreign transactions are committed automatically on
COMMIT without the user executing PREPARE TRANSACTION. IOW, we can
guarantee that all foreign transactions have been resolved when the
user gets an acknowledgment of COMMIT.

v26-0009-postgres_fdw-supports-automatically-resolution.patch

This commit makes postgres_fdw support the 0008 change.

v26-0010-Documentation-update.patch
v26-0011-Add-regression-tests-for-foreign-twophase-commit.patch

The above commits are the documentation update and the regression tests.
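
For readers skimming the series, the rule that 0008 uses to decide whether a commit needs two-phase commit (checkForeignTwophaseCommitRequired() in the attached patch) can be condensed into the stand-alone sketch below. It is a simplification under stated assumptions: the Participant struct, the CommitDecision enum and decide_commit_protocol() are hypothetical names, the checks on max_prepared_foreign_xacts and max_foreign_xact_resolvers are left out, and an error is modelled as a return value instead of ereport(ERROR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; a condensed model of the check in the 0008 patch. */
typedef enum
{
	DECIDE_ONE_PHASE,			/* plain commit on every participant */
	DECIDE_TWO_PHASE,			/* prepare modified participants first */
	DECIDE_ERROR				/* a modified server cannot prepare */
} CommitDecision;

typedef struct Participant
{
	bool		modified;		/* data was written on this server */
	bool		supports_prepare;	/* FDW provides a prepare callback */
} Participant;

CommitDecision
decide_commit_protocol(bool twophase_requested, bool local_modified,
					   const Participant *parts, size_t nparts)
{
	bool		have_notwophase = false;
	int			nserverswritten = 0;
	size_t		i;

	assert(parts != NULL || nparts == 0);

	/* foreign_twophase_commit is 'disabled': always commit in one phase. */
	if (!twophase_requested)
		return DECIDE_ONE_PHASE;

	/* Read-only participants never need to be prepared. */
	for (i = 0; i < nparts; i++)
	{
		if (!parts[i].modified)
			continue;
		if (!parts[i].supports_prepare)
			have_notwophase = true;
		nserverswritten++;
	}

	/* The local server counts as one more writer. */
	if (local_modified)
		nserverswritten++;

	/* A single writer can be committed atomically without 2PC. */
	if (nserverswritten < 2)
		return DECIDE_ONE_PHASE;

	return have_notwophase ? DECIDE_ERROR : DECIDE_TWO_PHASE;
}
```

Note that a transaction writing to one foreign server and to local tables already counts as two writers, so it takes the two-phase path when the setting is 'required'.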

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v26-0008-Automatic-foreign-transaciton-resolution-on-comm.patch (application/octet-stream)
From 26fd28171d89583d2103b57e883bd7d49f169c4a Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 23 Sep 2020 16:16:36 +0900
Subject: [PATCH v26 08/11] Automatic foreign transaciton resolution on
 commit/rollback.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/fdwxact.c          | 244 +++++++++++++++++-
 src/backend/access/transam/xact.c             |  45 +++-
 src/backend/utils/misc/guc.c                  |  28 ++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/access/fdwxact.h                  |  11 +
 src/include/foreign/fdwapi.h                  |   2 +-
 6 files changed, 312 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index e3b5937054..8638a3cdf0 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -19,6 +19,32 @@
  * transaction, we collect the involved foreign transaction and wait for the resolver
  * process committing or rolling back the foreign transactions.
  *
+ * The global transaction manager supports automatic foreign transaction
+ * resolution on commit and rollback.  The basic strategy is that we prepare all
+ * of the remote transactions before committing locally and commit them after
+ * committing locally.
+ *
+ * During pre-commit of the local transaction, we prepare the transaction on
+ * all foreign servers.  After committing or rolling back locally, we notify
+ * the resolver process and tell it to commit or roll back those
+ * transactions.  If we ask it to commit, we also tell it to notify us when
+ * it's done, so that we can wait interruptibly for it to finish, and so that
+ * we're not trying to locally do work that might fail after foreign
+ * transactions are committed.
+ *
+ * The best performing way to manage the waiting backends is to have a
+ * queue of waiting backends, so that we can avoid searching through all
+ * foreign transactions each time we receive a request.  We have one queue
+ * whose elements are ordered by the timestamp at which they expect to be
+ * processed.  Before waiting for foreign transactions to be resolved, the
+ * backend enqueues itself with the timestamp at which it expects to be
+ * processed.  On failure, it enqueues itself again with a new timestamp
+ * (last timestamp + foreign_xact_resolution_interval).
+ *
+ * If a server crash occurs or the user cancels the wait, the prepared foreign
+ * transactions are left without a holder.  Such foreign transactions are
+ * resolved automatically by the resolver process.
+ *
  * Two-phase commit protocol is crash-safe.  We WAL logs the foreign transaction
  * information.
  *
@@ -96,6 +122,9 @@
 #include "utils/rel.h"
 #include "utils/ps_status.h"
 
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
 /* Check the FdwXactParticipant is capable of two-phase commit  */
 #define ServerSupportTransactionCallback(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
@@ -139,6 +168,9 @@ typedef struct FdwXactParticipant
 	/* Transaction identifier used for PREPARE */
 	char	   *fdwxact_id;
 
+	/* true if modified the data on the server */
+	bool		modified;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
@@ -161,15 +193,25 @@ typedef struct FdwXactParticipant
  * and kept them in FdwXactParticipants_tmp.  Even if an error occurs during
  * that, we don't rollback them.  In the second phase, SetFdwXactParticipants(),
  * we replace FdwXactParticipants_tmp with FdwXactParticipants and hold them.
+ *
  */
 static List *FdwXactParticipants = NIL;
 static List *FdwXactParticipants_tmp = NIL;
 
+/*
+ * FdwXactLocalXid is the local transaction id associated with FdwXactParticipants.
+ * ForeignTwophaseCommitIsRequired is true if the current transaction needs to
+ * be committed together with foreign servers.
+ */
+static TransactionId FdwXactLocalXid = InvalidTransactionId;
+static bool ForeignTwophaseCommitIsRequired = false;
+
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
 int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
 
-static void FdwXactPrepareForeignTransactions(void);
+static void FdwXactPrepareForeignTransactions(bool prepare_all);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
 static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
@@ -189,6 +231,7 @@ static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
 static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
 static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  bool giveWarning);
+static bool checkForeignTwophaseCommitRequired(bool local_modified);
 static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  Oid umid, char *fdwxact_id);
 static void remove_fdwxact(FdwXact fdwxact);
@@ -269,7 +312,7 @@ FdwXactShmemInit(void)
  * as a participant of the transaction.
  */
 void
-FdwXactRegisterXact(Oid serverid, Oid userid)
+FdwXactRegisterXact(Oid serverid, Oid userid, bool modified)
 {
 	FdwXactParticipant *fdw_part;
 	MemoryContext old_ctx;
@@ -284,6 +327,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 			fdw_part->usermapping->userid == userid)
 		{
 			/* Already registered */
+			fdw_part->modified |= modified;
 			return;
 		}
 	}
@@ -303,6 +347,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
 
 	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
 
 	/* Add to the participants list */
 	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
@@ -349,6 +394,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
 	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
@@ -358,23 +404,169 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 }
 
 /*
- * Insert FdwXact entries and prepare foreign transactions.
+ * Prepare all foreign transactions if foreign twophase commit is required.
+ * When foreign twophase commit is enabled, the behavior depends on the value
+ * of foreign_twophase_commit; when 'required' we strictly require for all
+ * foreign servers' FDW to support two-phase commit protocol and ask them to
+ * prepare foreign transactions, and when 'disabled' we ask all foreign servers
+ * to commit foreign transaction in one-phase. If we failed to commit any of
+ * them we change to aborting.
+ *
+ * Note that non-modified foreign servers always can be committed without
+ * preparation.
+ */
+void
+PreCommit_FdwXact(void)
+{
+	TransactionId xid;
+	ListCell   *lc;
+	bool		local_modified;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/*
+	 * Check if the current transaction did writes.	 We need to include the
+	 * local node to the distributed transaction participant and to regard it
+	 * as modified, if the current transaction has performed WAL logging and
+	 * has assigned an xid.	 The transaction can end up not writing any WAL,
+	 * even if it has an xid, if it only wrote to temporary and/or unlogged
+	 * tables.	It can end up having written WAL without an xid if did HOT
+	 * pruning.
+	 */
+	xid = GetTopTransactionIdIfAny();
+	local_modified = (TransactionIdIsValid(xid) && (XactLastRecEnd != 0));
+
+	/*
+	 * Check if we need to use foreign twophase commit. Note that we don't
+	 * support foreign twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired(local_modified))
+	{
+		/*
+		 * We need to use two-phase commit.	 Assign a transaction id to the
+		 * current transaction if not yet. Then prepare foreign transactions
+		 * on foreign servers that support two-phase commit.  Note that we
+		 * keep FdwXactParticipants until the end of the transaction.
+		 */
+		FdwXactLocalXid = xid;
+		if (!TransactionIdIsValid(FdwXactLocalXid))
+			FdwXactLocalXid = GetTopTransactionId();
+
+		FdwXactPrepareForeignTransactions(false);
+		ForeignTwophaseCommitIsRequired = true;
+	}
+	else
+	{
+		/*
+		 * Two-phase commit is not required. Commit foreign transactions in
+		 * the participant list.
+		 */
+		foreach(lc, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+			Assert(!fdw_part->fdwxact);
+
+			/* Commit the foreign transaction in one-phase */
+			if (ServerSupportTransactionCallback(fdw_part))
+				FdwXactParticipantEndTransaction(fdw_part, true);
+		}
+
+		/* All participants' transactions should be completed at this time */
+		ForgetAllFdwXactParticipants();
+	}
+}
+
+/*
+ * Return true if the current transaction modifies data on two or more servers,
+ * counting both the servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+{
+	ListCell   *lc;
+	bool		have_notwophase = false;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			have_notwophase = true;
+
+		nserverswritten++;
+	}
+
+	/* Did we modify the local non-temporary data? */
+	if (local_modified)
+		nserverswritten++;
+
+	/*
+	 * Two-phase commit is not required if the number of servers performed
+	 * writes is less than 2.
+	 */
+	if (nserverswritten < 2)
+		return false;
+
+	Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED);
+
+	/* Two-phase commit is required. Check parameters */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	if (have_notwophase)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+				 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+	return true;
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions.  If prepare_all is
+ * true, we prepare all foreign transaction regardless of writes having happened
+ * on the server.
+ *
+ * We still can change to rollback here on failure. If any error occurs, we
+ * rollback non-prepared foreign transactions.
  */
 static void
-FdwXactPrepareForeignTransactions(void)
+FdwXactPrepareForeignTransactions(bool prepare_all)
 {
 	ListCell   *lc;
 
 	if (FdwXactParticipants == NIL)
 		return;
 
+	Assert(TransactionIdIsValid(FdwXactLocalXid));
+
 	/* Loop over the foreign connections */
 	foreach(lc, FdwXactParticipants)
 	{
 		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
 		FdwXactRslvState state;
 		FdwXact		fdwxact;
-		TransactionId xid = GetTopTransactionId();
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -382,8 +574,11 @@ FdwXactPrepareForeignTransactions(void)
 		if (!ServerSupportTwophaseCommit(fdw_part))
 			continue;
 
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
 		/* Get prepared transaction identifier */
-		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, FdwXactLocalXid);
 		Assert(fdw_part->fdwxact_id);
 
 		/*
@@ -401,7 +596,7 @@ FdwXactPrepareForeignTransactions(void)
 		 * server and will not be able to resolve it after the crash recovery.
 		 * Hence persist first then prepare.
 		 */
-		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+		fdwxact = FdwXactInsertFdwXactEntry(FdwXactLocalXid, fdw_part);
 
 		/*
 		 * Prepare the foreign transaction.
@@ -410,7 +605,7 @@ FdwXactPrepareForeignTransactions(void)
 		 * acknowledge from foreign server, the backend may abort the local
 		 * transaction (say, because of a signal).
 		 */
-		state.xid = xid;
+		state.xid = FdwXactLocalXid;
 		state.server = fdw_part->server;
 		state.usermapping = fdw_part->usermapping;
 		state.fdwxact_id = fdw_part->fdwxact_id;
@@ -739,8 +934,11 @@ PrePrepare_FdwXact(void)
 					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
 	}
 
+	/* Set the local transaction id */
+	FdwXactLocalXid = GetTopTransactionId();
+
 	/* Prepare transactions on participating foreign servers */
-	FdwXactPrepareForeignTransactions();
+	FdwXactPrepareForeignTransactions(true);
 
 	/*
 	 * We keep prepared foreign transaction participants to rollback them in
@@ -831,6 +1029,12 @@ SetFdwXactParticipants(TransactionId xid)
 	LWLockRelease(FdwXactLock);
 }
 
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
 void
 FdwXactCleanupAtProcExit(void)
 {
@@ -1165,6 +1369,7 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 
 	Assert(ServerSupportTransactionCallback(fdw_part));
 
+	state.xid = FdwXactLocalXid;
 	state.server = fdw_part->server;
 	state.usermapping = fdw_part->usermapping;
 	state.fdwxact_id = NULL;
@@ -1201,6 +1406,7 @@ ForgetAllFdwXactParticipants(void)
 	if (FdwXactParticipants == NIL)
 	{
 		Assert(FdwXactParticipants_tmp == NIL);
+		Assert(!ForeignTwophaseCommitIsRequired);
 		return;
 	}
 
@@ -1248,6 +1454,7 @@ ForgetAllFdwXactParticipants(void)
 	list_free_deep(FdwXactParticipants_tmp);
 	FdwXactParticipants = NIL;
 	FdwXactParticipants_tmp = NIL;
+	ForeignTwophaseCommitIsRequired = false;
 }
 
 /*
@@ -1257,6 +1464,7 @@ void
 AtEOXact_FdwXact(bool is_commit)
 {
 	ListCell   *lc;
+	bool		rollback_prepared = false;
 
 	/* If there are no foreign servers involved, we have no business here */
 	if (FdwXactParticipants == NIL)
@@ -1287,7 +1495,11 @@ AtEOXact_FdwXact(bool is_commit)
 
 		/*
 		 * Abort the foreign transaction.  For participants whose status is
-		 * FDWXACT_STATUS_PREPARING, we close the transaction in one-phase.
+		 * FDWXACT_STATUS_PREPARING, we close the transaction in one-phase. In
+		 * addition, since we are not sure that the preparation has been
+		 * completed on the foreign server, we also attempt to roll back the
+		 * prepared foreign transaction.  Note that it is the FDW's
+		 * responsibility to tolerate an OBJECT_NOT_FOUND error in the abort case.
 		 */
 		SpinLockAcquire(&(fdwxact->mutex));
 		status = fdwxact->status;
@@ -1296,6 +1508,18 @@ AtEOXact_FdwXact(bool is_commit)
 
 		if (status == FDWXACT_STATUS_PREPARING)
 			FdwXactParticipantEndTransaction(fdw_part, false);
+
+		rollback_prepared = true;
+	}
+
+	/*
+	 * Wait for all prepared or possibly-prepared foreign transactions to be
+	 * rolled back.
+	 */
+	if (rollback_prepared)
+	{
+		Assert(TransactionIdIsValid(FdwXactLocalXid));
+		FdwXactWaitForResolution(FdwXactLocalXid, false);
 	}
 
 	ForgetAllFdwXactParticipants();
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 0dcc3182ec..cd11d58721 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1237,6 +1237,7 @@ RecordTransactionCommit(void)
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
+	bool		need_fdwxact_commit;
 
 	/*
 	 * Log pending invalidations for logical decoding of in-progress
@@ -1255,6 +1256,7 @@ RecordTransactionCommit(void)
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
 	wrote_xlog = (XactLastRecEnd != 0);
+	need_fdwxact_commit = FdwXactIsForeignTwophaseCommitRequired();
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -1293,12 +1295,13 @@ RecordTransactionCommit(void)
 		}
 
 		/*
-		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
+		 * If we didn't create XLOG entries and the transaction does not need
+		 * to be committed using two-phase commit, we're done here; otherwise
+		 * we should trigger flushing those entries the same as a commit record
 		 * would.  This will primarily happen for HOT pruning and the like; we
 		 * want these to be flushed to disk in due time.
 		 */
-		if (!wrote_xlog)
+		if (!wrote_xlog && !need_fdwxact_commit)
 			goto cleanup;
 	}
 	else
@@ -1445,16 +1448,37 @@ RecordTransactionCommit(void)
 	latestXid = TransactionIdLatest(xid, nchildren, children);
 
 	/*
-	 * Wait for synchronous replication, if required. Similar to the decision
-	 * above about using committing asynchronously we only want to wait if
-	 * this backend assigned an xid and wrote WAL.  No need to wait if an xid
-	 * was assigned due to temporary/unlogged tables or due to HOT pruning.
+	 * Wait for both synchronous replication and prepared foreign transaction
+	 * to be committed, if required.  We must wait for synchronous replication
+	 * first because we need to make sure that the fate of the current
+	 * transaction is consistent between the primary and sync replicas before
+	 * resolving foreign transaction.  Otherwise, we will end up violating
+	 * atomic commit if a fail-over happens after some of foreign transactions
+	 * are committed.
 	 *
 	 * Note that at this stage we have marked clog, but still show as running
 	 * in the procarray and continue to hold locks.
 	 */
-	if (wrote_xlog && markXidCommitted)
-		SyncRepWaitForLSN(XactLastRecEnd, true);
+	if (markXidCommitted)
+	{
+		bool canceled = false;
+
+		/*
+		 * Similar to the decision above about using committing asynchronously
+		 * we only want to wait if this backend assigned an xid, wrote WAL,
+		 * and not received a query cancel.  No need to wait if an xid was
+		 * assigned due to temporary/unlogged tables or due to HOT pruning.
+		 */
+		if (wrote_xlog)
+			canceled = SyncRepWaitForLSN(XactLastRecEnd, true);
+
+		/*
+		 * We only want to wait if we prepared foreign transactions in this
+		 * transaction and not received query cancel.
+		 */
+		if (!canceled && need_fdwxact_commit)
+			FdwXactWaitForResolution(xid, true);
+	}
 
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
@@ -2115,6 +2139,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXact();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index b3960e9a1b..298ab461ed 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -483,6 +483,24 @@ const struct config_enum_entry ssl_protocol_versions_info[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 StaticAssertDecl(lengthof(ssl_protocol_versions_info) == (PG_TLS1_3_VERSION + 2),
 				 "array length mismatch");
 
@@ -4657,6 +4675,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2ed09cb347..5a73443be1 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -744,6 +744,8 @@
 							# retrying to resolve
 							# foreign transactions
 							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
 
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 1ae70cbed6..13962b4156 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -25,6 +25,14 @@
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
 /* Enum to track the status of foreign transaction */
 typedef enum
 {
@@ -122,15 +130,17 @@ extern int	max_foreign_xact_resolvers;
 extern int	foreign_xact_resolution_retry_interval;
 extern int	foreign_xact_resolver_timeout;
 extern int	foreign_twophase_commit;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
+extern void PreCommit_FdwXact(void);
 extern void PrePrepare_FdwXact(void);
 extern void PostPrepare_FdwXact(void);
 extern bool CollectFdwXactParticipants(TransactionId xid);
 extern void SetFdwXactParticipants(TransactionId xid);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
 extern void FdwXactCleanupAtProcExit(void);
 extern void FdwXactWaitForResolution(TransactionId wait_xid, bool commit);
 extern PGPROC *FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 91db4f5bfc..7a444d0590 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -273,7 +273,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
 /* Functions in fdwxact/fdwxact.c */
-extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactRegisterXact(Oid serverid, Oid userid, bool modified);
 extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
 
 #endif							/* FDWAPI_H */
-- 
2.23.0
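
The commit-time behaviour of foreign_twophase_commit is easiest to see as a small decision function. The following is only a sketch inferred from the cases exercised by the test_fdwxact regression tests in this patch series; it is not the code in fdwxact.c, and every name in it (TwophaseMode, CommitPlan, Participant, plan_commit) is hypothetical. It assumes that an FDW without any transaction callbacks is never registered as a participant, and that a local write counts as one writer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this models the tests, not fdwxact.c. */
typedef enum { TWOPHASE_DISABLED, TWOPHASE_REQUIRED } TwophaseMode;

typedef enum
{
	COMMIT_ONE_PHASE,	/* plain COMMIT on every participant */
	COMMIT_TWO_PHASE,	/* PREPARE everywhere, then COMMIT PREPARED */
	COMMIT_ERROR		/* a modified participant cannot prepare */
} CommitPlan;

typedef struct
{
	bool	registered;		/* FDW uses the transaction API at all */
	bool	modified;		/* the transaction wrote to this server */
	bool	can_prepare;	/* FDW provides a prepare callback */
} Participant;

/*
 * Decide how to commit.  Two-phase commit is only considered when the mode
 * is "required" and at least two writers (local node included) are involved.
 */
static CommitPlan
plan_commit(TwophaseMode mode, bool local_wrote,
			const Participant *servers, size_t nservers)
{
	size_t	nwriters = local_wrote ? 1 : 0;
	bool	all_can_prepare = true;

	assert(servers != NULL || nservers == 0);

	for (size_t i = 0; i < nservers; i++)
	{
		/* Unregistered or read-only servers do not take part. */
		if (!servers[i].registered || !servers[i].modified)
			continue;

		nwriters++;
		if (!servers[i].can_prepare)
			all_can_prepare = false;
	}

	if (mode == TWOPHASE_DISABLED || nwriters < 2)
		return COMMIT_ONE_PHASE;

	return all_can_prepare ? COMMIT_TWO_PHASE : COMMIT_ERROR;
}
```

Under these assumptions, a write to one test_no2pc_fdw server plus one test_2pc_fdw server maps to COMMIT_ERROR in 'required' mode, while a write to a test_fdw server plus a test_2pc_fdw server stays one-phase because the former never registers.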

Attachment: v26-0011-Add-regression-tests-for-foreign-twophase-commit.patch (application/octet-stream)
From 02472fed2baa5d24343865c43909bb7a5c025a8c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v26 11/11] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 +
 .../test_fdwxact/expected/test_fdwxact.out    | 186 +++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 160 ++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 104 ++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 520 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 ++++++
 src/test/regress/pg_regress.c                 |  13 +-
 src/tools/msvc/Mkvcbuild.pm                   |   3 +-
 14 files changed, 1245 insertions(+), 6 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index a6d2ffbf9e..106f3b2ff2 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..431874323c
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,186 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     0
+(1 row)
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ count 
+-------
+     0
+(1 row)
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..ba12ae6639
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,160 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+COMMIT PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+ROLLBACK PREPARED 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts();
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..525a6691e2
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,104 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the case where a transaction fails to prepare the local transaction after
+# preparing foreign transactions. The first attempt should succeed, but the second
+# attempt will fail after preparing the foreign transaction, and should roll back the
+# prepared foreign transaction.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'");
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into the prepare phase on srv_2pc_1. The transaction fails while
+# preparing the foreign transaction on srv_2pc_1. Then, we try both 'rollback' and
+# 'rollback prepared' on that foreign transaction, and roll back the other foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..93721e6038
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,520 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only the
+ * commit and rollback APIs, and the third supports all transaction APIs,
+ * including prepare.
+ *
+ * Also, this module can inject an error into the prepare callback or the
+ * commit callback using the test_inject_error() SQL function. Information about
+ * the injected error is stored in shared memory so that backend processes and
+ * resolver processes can see it.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/reloptions.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static void testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo,
+												   List *fdw_private,
+												   int subplan_index,
+												   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactRslvState *state);
+static void testCommitForeignTransaction(FdwXactRslvState *state);
+static void testRollbackForeignTransaction(FdwXactRslvState *state);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+/* Register the foreign transaction */
+static void
+testRegisterFdwXact(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					bool modified)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	RangeTblEntry	*rte;
+	ForeignTable *table;
+	Oid		userid;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
+						mtstate->ps.state);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+	table = GetForeignTable(RelationGetRelid(rel));
+	FdwXactRegisterXact(table->serverid, userid, modified);
+}
+
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static void
+testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo,
+									   List *fdw_private,
+									   int subplan_index,
+									   int eflags)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo,
+						(eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0);
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo, true);
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 state->fdwxact_id,
+							 state->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 state->xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+/*
+ * Check if an event is set at the phase on the server. If there is, set
+ * elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Currently support only error and panic; treat anything else as error */
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+	else
+		*elevel = ERROR;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check if we can commit and roll back the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check if we can commit and roll back the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately before checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 23d7d0beb2..d49a292cca 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2352,9 +2352,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2369,7 +2372,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 89e1b39036..addcd47575 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -50,7 +50,8 @@ my @contrib_excludes = (
 	'pgcrypto',         'sepgsql',
 	'brin',             'test_extensions',
 	'test_misc',        'test_pg_dump',
-	'snapshot_too_old', 'unsafe_tests');
+	'snapshot_too_old', 'unsafe_tests',
+	'test_fdwxact');
 
 # Set of variables for frontend modules
 my $frontend_defines = { 'initdb' => 'FRONTEND' };
-- 
2.23.0
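
A note on the flag tests in the commit and rollback callbacks of test_fdwxact.c: checking a single bit in state->flags needs the bitwise operator, because the logical operator is true whenever both operands are nonzero, whichever bits are set. The following is only a minimal sketch of that difference. The DEMO_* constants and demo_* functions are hypothetical names made up for illustration; the real value of FDWXACT_FLAG_ONEPHASE is defined elsewhere in the patch set and is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; the patch defines its own. */
#define DEMO_FLAG_ONEPHASE	0x01
#define DEMO_FLAG_OTHER		0x02

/* Bitwise test: true only if the ONEPHASE bit is set in flags. */
static bool
demo_is_onephase(uint32_t flags)
{
	return (flags & DEMO_FLAG_ONEPHASE) != 0;
}

/* Logical test: true whenever flags is nonzero, whichever bits are set. */
static bool
demo_is_onephase_logical(uint32_t flags)
{
	return flags && DEMO_FLAG_ONEPHASE;
}
```

With flags equal to DEMO_FLAG_OTHER, the bitwise version reports false while the logical version reports true, so the logical form would log a one-phase commit for a prepared transaction.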

v26-0009-postgres_fdw-supports-automatically-resolution.patch (application/octet-stream)
From 2e5a5a5f0da0a020951c3dd7b5b55aae0ff78820 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 23 Sep 2020 16:17:11 +0900
Subject: [PATCH v26 09/11] postgres_fdw supports automatic resolution.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c   | 17 ++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c |  2 ++
 contrib/postgres_fdw/postgres_fdw.h |  1 +
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index fa8aa6d5df..f7951afe73 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -58,6 +58,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		modified;		/* true if data on the foreign server is modified */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -202,6 +203,20 @@ GetConnectionCacheEntry(Oid umid)
 	return entry;
 }
 
+void
+MarkConnectionModified(UserMapping *user)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
+	if (entry && !entry->modified)
+	{
+		FdwXactRegisterXact(user->serverid, user->userid, true);
+		entry->modified = true;
+	}
+}
+
 /*
  * Connect to remote server using specified server and user mapping properties.
 
@@ -493,7 +508,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 			 entry->conn);
 
 		/* Register the foreign server to the transaction */
-		FdwXactRegisterXact(user->serverid, user->userid);
+		FdwXactRegisterXact(user->serverid, user->userid, false);
 
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index caccad9846..b072cb42f3 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2377,6 +2377,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * establish new connection if necessary.
 	 */
 	dmstate->conn = GetConnection(user, false);
+	MarkConnectionModified(user);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -3571,6 +3572,7 @@ create_foreign_modify(EState *estate,
 
 	/* Open connection; report that we'll create a prepared statement. */
 	fmstate->conn = GetConnection(user, true);
+	MarkConnectionModified(user);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 659222b97a..12cd55258f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -132,6 +132,7 @@ extern void reset_transmission_modes(int nestlevel);
 /* in connection.c */
 extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
 extern void ReleaseConnection(PGconn *conn);
+extern void MarkConnectionModified(UserMapping *user);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
-- 
2.23.0
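
For readers of the 0009 patch: MarkConnectionModified() follows a "register at most once" pattern, where the per-connection flag guards the call that tells the transaction manager the server was written to. The sketch below shows that pattern in isolation, under stated assumptions. DemoConnEntry, demo_register_modified() and demo_mark_modified() are hypothetical stand-ins, not the postgres_fdw types or the real FdwXactRegisterXact(); where the flag is cleared again is not shown in this part of the patch and is left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for postgres_fdw's per-connection cache entry. */
typedef struct DemoConnEntry
{
	bool		modified;	/* has this connection been marked as writing? */
} DemoConnEntry;

/* Counts calls to the stubbed registration routine. */
static int	demo_register_calls = 0;

/* Stub standing in for FdwXactRegisterXact(serverid, userid, true). */
static void
demo_register_modified(void)
{
	demo_register_calls++;
}

/* Mark the connection as modified, registering at most once per entry. */
static void
demo_mark_modified(DemoConnEntry *entry)
{
	if (entry != NULL && !entry->modified)
	{
		demo_register_modified();
		entry->modified = true;
	}
}
```

Calling demo_mark_modified() repeatedly on the same entry registers only on the first call, which matches how the patch calls MarkConnectionModified() from both postgresBeginDirectModify() and create_foreign_modify() without registering twice.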

v26-0010-Documentation-update.patch (application/octet-stream)
From e04bbfa7e2f8e3c97e0aca0ccf9064a9f99a7bf7 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v26 10/11] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 ++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 162 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 254 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    |  91 ++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 836 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index de9bacd34f..85f0b6ac8e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9260,6 +9260,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -11113,6 +11118,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>in_doubt</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry></entry>
+      <entry>
+       If <literal>true</literal>, this foreign transaction is in doubt.
+       A foreign transaction can have this status when the user has cancelled
+       the statement or the server crashes during transaction commit.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8eabf93834..201892f3e2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9164,6 +9164,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether a distributed transaction commit ensures that all
+         changes made on the involved foreign servers are committed. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used
+         to commit or roll back distributed transactions. When set to
+         <literal>required</literal>, a distributed transaction strictly requires
+         that all modified servers can use the two-phase commit protocol.  That
+         is, the distributed transaction cannot commit if even one server does
+         not support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, the distributed transaction
+         commit waits for all involved foreign transactions to be committed
+         before the command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When set to <literal>disabled</literal>, database consistency is at
+          risk if one or more foreign servers crash while committing
+          a distributed transaction.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver waits after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning a resolver stays connected to its
+         database until it is stopped manually with <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..c83f8e9ee9
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,162 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the others were rolled back. This could leave the
+   data in an inconsistent state from the viewpoint of the federated database.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all the changes on foreign servers are either committed or rolled back using
+   the transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers atomically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit sequence of a distributed transaction proceeds
+    through the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers including the local server itself and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the prepare on all foreign servers is
+       successful, proceed to the next step.  If there is any failure in the
+       prepare phase, the server rolls back all the transactions on both
+       local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit locally. The server commits the transaction locally.  If any failure
+       happens in this step, the server switches to rollback and rolls back all
+       transactions on both local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    Each commit of a distributed transaction will wait until confirmation is
+    received that all prepared transactions are committed or rolled back. The
+    guarantee we offer is that the application will not receive explicit
+    acknowledgement of the successful commit of a distributed transaction
+    until all foreign transactions are resolved on the foreign servers.
+   </para>
+
+   <para>
+    When synchronous replication is also used, the distributed transaction
+    will wait for synchronous replication first, and then wait for foreign
+    transaction resolution.  This is necessary because the fate of the local
+    transaction commit needs to be consistent among the primary and replicas.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    The atomic commit mechanism ensures that all foreign servers either commit
+    or roll back using the two-phase commit protocol. However, foreign transactions
+    become <firstterm>in-doubt</firstterm> in two cases:
+
+    <itemizedlist>
+     <listitem>
+      <para>The local node crashed while either preparing or resolving a foreign
+       transaction.</para>
+     </listitem>
+     <listitem>
+      <para>The user canceled the query.</para>
+     </listitem>
+    </itemizedlist>
+
+    You can check in-doubt transactions in the <xref linkend="view-pg-foreign-xacts"/>
+    view. These foreign transactions are resolved by a foreign transaction resolver
+    process or by executing the <function>pg_resolve_foreign_xact</function> function
+    manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving both foreign transactions that are prepared by
+    online transactions and in-doubt transactions. They commit or roll back
+    prepared transactions on all foreign servers involved with the distributed
+    transaction if the local node received agreement messages from all
+    foreign servers during the first step of the two-phase commit protocol.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolutions
+    on the database to which it is connected. On failure during resolution, it
+    retries at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped without an immediate shutdown. You can call the
+     <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be enabled.  Additionally,
+    <varname>max_worker_processes</varname> may need to be adjusted
+    to accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 72fa127212..deeefc3dcf 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1424,6 +1424,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare the foreign transaction. If an FDW wants its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flags</literal> has
+    the flag <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     Note that in all cases except when called via the <function>pg_resolve_foreign_xact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flags</literal> has the flag
+    <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     preparing the foreign transaction failed. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except when called via the <function>pg_resolve_foreign_xact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    storing its length in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal
+    less than <symbol>NAMEDATALEN</symbol> bytes long and should not be the same
+    as any other concurrent prepared transaction identifier. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      The functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Therefore,
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1903,4 +2014,147 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name and the OIDs of
+    the server, user and user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-registration">
+    <title>Foreign Transaction Registration and Unregistration</title>
+    <para>
+     Foreign transactions need to be registered with the
+     <productname>PostgreSQL</productname> global transaction manager.
+     Registration and unregistration are done by calling
+     <function>FdwXactRegisterXact</function> and
+     <function>FdwXactUnregisterXact</function> respectively.
+     The FDW can pass a boolean <literal>modified</literal> along with
+     the OIDs of the server and user to <function>FdwXactRegisterXact</function>,
+     indicating that writes are going to happen on the foreign server.  Such foreign
+     servers are taken into account when deciding whether the two-phase commit
+     protocol is required.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <function>CommitForeignTransaction</function>
+     and <function>RollbackForeignTransaction</function> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <function>CommitForeignTransaction</function> function
+     in the pre-commit phase and calls the
+     <function>RollbackForeignTransaction</function> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback Distributed Transaction</title>
+    <para>
+     In addition to simply committing and rolling back foreign transactions as described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to atomically commit and roll back among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to have the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    Here ft1 and ft2 are foreign tables on different foreign servers, which may be using different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called if provided to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server including the local server itself and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests a cancel during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     When switching to rollback due to a failure, it calls
+     <function>RollbackForeignTransaction</function> with
+     <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions which are not
+     closed yet, and calls <function>RollbackForeignTransaction</function> without
+     that flag for foreign transactions which are already prepared.  For foreign
+     transactions which are being prepared, it does both because it cannot be sure that
+     the preparation has been completed on the foreign server. Therefore,
+     <function>RollbackForeignTransaction</function> needs to tolerate the undefined
+     object error.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, both the <literal>CommitForeignTransaction</literal> function and the
+     <literal>RollbackForeignTransaction</literal> function should commit and
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve, the resolver process exits with an error message. The foreign
+     transaction launcher launches the resolver process again at intervals of
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/>.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 828396d4a9..d619dfb82c 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -48,6 +48,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 461b748d89..438cc14da4 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26200,6 +26200,97 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for a foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction which is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that this removes the foreign transaction entry without resolution.
+        This function is useful to remove a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before
+    dropping the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4e0193a967..3dfda93698 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1049,6 +1049,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1278,6 +1290,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1571,6 +1595,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1888,6 +1917,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index c41ce9499b..5ef1f4a329 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -170,6 +170,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3234adb639..83f30c5045 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.23.0
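
Reviewer note on the fdwhandler.sgml hunks above: the one-phase versus
prepared dispatch that the documentation asks of CommitForeignTransaction
and RollbackForeignTransaction, and the requirement that rollback tolerate
a missing prepared transaction, can be sketched as a small standalone C
fragment. Everything named Sketch* or SKETCH_* below is a hypothetical
stand-in invented for illustration (including the flag value and the SQL
strings a postgres_fdw-like wrapper might send); it is not the patch's
FdwXactRslvState, FDWXACT_FLAG_ONEPHASE, or any real FDW code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the patch's one-phase flag; the value is assumed. */
#define SKETCH_FLAG_ONEPHASE 0x01

/* Hypothetical, trimmed-down stand-in for the resolver state struct. */
typedef struct SketchRslvState
{
	int			flags;		/* bitmask describing the foreign transaction */
	const char *fdwxact_id;	/* prepared transaction identifier, if any */
} SketchRslvState;

/* Simulated outcome reported by the remote server (illustrative only). */
typedef enum SketchRemoteResult
{
	SKETCH_OK,
	SKETCH_UNDEFINED_OBJECT,	/* prepared transaction does not exist */
	SKETCH_OTHER_ERROR
} SketchRemoteResult;

/*
 * Build the SQL a wrapper might send from its commit (is_commit != 0) or
 * rollback (is_commit == 0) callback.  With the one-phase flag set the
 * open transaction is ended directly; otherwise the prepared transaction
 * named by fdwxact_id is resolved.
 */
static void
sketch_resolve_sql(const SketchRslvState *st, int is_commit,
				   char *buf, size_t buflen)
{
	const char *verb = is_commit ? "COMMIT" : "ROLLBACK";

	assert(st != NULL && buf != NULL && buflen > 0);

	if (st->flags & SKETCH_FLAG_ONEPHASE)
		snprintf(buf, buflen, "%s TRANSACTION", verb);
	else
	{
		assert(st->fdwxact_id != NULL);
		snprintf(buf, buflen, "%s PREPARED '%s'", verb, st->fdwxact_id);
	}
}

/*
 * Rollback counts as resolved when the remote side succeeded or when the
 * prepared transaction was never created there; other errors do not.
 */
static int
sketch_rollback_resolved(SketchRemoteResult result)
{
	return result == SKETCH_OK || result == SKETCH_UNDEFINED_OBJECT;
}
```

Treating "undefined object" as resolved is what lets the transaction
manager issue both the one-phase and the prepared rollback for a
participant whose PREPARE may or may not have reached the foreign server.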

v26-0007-Automatic-foreign-transaciton-resolution-on-COMM.patchapplication/octet-stream; name=v26-0007-Automatic-foreign-transaciton-resolution-on-COMM.patchDownload
From b40910de70971d529c05bd11b0cf38d98bde43bc Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 22 Sep 2020 22:53:55 +0900
Subject: [PATCH v26 07/11] Automatic foreign transaction resolution on
 COMMIT/ROLLBACK PREPARED.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/Makefile           |   4 +-
 src/backend/access/fdwxact/fdwxact.c          | 553 ++++++++++++++++-
 src/backend/access/fdwxact/launcher.c         | 556 ++++++++++++++++++
 src/backend/access/fdwxact/resolver.c         | 406 +++++++++++++
 src/backend/access/transam/twophase.c         |  50 +-
 src/backend/access/transam/xact.c             |   3 +
 src/backend/postmaster/bgworker.c             |   8 +
 src/backend/postmaster/pgstat.c               |   9 +
 src/backend/postmaster/postmaster.c           |  14 +-
 src/backend/replication/syncrep.c             |  15 +-
 src/backend/storage/ipc/ipci.c                |   3 +
 src/backend/storage/lmgr/lwlocknames.txt      |   2 +
 src/backend/storage/lmgr/proc.c               |   8 +
 src/backend/tcop/postgres.c                   |  14 +
 src/backend/utils/misc/guc.c                  |  37 ++
 src/backend/utils/misc/postgresql.conf.sample |  12 +
 src/include/access/fdwxact.h                  |  26 +
 src/include/access/fdwxact_launcher.h         |  28 +
 src/include/access/fdwxact_resolver.h         |  23 +
 src/include/access/resolver_internal.h        |  63 ++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/pgstat.h                          |   3 +
 src/include/replication/syncrep.h             |   2 +-
 src/include/storage/proc.h                    |  12 +
 src/include/utils/guc_tables.h                |   2 +
 25 files changed, 1817 insertions(+), 41 deletions(-)
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
index aacab1d729..59e8d451b5 100644
--- a/src/backend/access/fdwxact/Makefile
+++ b/src/backend/access/fdwxact/Makefile
@@ -12,6 +12,8 @@ subdir = src/backend/access/fdwxact
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = fdwxact.o
+OBJS = fdwxact.o \
+	resolver.o \
+	launcher.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index c6f6a92752..e3b5937054 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -15,8 +15,9 @@
  * To achieve commit among all foreign servers atomically, the global transaction
  * manager supports two-phase commit protocol, which is a type of atomic commitment
  * protocol(ACP).  Foreign servers whose FDW implements prepare API are prepared
- * when PREPARE TRANSACTION.  To commit or rollback prepared foreign transactions
- * we can use pg_resolve_foreign_xact() function.
+ * when PREPARE TRANSACTION.  On COMMIT PREPARED or ROLLBACK PREPARED of the
+ * local transaction, we collect the involved foreign transactions and wait for
+ * the resolver process to commit or roll them back.
  *
  * Two-phase commit protocol is crash-safe.  We WAL logs the foreign transaction
  * information.
@@ -70,7 +71,10 @@
 #include <unistd.h>
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/twophase.h"
+#include "access/resolver_internal.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
@@ -83,11 +87,14 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
+#include "storage/pmsignal.h"
 #include "storage/procarray.h"
+#include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
+#include "utils/ps_status.h"
 
 /* Check the FdwXactParticipant is capable of two-phase commit  */
 #define ServerSupportTransactionCallback(fdw_part) \
@@ -142,25 +149,35 @@ typedef struct FdwXactParticipant
 /*
  * List of foreign transactions involved in the transaction.  A member of
  * participants must support both commit and rollback APIs.
+ *
+ * FdwXactParticipants_tmp is used to update FdwXactParticipants atomically
+ * when executing a COMMIT/ROLLBACK PREPARED command.  In the COMMIT PREPARED
+ * case, we don't want to roll back foreign transactions even if an error
+ * occurs, because the local prepared transaction will never be rolled back in
+ * that case.  However, building FdwXactParticipants might raise an error
+ * because it calls palloc() inside.  So we build FdwXactParticipants in two
+ * phases.  In the first phase, CollectFdwXactParticipants(), we collect all
+ * foreign transactions associated with the local prepared transaction and
+ * keep them in FdwXactParticipants_tmp.  Even if an error occurs during that,
+ * we don't roll them back.  In the second phase, SetFdwXactParticipants(),
+ * we move FdwXactParticipants_tmp into FdwXactParticipants and hold them.
  */
 static List *FdwXactParticipants = NIL;
-
-/* Keep track of registering process exit call back. */
-static bool fdwXactExitRegistered = false;
+static List *FdwXactParticipants_tmp = NIL;
 
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
 
-static void AtProcExit_FdwXact(int code, Datum arg);
 static void FdwXactPrepareForeignTransactions(void);
-static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
 static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
 										 FdwXactParticipant *fdw_part);
-static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 static void FdwXactComputeRequiredXmin(void);
 static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
+static void FdwXactQueueInsert(PGPROC *waiter);
+static void FdwXactCancelWait(void);
 static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
 static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
 static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
@@ -181,6 +198,10 @@ static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
 									TransactionId xid);
 static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
 
+#ifdef USE_ASSERT_CHECKING
+static bool FdwXactQueueIsOrderedByTimestamp(void);
+#endif
+
 /*
  * Calculates the size of shared memory allocated for maintaining foreign
  * prepared transaction entries.
@@ -267,13 +288,6 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 		}
 	}
 
-	/* on first call, register the exit hook */
-	if (!fdwXactExitRegistered)
-	{
-		before_shmem_exit(AtProcExit_FdwXact, 0);
-		fdwXactExitRegistered = true;
-	}
-
 	routine = GetFdwRoutineByServerId(serverid);
 
 	/* Foreign server must implement both callback */
@@ -338,6 +352,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
 
 	return fdw_part;
 }
@@ -427,6 +442,9 @@ FdwXactPrepareForeignTransactions(void)
 static char *
 get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
 {
+	char	   *id;
+	int			id_len = 0;
+
 	/*
 	 * If FDW doesn't provide the callback function, generate an unique
 	 * identifier.
@@ -589,6 +607,7 @@ insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
 
 	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->proc = MyProc;
 	fdwxact->local_xid = xid;
 	fdwxact->dbid = dbid;
 	fdwxact->serverid = serverid;
@@ -645,6 +664,7 @@ remove_fdwxact(FdwXact fdwxact)
 
 	/* Reset informations */
 	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->proc = NULL;
 	fdwxact->locking_backend = InvalidBackendId;
 	fdwxact->valid = false;
 	fdwxact->ondisk = false;
@@ -722,16 +742,417 @@ PrePrepare_FdwXact(void)
 	/* Prepare transactions on participating foreign servers */
 	FdwXactPrepareForeignTransactions();
 
+	/*
+	 * We keep the prepared foreign transaction participants so that we can
+	 * roll them back in case of failure.
+	 */
+}
+
+/*
+ * After PREPARE TRANSACTION, we forget all participants.
+ */
+void
+PostPrepare_FdwXact(void)
+{
 	ForgetAllFdwXactParticipants();
 }
 
 /*
- * When the process exits, forget all the entries.
+ * Collect all foreign transactions associated with the given xid if it's a prepared
+ * transaction.	 Return true if COMMIT PREPARED or ROLLBACK PREPARED needs to wait for
+ * all foreign transactions to be resolved.	 The collected foreign transactions are
+ * kept in FdwXactParticipants_tmp. The caller must call SetFdwXactParticipants()
+ * later if this function returns true.
  */
-static void
-AtProcExit_FdwXact(int code, Datum arg)
+bool
+CollectFdwXactParticipants(TransactionId xid)
+{
+	MemoryContext old_ctx;
+
+	Assert(FdwXactParticipants_tmp == NIL);
+
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXactParticipant *fdw_part;
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwRoutine *routine;
+
+		if (!fdwxact->valid || fdwxact->local_xid != xid)
+			continue;
+
+		routine = GetFdwRoutineByServerId(fdwxact->serverid);
+		fdw_part = create_fdwxact_participant(fdwxact->serverid, fdwxact->userid,
+											  routine);
+		fdw_part->fdwxact = fdwxact;
+
+		/* Add to the participants list */
+		FdwXactParticipants_tmp = lappend(FdwXactParticipants_tmp, fdw_part);
+	}
+	LWLockRelease(FdwXactLock);
+
+	MemoryContextSwitchTo(old_ctx);
+
+	/* Return true if we collect at least one foreign transaction */
+	return (FdwXactParticipants_tmp != NIL);
+}
+
+/*
+ * Set the collected foreign transactions to the participants of this transaction,
+ * and hold them.  This function must be called after CollectFdwXactParticipants().
+ */
+void
+SetFdwXactParticipants(TransactionId xid)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants_tmp != NIL);
+	Assert(FdwXactParticipants == NIL);
+
+	FdwXactParticipants = FdwXactParticipants_tmp;
+	FdwXactParticipants_tmp = NIL;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(ServerSupportTwophaseCommit(fdw_part));
+		Assert(fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED);
+		Assert(fdw_part->fdwxact->locking_backend == InvalidBackendId);
+		Assert(!fdw_part->fdwxact->proc);
+
+		/* Hold the fdwxact entry and set the status */
+		fdw_part->fdwxact->locking_backend = MyBackendId;
+		fdw_part->fdwxact->proc = MyProc;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+void
+FdwXactCleanupAtProcExit(void)
 {
 	ForgetAllFdwXactParticipants();
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * Wait for all foreign transactions of the given transaction to be resolved.
+ *
+ * Initially backends start in state FDWXACT_NOT_WAITING and then change
+ * that state to FDWXACT_WAITING before adding ourselves to the wait queue.
+ * During FdwXactResolveForeignTransaction a fdwxact resolver changes the
+ * state to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * This backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction it moves us to
+ * the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitForResolution(TransactionId wait_xid, bool commit)
+{
+	ListCell   *lc;
+	char	   *new_status = NULL;
+	const char *old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/* Quick exit if we don't have any participants */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Set foreign transaction status */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->fdwxact)
+			continue;
+
+		Assert(fdw_part->fdwxact->locking_backend == MyBackendId);
+		Assert(fdw_part->fdwxact->proc == MyProc);
+
+		SpinLockAcquire(&(fdw_part->fdwxact->mutex));
+		fdw_part->fdwxact->status = commit
+			? FDWXACT_STATUS_COMMITTING
+			: FDWXACT_STATUS_ABORTING;
+		SpinLockRelease(&(fdw_part->fdwxact->mutex));
+	}
+
+	/* Set backend status and enqueue itself to the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if none is running yet, or wake it up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show waiting for foreign transaction resolution.
+	 */
+	if (update_process_title)
+	{
+		int			len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 34 + 1);
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status);
+		new_status[len] = '\0'; /* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed, the latch ensures proper
+		 * barriers. If it looks like we're done, we must really be done,
+		 * because once resolver changes the state to FDWXACT_WAIT_COMPLETE,
+		 * it will never update it again, so we can't be seeing a stale value
+		 * in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+		{
+			ForgetAllFdwXactParticipants();
+			break;
+		}
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.	 The
+		 * latter would lead the client to believe that the distributed
+		 * transaction aborted, which is not true: it's already committed
+		 * locally. The former is no good either: the client has requested
+		 * committing a distributed transaction, and is entitled to assume
+		 * that an acknowledged commit is also committed on all foreign servers,
+		 * which might not be true. So in this case we issue a WARNING (which
+		 * some clients may be able to interpret) and shut off further output.
+		 * We do NOT reset ProcDiePending, so that the process will die after
+		 * the commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait with
+		 * a suitable warning. The foreign transactions can be orphaned, but
+		 * the foreign transaction resolver can pick them up and try to resolve
+		 * them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit. So
+		 * just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Return one backend that connects to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+				 TransactionId *waitXid_p)
+{
+	PGPROC	   *proc;
+	bool		found = false;
+
+	Assert(LWLockHeldByMe(FdwXactResolutionLock));
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	/* Initialize variables */
+	*nextResolutionTs_p = -1;
+	*waitXid_p = InvalidTransactionId;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+		{
+			if (proc->fdwXactNextResolutionTs <= now)
+			{
+				/* Found a waiting process */
+				found = true;
+				*waitXid_p = proc->fdwXactWaitXid;
+			}
+			else
+				/* Found a waiting process supposed to be processed later */
+				*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+
+			break;
+		}
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return found ? proc : NULL;
+}
+
+/*
+ * Return true if there is at least one backend in the wait queue that is
+ * connected to the given database. The caller must hold FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC	   *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
 }
 
 /*
@@ -771,14 +1192,17 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
  * transaction so that local transaction id of such unresolved foreign transaction
  * is not truncated.
  */
-static void
+void
 ForgetAllFdwXactParticipants(void)
 {
 	ListCell   *cell;
 	int			nlefts = 0;
 
 	if (FdwXactParticipants == NIL)
+	{
+		Assert(FdwXactParticipants_tmp == NIL);
 		return;
+	}
 
 	foreach(cell, FdwXactParticipants)
 	{
@@ -802,7 +1226,9 @@ ForgetAllFdwXactParticipants(void)
 		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
 		if (fdwxact->valid)
 		{
-			fdwxact->locking_backend = InvalidBackendId;
+			if (fdwxact->locking_backend == MyBackendId)
+				fdwxact->locking_backend = InvalidBackendId;
+			fdwxact->proc = NULL;
 			nlefts++;
 		}
 		LWLockRelease(FdwXactLock);
@@ -819,7 +1245,9 @@ ForgetAllFdwXactParticipants(void)
 	}
 
 	list_free_deep(FdwXactParticipants);
+	list_free_deep(FdwXactParticipants_tmp);
 	FdwXactParticipants = NIL;
+	FdwXactParticipants_tmp = NIL;
 }
 
 /*
@@ -851,6 +1279,12 @@ AtEOXact_FdwXact(bool is_commit)
 			continue;
 		}
 
+		/*
+		 * We never reach here in the commit case since all foreign transactions
+		 * should have been committed in that case.
+		 */
+		Assert(!is_commit);
+
 		/*
 		 * Abort the foreign transaction.  For participants whose status is
 		 * FDWXACT_STATUS_PREPARING, we close the transaction in one-phase.
@@ -868,13 +1302,16 @@ AtEOXact_FdwXact(bool is_commit)
 }
 
 /*
- * Resolve foreign transactions at the give indexes.
+ * Resolve foreign transactions at the given indexes. If 'waiter' is not NULL,
+ * we release the waiter after resolving all of the given foreign transactions.
+ * On failure, we re-enqueue the waiting backend after advancing its next
+ * resolution time.
  *
  * The caller must hold the given foreign transactions in advance to prevent
  * concurrent update.
  */
-static void
-FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
+void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter)
 {
 	for (int i = 0; i < nfdwxacts; i++)
 	{
@@ -882,7 +1319,34 @@ FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
 
 		CHECK_FOR_INTERRUPTS();
 
-		FdwXactResolveOneFdwXact(fdwxact);
+		PG_TRY();
+		{
+			FdwXactResolveOneFdwXact(fdwxact);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve. Re-insert the waiter into the queue with a
+			 * later resolution time if the waiter is still waiting.
+			 */
+			if (waiter)
+			{
+				LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+				if (waiter->fdwXactState == FDWXACT_WAITING)
+				{
+					SHMQueueDelete(&(waiter->fdwXactLinks));
+					pg_write_barrier();
+					waiter->fdwXactNextResolutionTs =
+						TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+													foreign_xact_resolution_retry_interval);
+					FdwXactQueueInsert(waiter);
+				}
+				LWLockRelease(FdwXactResolutionLock);
+			}
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
 
 		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
 		if (fdwxact->ondisk)
@@ -891,6 +1355,38 @@ FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
 		remove_fdwxact(fdwxact);
 		LWLockRelease(FdwXactLock);
 	}
+
+	if (!waiter)
+		return;
+
+	/*
+	 * We have resolved all foreign transactions.  Remove waiter from shmem
+	 * queue, if not detached yet. The waiter could already be detached if
+	 * the user cancelled the wait before resolution.
+	 */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/*
+		 * Wake up the waiter only when we have set state and removed from
+		 * queue
+		 */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend had already been detached");
+
+	LWLockRelease(FdwXactResolutionLock);
 }
 
 /*
@@ -1709,6 +2205,7 @@ RecoverFdwXacts(void)
 						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
 
 		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->proc = NULL;
 		fdwxact->inredo = false;
 		fdwxact->valid = true;
 		pfree(buf);
@@ -1732,7 +2229,7 @@ typedef struct
 Datum
 pg_foreign_xacts(PG_FUNCTION_ARGS)
 {
-#define PG_PREPARED_FDWXACTS_COLS	5
+#define PG_PREPARED_FDWXACTS_COLS	6
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -1804,8 +2301,8 @@ pg_foreign_xacts(PG_FUNCTION_ARGS)
 				break;
 		}
 		values[3] = CStringGetTextDatum(xact_status);
-		values[4] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
-															 strlen(fdwxact->fdwxact_id)));
+		values[4] = BoolGetDatum(fdwxact->proc == NULL);
+		values[5] = CStringGetTextDatum(fdwxact->fdwxact_id);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}
@@ -1880,7 +2377,7 @@ pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
 
 	PG_TRY();
 	{
-		FdwXactResolveFdwXacts(&idx, 1);
+		FdwXactResolveFdwXacts(&idx, 1, NULL);
 	}
 	PG_CATCH();
 	{
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..d2ba6bd58c
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,556 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "pgstat.h"
+#include "funcapi.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+		FdwXactRslvCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit start retries to once per
+		 * foreign_xact_resolution_retry_interval, but always start a resolver
+		 * when a backend has requested one.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * started. This usually means the resolver crashed, so we should
+			 * retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+
+	/*
+	 * Looking for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. Since the resolver will soon
+		 * find the FdwXact entry we inserted, we don't need to do anything.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, clean up the worker slot */
+		LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+		resolver->in_use = false;
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on each database that has
+ * at least one FdwXact entry but no resolver running on it.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *resolver_dbs;	/* DBs resolver's running on */
+	HTAB	   *fdwxact_dbs;	/* DBs having at least one FdwXact entry */
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * We need to launch a resolver process if the backend is waiting for
+		 * foreign transaction resolution.
+		 */
+		if (fdwxact->proc && fdwxact->proc->fdwXactState == FDWXACT_WAITING)
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no FdwXact entry, so no need to launch a new resolver */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Find DBs on which no resolvers are running and launch new one on them */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be super user */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..3e93a5a84f
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,406 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database for which a backend process is waiting for resolution.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int	foreign_xact_resolution_retry_interval;
+int	foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+static void hold_fdwxacts(PGPROC *waiter);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts has indexes of FdwXact which the resolver marked
+ * as in-processing. We clear that flag from those entries on failure.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	/* Release the held foreign transaction entries */
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+	}
+
+	/*
+	 * There might be other waiting online transactions. So request to
+	 * re-launch.
+	 */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TransactionId waitXid = InvalidTransactionId;
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Process waiters until either the queue becomes empty or the queue has
+		 * only waiters that have a future resolution timestamp.
+		 */
+		for (;;)
+		{
+			PGPROC	   *waiter;
+
+			CHECK_FOR_INTERRUPTS();
+
+			LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+
+			/* Get the waiter from the queue */
+			waiter = FdwXactGetWaiter(now, &resolutionTs, &waitXid);
+
+			if (!waiter)
+			{
+				/* Not found, break */
+				LWLockRelease(FdwXactResolutionLock);
+				break;
+			}
+
+			/* Hold the waiter's foreign transactions */
+			hold_fdwxacts(waiter);
+			Assert(nheld > 0);
+
+			LWLockRelease(FdwXactResolutionLock);
+
+			/*
+			 * Resolve the waiter's foreign transactions and release the
+			 * waiter.
+			 */
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld, waiter);
+			CommitTransactionCommand();
+
+			last_resolution_time = now;
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	if (!FdwXactWaiterExists(MyDatabaseId))
+	{
+		/* There is no waiting backend */
+		StartTransactionCommand();
+		ereport(LOG,
+				(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+						get_database_name(MyDatabaseId))));
+		CommitTransactionCommand();
+
+		/*
+		 * Keep holding FdwXactResolutionLock until the slot is detached. This
+		 * is necessary to prevent a race condition where a waiter enqueues
+		 * itself after the FdwXactWaiterExists check.
+		 */
+		fdwxact_resolver_detach();
+		LWLockRelease(FdwXactResolutionLock);
+		proc_exit(0);
+	}
+	else
+		elog(DEBUG2, "resolver reached the timeout but doesn't exit because the queue is not empty");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Lock foreign transactions associated with the given waiter's transaction
+ * as in-processing.  The caller must hold FdwXactResolutionLock so that
+ * the waiter doesn't change its state.
+ */
+static void
+hold_fdwxacts(PGPROC *waiter)
+{
+	Assert(LWLockHeldByMe(FdwXactResolutionLock));
+
+	nheld = 0;
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid && fdwxact->local_xid == waiter->fdwXactWaitXid)
+		{
+			Assert(fdwxact->proc->fdwXactState == FDWXACT_WAITING);
+			Assert(fdwxact->dbid == waiter->databaseId);
+
+			held_fdwxacts[nheld++] = i;
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 6f1f4a2da2..2571473729 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -2216,6 +2217,14 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	XLogRecPtr	recptr;
 	TimestampTz committs = GetCurrentTimestamp();
 	bool		replorigin;
+	bool		need_fdwxact_commit;
+	bool		canceled = false;
+
+	/*
+	 * Collect the foreign transaction participants involved in this prepared
+	 * transaction, if any.
+	 */
+	need_fdwxact_commit = CollectFdwXactParticipants(xid);
 
 	/*
 	 * Are we using the replication origins feature?  Or, in other words, are
@@ -2280,12 +2289,25 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	END_CRIT_SECTION();
 
 	/*
-	 * Wait for synchronous replication, if required.
+	 * Wait for both synchronous replication and foreign transaction
+	 * resolution, if required.
 	 *
 	 * Note that at this stage we have marked clog, but still show as running
 	 * in the procarray and continue to hold locks.
 	 */
-	SyncRepWaitForLSN(recptr, true);
+	canceled = SyncRepWaitForLSN(XactLastRecEnd, true);
+
+	if (need_fdwxact_commit)
+	{
+		/* Set the collected foreign transaction participants */
+		SetFdwXactParticipants(xid);
+
+		if (!canceled)
+			FdwXactWaitForResolution(xid, true);
+
+		ForgetAllFdwXactParticipants();
+	}
+
 }
 
 /*
@@ -2305,6 +2327,14 @@ RecordTransactionAbortPrepared(TransactionId xid,
 							   const char *gid)
 {
 	XLogRecPtr	recptr;
+	bool		need_fdwxact_commit;
+	bool		canceled = false;
+
+	/*
+	 * Collect the foreign transaction participants involved in this prepared
+	 * transaction, if any.
+	 */
+	need_fdwxact_commit = CollectFdwXactParticipants(xid);
 
 	/*
 	 * Catch the scenario where we aborted partway through
@@ -2339,12 +2369,24 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	END_CRIT_SECTION();
 
 	/*
-	 * Wait for synchronous replication, if required.
+	 * Wait for both synchronous replication and foreign transaction
+	 * resolution, if required.
 	 *
 	 * Note that at this stage we have marked clog, but still show as running
 	 * in the procarray and continue to hold locks.
 	 */
-	SyncRepWaitForLSN(recptr, false);
+	canceled = SyncRepWaitForLSN(XactLastRecEnd, false);
+
+	if (need_fdwxact_commit)
+	{
+		/* Set the collected foreign transaction participants */
+		SetFdwXactParticipants(xid);
+
+		if (!canceled)
+			FdwXactWaitForResolution(xid, false);
+
+		ForgetAllFdwXactParticipants();
+	}
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 0a8d1da4bd..0dcc3182ec 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2566,6 +2566,9 @@ PrepareTransaction(void)
 	 */
 	PostPrepare_Twophase();
 
+	/* Release held FdwXact entries */
+	PostPrepare_FdwXact();
+
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 5a9a0e3435..b2384f9ab9 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,8 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 6bf5a59b42..120cfa5773 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3663,6 +3663,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
@@ -3773,6 +3779,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLUTION:
+			event_name = "FdwXactResolution";
+			break;
 		case WAIT_EVENT_HASH_BATCH_ALLOCATE:
 			event_name = "HashBatchAllocate";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 81e6cb9ca2..67f3cf0e5e 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -94,6 +94,7 @@
 #endif
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -926,6 +927,10 @@ PostmasterMain(int argc, char *argv[])
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
 
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
+
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
 	 * (Put any slow processing further down, after postmaster.pid creation.)
@@ -990,12 +995,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 6e8c76537a..a89b99225e 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -143,13 +143,17 @@ static bool SyncRepQueueIsOrderedByLSN(int mode);
  * represents a commit record.  If it doesn't, then we wait only for the WAL
  * to be flushed if synchronous_commit is set to the higher level of
  * remote_apply, because only commit records provide apply feedback.
+ *
+ * This function returns true if the wait is canceled due to an
+ * interruption.
  */
-void
+bool
 SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 {
 	char	   *new_status = NULL;
 	const char *old_status;
 	int			mode;
+	bool		canceled = false;
 
 	/*
 	 * This should be called while holding interrupts during a transaction
@@ -174,7 +178,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	 */
 	if (!SyncRepRequested() ||
 		!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
-		return;
+		return false;
 
 	/* Cap the level for anything other than commit to remote flush only. */
 	if (commit)
@@ -200,7 +204,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		lsn <= WalSndCtl->lsn[mode])
 	{
 		LWLockRelease(SyncRepLock);
-		return;
+		return false;
 	}
 
 	/*
@@ -270,6 +274,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 					 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
 			whereToSendOutput = DestNone;
 			SyncRepCancelWait();
+			canceled = true;
 			break;
 		}
 
@@ -286,6 +291,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 					(errmsg("canceling wait for synchronous replication due to user request"),
 					 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
 			SyncRepCancelWait();
+			canceled = true;
 			break;
 		}
 
@@ -305,6 +311,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 			ProcDiePending = true;
 			whereToSendOutput = DestNone;
 			SyncRepCancelWait();
+			canceled = true;
 			break;
 		}
 	}
@@ -328,6 +335,8 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		set_ps_display(new_status);
 		pfree(new_status);
 	}
+
+	return canceled;
 }
 
 /*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2d7191d3cd..271fd35884 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -17,6 +17,7 @@
 #include "access/clog.h"
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -151,6 +152,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +272,7 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index dc29a7ea6f..a6d40446ce 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -54,3 +54,5 @@ XactTruncationLock					44
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
 FdwXactLock							48
+FdwXactResolverLock					49
+FdwXactResolutionLock				50
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 88566bd9fa..0b9d340c49 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -35,6 +35,7 @@
 #include <unistd.h>
 #include <sys/time.h>
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
@@ -417,6 +418,10 @@ InitProcess(void)
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	SHMQueueElemInit(&(MyProc->syncRepLinks));
 
+	/* Initialize fields for fdwxact */
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	SHMQueueElemInit(&(MyProc->fdwXactLinks));
+
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
 	MyProc->procArrayGroupMemberXid = InvalidTransactionId;
@@ -817,6 +822,9 @@ ProcKill(int code, Datum arg)
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
 
+	/* Make sure we're out of the fdwxact lists */
+	FdwXactCleanupAtProcExit();
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 411cfadbff..496e2b3a4a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3054,6 +3056,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index b468c5628c..b3960e9a1b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -760,6 +760,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2469,6 +2473,39 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time a foreign transaction resolver waits for work before exiting."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 863e8ccc3a..2ed09cb347 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -733,6 +733,18 @@
 #max_pred_locks_per_page = 2            # min 0
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
 #------------------------------------------------------------------------------
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 6ba1a475fc..1ae70cbed6 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -16,6 +16,11 @@
 #include "storage/shmem.h"
 #include "storage/s_lock.h"
 
+/* fdwXactState */
+#define	FDWXACT_NOT_WAITING		0
+#define	FDWXACT_WAITING			1
+#define	FDWXACT_WAIT_COMPLETE	2
+
 /* Flag passed to FDW transaction management APIs */
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
@@ -40,6 +45,13 @@ typedef struct FdwXactData
 
 	TransactionId local_xid;	/* XID of local transaction */
 
+	/*
+	 * A backend process that executed the distributed transaction. The owner
+	 * and a process locking this entry can be different during transaction
+	 * resolution as the resolver takes over the entry.
+	 */
+	PGPROC		*proc;			/* process that executed the distributed tx. */
+
 	/* Information relevant with foreign transaction */
 	Oid			dbid;
 	Oid			serverid;
@@ -106,12 +118,26 @@ typedef struct FdwXactRslvState
 
 /* GUC parameters */
 extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern void PostPrepare_FdwXact(void);
+extern bool CollectFdwXactParticipants(TransactionId xid);
+extern void SetFdwXactParticipants(TransactionId xid);
+extern void FdwXactCleanupAtProcExit(void);
+extern void FdwXactWaitForResolution(TransactionId wait_xid, bool commit);
+extern PGPROC *FdwXactGetWaiter(TimestampTz now, TimestampTz *nextResolutionTs_p,
+								TransactionId *waitXid_p);
+extern bool FdwXactWaiterExists(Oid dbid);
+extern void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts, PGPROC *waiter);
+extern void ForgetAllFdwXactParticipants(void);
 extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
 extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
 extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..688b43b8d0
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..c935471936
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLaunchLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 52f71ccd17..37d12adda2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6136,6 +6136,11 @@
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
 
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
+
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
   proargtypes => 'pg_lsn pg_lsn', prosrc => 'pg_wal_lsn_diff' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d43dcce56f..7f3ed0bc71 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -806,6 +806,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
@@ -853,6 +855,7 @@ typedef enum
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
 	WAIT_EVENT_EXECUTE_GATHER,
+	WAIT_EVENT_FDWXACT_RESOLUTION,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
 	WAIT_EVENT_HASH_BATCH_LOAD,
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 9d286b66c6..cffab9c721 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -82,7 +82,7 @@ extern char *syncrep_parse_error_msg;
 extern char *SyncRepStandbyNames;
 
 /* called by user backend */
-extern void SyncRepWaitForLSN(XLogRecPtr lsn, bool commit);
+extern bool SyncRepWaitForLSN(XLogRecPtr lsn, bool commit);
 
 /* called at backend exit */
 extern void SyncRepCleanupAtProcExit(void);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 9c9a50ae45..06c9f4095f 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/xlogdefs.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
@@ -188,6 +189,17 @@ struct PGPROC
 	int			syncRepState;	/* wait state for sync rep */
 	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
 
+	/*
+	 * Info to allow us to wait for foreign transaction to be resolved, if
+	 * needed.
+	 */
+	TransactionId	fdwXactWaitXid;	/* waiting for foreign transaction involved with
+									 * this transaction id to be resolved */
+	int				fdwXactState;	/* wait state for foreign transaction
+									 * resolution */
+	SHM_QUEUE	fdwXactLinks;	/* list link if process is in queue */
+	TimestampTz fdwXactNextResolutionTs;
+
 	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
 	 * linked into one of these lists, according to the partition number of
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 04431d0eb2..a00ca73355 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
-- 
2.23.0
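
The wait states and the retry interval introduced above can be pictured with a small standalone sketch. This is only an illustration under stated assumptions: DemoWaiter and the demo_* functions are hypothetical stand-ins for the fields added to PGPROC, timestamps are plain milliseconds rather than TimestampTz, and the transitions are assumed to follow the same pattern as syncRepState. It is not the patch's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wait states, with the same values as in the patch's fdwxact.h */
#define FDWXACT_NOT_WAITING		0
#define FDWXACT_WAITING			1
#define FDWXACT_WAIT_COMPLETE	2

/* Hypothetical stand-in for the fdwXact* fields added to PGPROC */
typedef struct DemoWaiter
{
	uint32_t	wait_xid;			/* transaction waiting to be resolved */
	int			state;				/* one of the FDWXACT_* wait states */
	int64_t		next_resolution_ms; /* earliest time of the next attempt */
} DemoWaiter;

/* Backend side: enter the queue; the first attempt is due immediately */
void
demo_start_wait(DemoWaiter *w, uint32_t xid, int64_t now_ms)
{
	w->wait_xid = xid;
	w->state = FDWXACT_WAITING;
	w->next_resolution_ms = now_ms;
}

/* Resolver side: is this waiter ready for a resolution attempt? */
bool
demo_is_due(const DemoWaiter *w, int64_t now_ms)
{
	return w->state == FDWXACT_WAITING && w->next_resolution_ms <= now_ms;
}

/* Resolver side: a failed attempt pushes the next one out by the interval */
void
demo_resolution_failed(DemoWaiter *w, int64_t now_ms, int retry_interval_ms)
{
	w->next_resolution_ms = now_ms + retry_interval_ms;
}

/* Resolver side: success releases the waiting backend */
void
demo_resolution_done(DemoWaiter *w)
{
	w->state = FDWXACT_WAIT_COMPLETE;
}
```

With the default foreign_transaction_resolution_retry_interval of 5000 ms, a waiter whose resolution fails at t = 1000 would become due again at t = 6000 in this model.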

Attachment: v26-0006-Add-GetPrepareID-API.patch (application/octet-stream)
From b711634f5d109b51e7deca7905bd297f35bc04ce Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:09:41 +0900
Subject: [PATCH v26 06/11] Add GetPrepareID API.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/fdwxact.c | 50 +++++++++++++++++++++++-----
 src/include/foreign/fdwapi.h         |  3 ++
 2 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index bf5daedaa9..c6f6a92752 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -136,6 +136,7 @@ typedef struct FdwXactParticipant
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
 	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
 } FdwXactParticipant;
 
 /*
@@ -408,9 +409,10 @@ FdwXactPrepareForeignTransactions(void)
 }
 
 /*
- * Return a null-terminated foreign transaction identifier.  We generate an
- * unique identifier with in the form of
- * "fx_<random number>_<xid>_<serverid>_<userid> whose length is
+ * Return a null-terminated foreign transaction identifier.  If the given
+ * foreign server's FDW provides the GetPrepareId callback, we return the
+ * identifier it produces.  Otherwise we generate a unique identifier of the
+ * form "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
  * less than FDWXACT_ID_MAX_LEN.
  *
  * Returned string value is used to identify foreign transaction. The
@@ -425,13 +427,45 @@ FdwXactPrepareForeignTransactions(void)
 static char *
 get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
 {
-	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+	char	   *id;
+	int			id_len = 0;
+
+	/* If the FDW has no callback, generate a unique identifier ourselves. */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
 
-	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
-			 Abs(random()), xid, fdw_part->server->serverid,
-			 fdw_part->usermapping->userid);
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the FDW's callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
 
-	return pstrdup(buf);
+	id[id_len] = '\0';
+	return pstrdup(id);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 89cec9aa96..91db4f5bfc 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -174,6 +174,8 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -256,6 +258,7 @@ typedef struct FdwRoutine
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
 	PrepareForeignTransaction_function PrepareForeignTransaction;
+	GetPrepareId_function GetPrepareId;
 } FdwRoutine;
 
 
-- 
2.23.0
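
For FDW authors, a GetPrepareId callback matching the signature added to fdwapi.h might look roughly like the sketch below. Everything named demo* or DEMO_* is hypothetical: the typedefs stand in for the backend's TransactionId and Oid, the limit of 200 is an assumed value for FDWXACT_ID_MAX_LEN, and malloc is used only so the sketch compiles outside the backend (a real callback would presumably use palloc). Since the caller writes a terminator at id[id_len], the buffer has to hold at least id_len + 1 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the backend typedefs, so the sketch is self-contained */
typedef unsigned int TransactionId;
typedef unsigned int Oid;

/* Assumed limit; the real FDWXACT_ID_MAX_LEN is defined by the patch */
#define DEMO_FDWXACT_ID_MAX_LEN 200

/*
 * Hypothetical callback: returns a NUL-terminated identifier and stores
 * its length (excluding the terminator) in *prep_id_len.  Returns NULL
 * if the identifier cannot be built or would not fit.
 */
char *
demoGetPrepareId(TransactionId xid, Oid serverid, Oid userid,
				 int *prep_id_len)
{
	char	   *id = malloc(DEMO_FDWXACT_ID_MAX_LEN);
	int			len;

	if (id == NULL)
		return NULL;

	len = snprintf(id, DEMO_FDWXACT_ID_MAX_LEN, "demo_%u_%u_%u",
				   xid, serverid, userid);
	if (len < 0 || len >= DEMO_FDWXACT_ID_MAX_LEN)
	{
		free(id);
		return NULL;
	}

	*prep_id_len = len;
	return id;
}
```

Keeping the xid, server OID and user OID in the identifier mirrors the default "fx_..." format, so a dangling prepared transaction on the remote side can still be traced back to its origin.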

Attachment: v26-0005-postgres_fdw-supports-prepare-API-and-support-co.patch (application/octet-stream)
From b1bb4eaf19701a2c49c041bf9ef09a356506648e Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:00:21 +0900
Subject: [PATCH v26 05/11] postgres_fdw supports prepare API and support
 commit/rollback prepared.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c   | 185 ++++++++++++++++++++++------
 contrib/postgres_fdw/postgres_fdw.c |   1 +
 contrib/postgres_fdw/postgres_fdw.h |   1 +
 3 files changed, 151 insertions(+), 36 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 10a2815c64..fa8aa6d5df 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -75,7 +75,7 @@ static unsigned int prep_stmt_number = 0;
 static bool xact_got_connection = false;
 
 /* prototypes of private functions */
-static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
+static void connect_pg_server(ConnCacheEntry *entry, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
@@ -95,6 +95,8 @@ static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 static bool UserMappingPasswordRequired(UserMapping *user);
 static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
 static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id,
+									bool is_commit);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -132,35 +134,8 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	 * require some overhead.  Broken connection will be detected when the
 	 * connection is actually used.
 	 */
-
-	/*
-	 * If cache entry doesn't have a connection, we have to establish a new
-	 * connection.  (If connect_pg_server throws an error, the cache entry
-	 * will remain in a valid empty state, ie conn == NULL.)
-	 */
 	if (entry->conn == NULL)
-	{
-		ForeignServer *server = GetForeignServer(user->serverid);
-
-		/* Reset all transient state fields, to be sure all are clean */
-		entry->xact_depth = 0;
-		entry->have_prep_stmt = false;
-		entry->have_error = false;
-		entry->changing_xact_state = false;
-		entry->invalidated = false;
-		entry->server_hashvalue =
-			GetSysCacheHashValue1(FOREIGNSERVEROID,
-								  ObjectIdGetDatum(server->serverid));
-		entry->mapping_hashvalue =
-			GetSysCacheHashValue1(USERMAPPINGOID,
-								  ObjectIdGetDatum(user->umid));
-
-		/* Now try to make the connection */
-		entry->conn = connect_pg_server(server, user);
-
-		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
-			 entry->conn, server->servername, user->umid, user->userid);
-	}
+		connect_pg_server(entry, user);
 
 	/*
 	 * Start a new transaction or subtransaction if needed.
@@ -229,11 +204,29 @@ GetConnectionCacheEntry(Oid umid)
 
 /*
  * Connect to remote server using specified server and user mapping properties.
+ *
+ * If cache entry doesn't have a connection, we have to establish a new
+ * connection.  (If connect_pg_server throws an error, the cache entry
+ * will remain in a valid empty state, ie conn == NULL.)
  */
-static PGconn *
-connect_pg_server(ForeignServer *server, UserMapping *user)
+static void
+connect_pg_server(ConnCacheEntry *entry, UserMapping *user)
 {
 	PGconn	   *volatile conn = NULL;
+	ForeignServer *server = GetForeignServer(user->serverid);
+
+	/* Reset all transient state fields, to be sure all are clean */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+	entry->changing_xact_state = false;
+	entry->invalidated = false;
+	entry->server_hashvalue =
+		GetSysCacheHashValue1(FOREIGNSERVEROID,
+							  ObjectIdGetDatum(server->serverid));
+	entry->mapping_hashvalue =
+		GetSysCacheHashValue1(USERMAPPINGOID,
+							  ObjectIdGetDatum(user->umid));
 
 	/*
 	 * Use PG_TRY block to ensure closing connection on error.
@@ -344,7 +337,9 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 	}
 	PG_END_TRY();
 
-	return conn;
+	entry->conn = conn;
+	elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)",
+		 entry->conn, server->servername, user->umid, user->userid);
 }
 
 /*
@@ -1084,12 +1079,26 @@ void
 postgresCommitForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry;
+	bool			is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	PGresult   *res;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
 
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection if necessary.
+		 */
+		if (!entry->conn)
+			connect_pg_server(entry, frstate->usermapping);
+
+		/* COMMIT PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->fdwxact_id, true);
+		return;
+	}
+
 	Assert(entry->conn);
 
 	/*
@@ -1135,16 +1144,31 @@ void
 postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry = NULL;
+	bool			is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	bool abort_cleanup_failure = false;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	/*
 	 * In simple rollback case, we must have a connection to the foreign server
 	 * because the foreign transaction is not closed yet. We get the connection
 	 * entry from the cache.
 	 */
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	if (!is_onephase)
+	{
+		/*
+		 * In two-phase commit case, the foreign transaction has been prepared
+		 * and closed, so we might not have a connection to it. We get a
+		 * connection if necessary.
+		 */
+		if (!entry->conn)
+			connect_pg_server(entry, frstate->usermapping);
+
+		/* ROLLBACK PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->fdwxact_id, false);
+		return;
+	}
+
 	Assert(entry);
 
 	/*
@@ -1221,6 +1245,46 @@ postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 	return;
 }
 
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should already have been started; get its cache entry */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", frstate->fdwxact_id);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   frstate->server->servername, frstate->fdwxact_id)));
+	PQclear(res);
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 frstate->server->servername, frstate->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
 /* Cleanup at main-transaction end */
 static void
 pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
@@ -1247,3 +1311,52 @@ pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
 	/* Also reset cursor numbering for next transaction */
 	cursor_number = 0;
 }
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 8ed6577d35..caccad9846 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -561,6 +561,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for foreign transactions */
 	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
 	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
 
 	PG_RETURN_POINTER(routine);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index e3b2897495..659222b97a 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -140,6 +140,7 @@ extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
 extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
 extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
-- 
2.23.0
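
The second-phase logic in pgfdw_end_prepared_xact comes down to two steps: build a COMMIT PREPARED or ROLLBACK PREPARED command, and treat an "undefined object" SQLSTATE (42704) as success because the prepared transaction may already have been resolved. The standalone sketch below illustrates both steps. The demo_* functions are hypothetical helpers, not part of postgres_fdw; the two macros reproduce the SQLSTATE encoding from PostgreSQL's elog.h so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Same encoding as PostgreSQL's elog.h, repeated here for self-containment */
#define PGSIXBIT(ch)	(((ch) - '0') & 0x3F)
#define MAKE_SQLSTATE(ch1,ch2,ch3,ch4,ch5)	\
	(PGSIXBIT(ch1) + (PGSIXBIT(ch2) << 6) + (PGSIXBIT(ch3) << 12) + \
	 (PGSIXBIT(ch4) << 18) + (PGSIXBIT(ch5) << 24))
#define ERRCODE_UNDEFINED_OBJECT	MAKE_SQLSTATE('4','2','7','0','4')

/* Hypothetical helper: format the second-phase command into buf */
int
demo_build_end_command(char *buf, size_t buflen, const char *fdwxact_id,
					   bool is_commit)
{
	return snprintf(buf, buflen, "%s PREPARED '%s'",
					is_commit ? "COMMIT" : "ROLLBACK", fdwxact_id);
}

/*
 * Hypothetical helper: true if the five-character SQLSTATE reported by the
 * remote server means the prepared transaction no longer exists there.
 */
bool
demo_error_is_ignorable(const char *diag_sqlstate)
{
	if (diag_sqlstate == NULL || strlen(diag_sqlstate) != 5)
		return false;

	return MAKE_SQLSTATE(diag_sqlstate[0], diag_sqlstate[1],
						 diag_sqlstate[2], diag_sqlstate[3],
						 diag_sqlstate[4]) == ERRCODE_UNDEFINED_OBJECT;
}
```

A missing SQLSTATE is not treated as ignorable here, which matches the patch falling back to ERRCODE_CONNECTION_FAILURE in that case.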

Attachment: v26-0003-postgres_fdw-supports-commit-and-rollback-APIs.patch (application/octet-stream)
From 7ff1d1019169f51d811c6b84fea958b6ad96f6e6 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sat, 29 Aug 2020 00:14:36 +0900
Subject: [PATCH v26 03/11] postgres_fdw supports commit and rollback APIs.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 469 +++++++++---------
 .../postgres_fdw/expected/postgres_fdw.out    |   2 +-
 contrib/postgres_fdw/postgres_fdw.c           |   4 +
 contrib/postgres_fdw/postgres_fdw.h           |   3 +
 src/backend/access/fdwxact/fdwxact.c          |   4 +-
 5 files changed, 242 insertions(+), 240 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 08daf26fdf..10a2815c64 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -16,6 +16,7 @@
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -79,8 +80,7 @@ static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, UserMapping *user);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -93,6 +93,8 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -107,54 +109,9 @@ static bool UserMappingPasswordRequired(UserMapping *user);
 PGconn *
 GetConnection(UserMapping *user, bool will_prep_stmt)
 {
-	bool		found;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
-
-	/* First time through, initialize connection cache hashtable */
-	if (ConnectionHash == NULL)
-	{
-		HASHCTL		ctl;
 
-		MemSet(&ctl, 0, sizeof(ctl));
-		ctl.keysize = sizeof(ConnCacheKey);
-		ctl.entrysize = sizeof(ConnCacheEntry);
-		/* allocate ConnectionHash in the cache context */
-		ctl.hcxt = CacheMemoryContext;
-		ConnectionHash = hash_create("postgres_fdw connections", 8,
-									 &ctl,
-									 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
-
-		/*
-		 * Register some callback functions that manage connection cleanup.
-		 * This should be done just once in each backend.
-		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
-		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
-		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
-									  pgfdw_inval_callback, (Datum) 0);
-		CacheRegisterSyscacheCallback(USERMAPPINGOID,
-									  pgfdw_inval_callback, (Datum) 0);
-	}
-
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
-	/*
-	 * Find or create cached entry for requested connection.
-	 */
-	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
-	if (!found)
-	{
-		/*
-		 * We need only clear "conn" here; remaining fields will be filled
-		 * later when "conn" is set.
-		 */
-		entry->conn = NULL;
-	}
+	entry = GetConnectionCacheEntry(user->umid);
 
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
@@ -208,7 +165,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
-	begin_remote_xact(entry);
+	begin_remote_xact(entry, user);
 
 	/* Remember if caller will prepare statements */
 	entry->have_prep_stmt |= will_prep_stmt;
@@ -216,6 +173,60 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	return entry->conn;
 }
 
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	bool		found;
+	ConnCacheEntry *entry;
+	ConnCacheKey key;
+
+	/* First time through, initialize connection cache hashtable */
+	if (ConnectionHash == NULL)
+	{
+		HASHCTL		ctl;
+
+		MemSet(&ctl, 0, sizeof(ctl));
+		ctl.keysize = sizeof(ConnCacheKey);
+		ctl.entrysize = sizeof(ConnCacheEntry);
+		/* allocate ConnectionHash in the cache context */
+		ctl.hcxt = CacheMemoryContext;
+		ConnectionHash = hash_create("postgres_fdw connections", 8,
+									 &ctl,
+									 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+		/*
+		 * Register some callback functions that manage connection cleanup.
+		 * This should be done just once in each backend.
+		 */
+		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
+		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
+									  pgfdw_inval_callback, (Datum) 0);
+		CacheRegisterSyscacheCallback(USERMAPPINGOID,
+									  pgfdw_inval_callback, (Datum) 0);
+	}
+
+	/* Set flag that we did GetConnection during the current transaction */
+	xact_got_connection = true;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
+
+	/*
+	 * Find or create cached entry for requested connection.
+	 */
+	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+	if (!found)
+	{
+		/*
+		 * We need only clear "conn" here; remaining fields will be filled
+		 * later when "conn" is set.
+		 */
+		entry->conn = NULL;
+	}
+
+	return entry;
+}
+
 /*
  * Connect to remote server using specified server and user mapping properties.
  */
@@ -474,7 +485,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -486,6 +497,9 @@ begin_remote_xact(ConnCacheEntry *entry)
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
+		/* Register the foreign server to the transaction */
+		FdwXactRegisterXact(user->serverid, user->userid);
+
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
 		else
@@ -701,193 +715,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -1252,3 +1079,171 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+void
+postgresCommitForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry;
+	PGresult   *res;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	Assert(entry->conn);
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   frstate->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Just clean up the connection entry if the transaction failed before a
+	 * connection was established.
+	 */
+	if (!entry->conn)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction yet or is already
+	 * unsalvageable, only do the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 10e23d02ed..e545d27649 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,6 +8984,6 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
+ERROR:  cannot PREPARE a transaction that has operated on foreign tables
 ROLLBACK;
 WARNING:  there is no transaction in progress
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a31abce7c9..8ed6577d35 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -558,6 +558,10 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..e3b2897495 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -137,6 +138,8 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index f5a5c8c2e9..1dcb9b3cee 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -6,8 +6,8 @@
  * This module contains the code for managing transactions started on foreign
  * servers.
  *
- * FDW who implements both commit and rollback APIs can request to register the
- * foreign transaction by FdwXactRegisterXact() to participate it to a
+ * An FDW that implements both the commit and rollback APIs can register the
+ * foreign transaction with FdwXactRegisterXact() to add it to a
  * participant group.  The registered foreign transactions are identified by
  * OIDs of server and user.  On commit and rollback, the global transaction manager
  * calls corresponding FDW API to end the tranasctions.
-- 
2.23.0

v26-0004-Add-PrepareForeignTransaction-API.patch (application/octet-stream)
From 4173a8601fd62566a50477d3975b2a4d7c01a568 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 20 Sep 2020 16:49:20 +0900
Subject: [PATCH v26 04/11] Add PrepareForeignTransaction API.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
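
Note on ordering (illustration only, not part of the patch):
FdwXactPrepareForeignTransactions() in this patch inserts and WAL-logs the
FdwXact entry before it calls the FDW's prepare callback.  The standalone
sketch below simulates a crash between the two steps to show why that order
matters.  It uses no PostgreSQL APIs; SimState, run_prepare() and
is_orphaned() are hypothetical names made up for this illustration.

```c
#include <stdbool.h>

/* Hypothetical model of what survives a coordinator crash. */
typedef struct SimState
{
	bool		entry_persisted;	/* coordinator has a durable record */
	bool		remote_prepared;	/* foreign server holds a prepared xact */
} SimState;

/*
 * Run the two steps in the chosen order, "crashing" after the given number
 * of completed steps (0, 1 or 2).
 */
static SimState
run_prepare(bool persist_first, int steps_before_crash)
{
	SimState	s = {false, false};

	for (int step = 0; step < steps_before_crash && step < 2; step++)
	{
		/* Step 0 persists when persist_first is set, otherwise step 1 does */
		bool		do_persist = ((step == 0) == persist_first);

		if (do_persist)
			s.entry_persisted = true;
		else
			s.remote_prepared = true;
	}
	return s;
}

/* A remote prepared transaction that nobody remembers cannot be resolved. */
static bool
is_orphaned(SimState s)
{
	return s.remote_prepared && !s.entry_persisted;
}
```

With persist-first ordering no crash point leaves an orphan; with the reverse
ordering, a crash after the first step does.
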
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +-
 src/backend/access/fdwxact/fdwxact.c          | 1729 ++++++++++++++++-
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   28 +
 src/backend/access/transam/xlog.c             |   41 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/foreigncmds.c            |   22 +
 src/backend/foreign/foreign.c                 |    8 +-
 src/backend/postmaster/pgstat.c               |    9 +
 src/backend/postmaster/postmaster.c           |    1 +
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    3 +
 src/backend/storage/ipc/procarray.c           |   56 +-
 src/backend/storage/lmgr/lwlocknames.txt      |    1 +
 src/backend/utils/misc/guc.c                  |   11 +
 src/backend/utils/misc/postgresql.conf.sample |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |   88 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   18 +
 src/include/foreign/fdwapi.h                  |    2 +
 src/include/pgstat.h                          |    3 +
 src/include/storage/procarray.h               |    2 +
 src/test/regress/expected/rules.out           |    7 +
 34 files changed, 2138 insertions(+), 29 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact_xlog.h
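
Note on state file naming (illustration only, not part of the patch): the
FdwXactFilePath() macro added below formats four 32-bit values as zero-padded
8-digit hexadecimal fields joined by '_', which is where
FDWXACT_FILE_NAME_LEN = 8 + 1 + 8 + 1 + 8 + 1 + 8 = 35 comes from.  A
standalone sketch, assuming plain uint32_t stand-ins for Oid and TransactionId
and a hypothetical helper name fdwxact_file_name():

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Same arithmetic as FDWXACT_FILE_NAME_LEN in the patch */
#define FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)

/*
 * Format the file name (without the directory part) into buf.  Returns the
 * number of characters snprintf() would have written, excluding the NUL.
 */
static int
fdwxact_file_name(char *buf, size_t buflen, uint32_t dbid, uint32_t xid,
				  uint32_t serverid, uint32_t userid)
{
	return snprintf(buf, buflen,
					"%08" PRIX32 "_%08" PRIX32 "_%08" PRIX32 "_%08" PRIX32,
					dbid, xid, serverid, userid);
}
```

For example, dbid 1, xid 42, serverid 16384 and userid 10 would give the name
00000001_0000002A_00004000_0000000A.
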

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index e545d27649..cfbed1e1c4 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,6 +8984,6 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on foreign tables
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
 ROLLBACK;
 WARNING:  there is no transaction in progress
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 1dcb9b3cee..bf5daedaa9 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -12,6 +12,51 @@
  * OIDs of server and user.  On commit and rollback, the global transaction manager
  * calls corresponding FDW API to end the tranasctions.
  *
+ * To commit atomically across all foreign servers, the global transaction
+ * manager supports the two-phase commit protocol, which is a type of atomic
+ * commitment protocol (ACP).  Foreign servers whose FDW implements the prepare
+ * API are prepared at PREPARE TRANSACTION.  Prepared foreign transactions can
+ * be committed or rolled back with the pg_resolve_foreign_xact() function.
+ *
+ * The two-phase commit protocol is crash-safe: we WAL-log the foreign
+ * transaction information.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of a foreign
+ * transaction follows a locking model based on the following linked concepts:
+ *
+ * * All FdwXact fields except for status are protected by FdwXactLock. The
+ *	 status is protected by its mutex.
+ * * A process that is going to process a foreign transaction needs to set
+ *	 locking_backend of the FdwXact entry to lock the entry, which prevents
+ *	 the entry from being updated or removed by concurrent processes.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *	 with entries marked with fdwxact->inredo and fdwxact->ondisk.	FdwXact file
+ *	 data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *	 We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *	 have fdwxact->inredo set and are behind the redo_horizon.	We save
+ *	 them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *	 fdwxact->ondisk is true, the corresponding entry from the disk is
+ *	 additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *	 fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
  * Portions Copyright (c) 2020, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
@@ -20,15 +65,52 @@
  */
 #include "postgres.h"
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include "access/fdwxact.h"
+#include "access/twophase.h"
+#include "access/xact.h"
 #include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "foreign/fdwapi.h"
 #include "foreign/foreign.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/procarray.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 
 /* Check the FdwXactParticipant is capable of two-phase commit  */
 #define ServerSupportTransactionCallback(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define ServerSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * OID, xid, foreign server OID and user OID, each written as 8 hexadecimal
+ * digits and separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is enough to ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
 
 /*
  * Structure to bundle the foreign transaction participant.	 This struct
@@ -37,13 +119,23 @@
  */
 typedef struct FdwXactParticipant
 {
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
 	/* Foreign server and user mapping info, passed to callback routines */
 	ForeignServer *server;
 	UserMapping *usermapping;
 
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
 } FdwXactParticipant;
 
 /*
@@ -52,11 +144,103 @@ typedef struct FdwXactParticipant
  */
 static List *FdwXactParticipants = NIL;
 
+/* Track whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+/* GUC parameter */
+int			max_prepared_foreign_xacts = 0;
+
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void FdwXactPrepareForeignTransactions(void);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
+static void FdwXactComputeRequiredXmin(void);
+static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
+static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool givewarning);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool fromdisk);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  Oid umid, char *fdwxact_id);
+static void remove_fdwxact(FdwXact fdwxact);
 static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
 													  FdwRoutine *routine);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
 
 /*
  * Register the given foreign transaction identified by the given arguments
@@ -82,6 +266,13 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 		}
 	}
 
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
 	routine = GetFdwRoutineByServerId(serverid);
 
 	/* Foreign server must implement both callback */
@@ -139,14 +330,376 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 
 	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
 
+	fdw_part->fdwxact = NULL;
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
 
 	return fdw_part;
 }
 
+/*
+ * Insert FdwXact entries and prepare foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(void)
+{
+	ListCell   *lc;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState state;
+		FdwXact		fdwxact;
+		TransactionId xid = GetTopTransactionId();
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Skip if the server's FDW doesn't support two-phase commit */
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			continue;
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to the disk and WAL (thereby relaying it to standbys).
+		 * Thus, in case we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to the disk and crashed in between these
+		 * two steps, we would lose track of the prepared transaction on the
+		 * foreign server and would not be able to resolve it after crash
+		 * recovery.  Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry() call and the moment this
+		 * backend receives the acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 */
+		state.xid = xid;
+		state.server = fdw_part->server;
+		state.usermapping = fdw_part->usermapping;
+		state.fdwxact_id = fdw_part->fdwxact_id;
+		fdw_part->prepare_foreign_xact_fn(&state);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier.  We generate a
+ * unique identifier in the form of
+ * "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transaction ID unique, we should ideally use something
+ * like a UUID, which gives unique IDs with high probability, but that may be
+ * expensive here, and the UUID extension that provides the function to
+ * generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+			 Abs(random()), xid, fdw_part->server->serverid,
+			 fdw_part->usermapping->userid);
+
+	return pstrdup(buf);
+}
+
+/*
+ * Create a new foreign transaction entry before the FDW prepares, commits or
+ * rolls back the foreign transaction.  The entry is added to WAL and is
+ * persisted to disk under the pg_fdwxact directory at the next checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there
+	 * are none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset the entry's state */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * Note that the transaction may abort after we have prepared some of the
+ * participants. In that case we switch to rollback and roll back all foreign
+ * transactions.
+ */
+void
+PrePrepare_FdwXact(void)
+{
+	ListCell   *lc;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit because we prepare on
+	 * each of them regardless of whether it was modified.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
+	}
+
+	/* Prepare transactions on participating foreign servers */
+	FdwXactPrepareForeignTransactions();
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
 /*
  * The routine for committing or rolling back the given transaction participant.
  */
@@ -159,6 +712,7 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 
 	state.server = fdw_part->server;
 	state.usermapping = fdw_part->usermapping;
+	state.fdwxact_id = NULL;
 	state.flags = FDWXACT_FLAG_ONEPHASE;
 
 	if (commit)
@@ -178,14 +732,58 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 }
 
 /*
- * Clear the FdwXactParticipants list.
+ * Unlock foreign transaction participants and clear the FdwXactParticipants
+ * list.  If we leave any foreign transactions unresolved, update the oldest
+ * xmin of unresolved transactions so that the local transaction ids of those
+ * foreign transactions are not truncated.
  */
 static void
 ForgetAllFdwXactParticipants(void)
 {
+	ListCell   *cell;
+	int			nlefts = 0;
+
 	if (FdwXactParticipants == NIL)
 		return;
 
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if no FdwXact entry has been registered yet */
+		if (!fdwxact)
+			continue;
+
+		/*
+		 * Unlock the foreign transaction entries.  Note that there is a race
+		 * condition: the FdwXact entries in FdwXactParticipants could be used
+		 * by another backend before we forget them, in the case where the
+		 * resolver process removes an FdwXact entry and another backend
+		 * reuses it first.  So we need to check whether the entries are still
+		 * associated with this transaction.  We cannot use locking_backend to
+		 * check because the entry might already be held by the resolver
+		 * process.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->valid)
+		{
+			fdwxact->locking_backend = InvalidBackendId;
+			nlefts++;
+		}
+		LWLockRelease(FdwXactLock);
+	}
+
+	/*
+	 * If we left any FdwXact entries unresolved, update the oldest local
+	 * transaction id among unresolved distributed transactions.
+	 */
+	if (nlefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions", nlefts);
+		FdwXactComputeRequiredXmin();
+	}
+
 	list_free_deep(FdwXactParticipants);
 	FdwXactParticipants = NIL;
 }
@@ -208,23 +806,1132 @@ AtEOXact_FdwXact(bool is_commit)
 	foreach(lc, FdwXactParticipants)
 	{
 		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+		int			status;
+
+		if (!fdwxact)
+		{
+			/* Commit or rollback the foreign transaction in one-phase */
+			Assert(ServerSupportTransactionCallback(fdw_part));
+			FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			continue;
+		}
 
-		Assert(ServerSupportTransactionCallback(fdw_part));
-		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+		/*
+		 * Abort the foreign transaction.  For participants whose status is
+		 * FDWXACT_STATUS_PREPARING, we close the transaction in one-phase.
+		 */
+		SpinLockAcquire(&(fdwxact->mutex));
+		status = fdwxact->status;
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+		SpinLockRelease(&(fdwxact->mutex));
+
+		if (status == FDWXACT_STATUS_PREPARING)
+			FdwXactParticipantEndTransaction(fdw_part, false);
 	}
 
 	ForgetAllFdwXactParticipants();
 }
 
 /*
- * Check if the local transaction has any foreign transaction.
+ * Resolve foreign transactions at the given indexes.
+ *
+ * The caller must hold the given foreign transactions in advance to prevent
+ * concurrent update.
  */
-void
-PrePrepare_FdwXact(void)
+static void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
 {
-	/* We don't support to prepare foreign transactions */
-	if (FdwXactParticipants != NIL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		FdwXactResolveOneFdwXact(fdwxact);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx != -1);
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ *
+ * XXX: we can exclude FdwXact entries whose status is already committing
+ * or aborting.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Return whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactGetTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on the
+	 * foreign server. So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted.  Raise an error since we cannot
+	 * determine the fate of this foreign transaction from a local
+	 * transaction whose fate is also undetermined.
+	 */
+	else
+		elog(ERROR,
+			 "cannot resolve the foreign transaction associated with in-process transaction");
+
+	pg_unreachable();
+}
+
+/* Commit or rollback one prepared foreign transaction */
+static void
+FdwXactResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactRslvState state;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	/* The FdwXact entry must be held by me */
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->locking_backend == MyBackendId);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactGetTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare resolution state to pass to API */
+	state.xid = fdwxact->local_xid;
+	state.server = server;
+	state.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	state.fdwxact_id = fdwxact->fdwxact_id;
+	state.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&state);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&state);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/*
+ * Return the index of the first FdwXact entry that matches the given arguments,
+ * or -1 if there is none.  Only arguments holding a valid value for their
+ * datatype are used as search conditions.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	bool		found = false;
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+		found = true;
+		break;
+	}
+
+	return found ? i : -1;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete the FdwXact entry and its file, if any */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The status of
+	 * the transaction is set as preparing, since we do not know the exact
+	 * status right now. Resolver will set it later based on the status of
+	 * local transaction which prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that
+	 * prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove
+ * FdwXact file if a foreign transaction was saved via an earlier checkpoint.
+ * We may not find the FdwXact entry if crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+static void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an inserted LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint that is still marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Unconditionally flush the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid CRC and the expected identifiers), return the
+ * palloc'd contents of the file; raise an error on corrupted data.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the expected entry */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextXid = ShmemVariableCache->nextXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+RestoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	5
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "prepared (commit)";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "prepared (abort)";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id,
+															 strlen(fdwxact->fdwxact_id)));
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to resolve foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx == -1)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist")));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being processed by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	if (TwoPhaseExists(fdwxact->local_xid))
+	{
+		/*
+		 * the entry's local transaction is prepared. Since we cannot know the
+		 * fate of the local transaction, we cannot resolve this foreign
+		 * transaction.
+		 */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction entry whose local transaction is prepared"),
+				 errhint("Do COMMIT PREPARED or ROLLBACK PREPARED.")));
+	}
+
+	/* Hold the entry */
+	FdwXactCtl->fdwxacts[idx]->locking_backend = MyBackendId;
+
+	LWLockRelease(FdwXactLock);
+
+	PG_TRY();
+	{
+		FdwXactResolveFdwXacts(&idx, 1);
+	}
+	PG_CATCH();
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactCtl->fdwxacts[idx]->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it. This gives a way to forget about such a prepared transaction
+ * when, for example, the foreign server where it was prepared is no longer
+ * available, or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx == -1)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist on server %u",
+						serverid)));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being held by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	PG_TRY();
+	{
+		/* Clean up entry and any files we may have left */
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdwxact(fdwxact);
+	}
+	PG_CATCH();
+	{
+		if (fdwxact->valid)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+			fdwxact->locking_backend = InvalidBackendId;
+		}
+		LWLockRelease(FdwXactLock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
 }
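
A note on the directory scan in RestoreFdwXactData() above: it accepts only names of a fixed length drawn from the alphabet "0123456789ABCDEF_" and then splits them with sscanf() into dbid, local xid, serverid and userid. Below is a minimal standalone sketch of that round trip, not code from the patch. The sketch_* names are hypothetical, the length of 35 is my assumption about FDWXACT_FILE_NAME_LEN (four 8-digit fields plus three underscores), and the upper-case writer is only inferred from the alphabet check, since FdwXactFilePath() is not shown in this part of the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: four 8-digit hex fields joined by three underscores. */
#define SKETCH_FILE_NAME_LEN (8 * 4 + 3)

/* Hypothetical helper: build a state file name from its four identifiers. */
static void
sketch_file_name(char *dst, size_t dstlen, uint32_t dbid, uint32_t xid,
                 uint32_t serverid, uint32_t userid)
{
    assert(dstlen > SKETCH_FILE_NAME_LEN);
    snprintf(dst, dstlen, "%08X_%08X_%08X_%08X",
             (unsigned int) dbid, (unsigned int) xid,
             (unsigned int) serverid, (unsigned int) userid);
}

/*
 * Hypothetical helper: return true and fill the outputs only if the name has
 * the expected length and alphabet and all four fields parse.
 */
static bool
sketch_parse_file_name(const char *name, uint32_t *dbid, uint32_t *xid,
                       uint32_t *serverid, uint32_t *userid)
{
    unsigned int f[4];

    /* Same shape of filter as the directory scan: length, then alphabet. */
    if (strlen(name) != SKETCH_FILE_NAME_LEN ||
        strspn(name, "0123456789ABCDEF_") != SKETCH_FILE_NAME_LEN)
        return false;

    /* Unlike the patch, also insist that all four fields were converted. */
    if (sscanf(name, "%8x_%8x_%8x_%8x", &f[0], &f[1], &f[2], &f[3]) != 4)
        return false;

    *dbid = f[0];
    *xid = f[1];
    *serverid = f[2];
    *userid = f[3];
    return true;
}
```

The one deliberate difference from the patch is the check of the sscanf() return value. The alphabet test alone would let through a 35-character name with underscores in the wrong places, so checking the conversion count looks like a cheap extra guard.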
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..4b3e67eb49 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..0a3f4b383f 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index ef4f9981e3..6f1f4a2da2 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -845,6 +845,34 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+		if (gxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61754312e2..1e1a538dc9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4600,6 +4601,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6290,6 +6292,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6837,14 +6842,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit of
+	 * discarding any state files that are newer than the first record to
+	 * replay, avoiding conflicts at replay.  This also avoids any subsequent
+	 * scans when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	RestoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7046,7 +7052,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7558,11 +7567,13 @@ StartupXLOG(void)
 	}
 
 	/*
-	 * Pre-scan prepared transactions to find out the range of XIDs present.
-	 * This information is not quite needed yet, but it is positioned here so
-	 * as potential problems are detected before any on-disk change is done.
+	 * Pre-scan prepared transactions and foreign prepared transactions to find
+	 * out the range of XIDs present.  This information is not quite needed yet,
+	 * but it is positioned here so as potential problems are detected before any
+	 * on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7890,8 +7901,12 @@ StartupXLOG(void)
 	TrimCLOG();
 	TrimMultiXact();
 
-	/* Reload shared-memory state for prepared transactions */
+	/*
+	 * Reload shared-memory state for prepared transactions and foreign
+	 * prepared transactions.
+	 */
 	RecoverPreparedTransactions();
+	RecoverFdwXacts();
 
 	/*
 	 * Shutdown the recovery environment. This must occur after
@@ -9189,6 +9204,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckPointReplicationOrigin();
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9731,6 +9747,7 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
@@ -9750,6 +9767,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9768,6 +9786,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9975,6 +9994,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10178,6 +10198,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ed4f3f142d..cdedddde7e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index c002a61794..bf3482e6bb 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1076,6 +1077,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * Warn if there is still a foreign prepared transaction on this foreign
+	 * server; the server is dropped regardless.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1397,6 +1410,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
 	/*
+	 * Warn if there is still a foreign prepared transaction using this user
+	 * mapping; the mapping is dropped regardless.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
+	/*
 	 * Do the deletion
 	 */
 	object.classId = UserMappingRelationId;
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index fb0d854940..d7421d7da4 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -328,12 +328,18 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
-	/* Sanity check for transaction management callbacks */
+	/* Sanity checks for transaction management callbacks */
 	if ((routine->CommitForeignTransaction && !routine->RollbackForeignTransaction) ||
 		(!routine->CommitForeignTransaction && routine->RollbackForeignTransaction))
 		elog(ERROR,
 			 "foreign-data wrapper must support both commit and rollback routines or neither");
 
+	if (routine->PrepareForeignTransaction &&
+		(!routine->CommitForeignTransaction ||
+		 !routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper that supports prepare routine must support both commit and rollback routines");
+
 	return routine;
 }
 
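
The two sanity checks in GetFdwRoutine() above amount to a small rule: commit and rollback callbacks come as a pair, and a prepare callback is only meaningful when that pair is present. Here is a rough standalone sketch of the rule, with a hypothetical SketchFdwRoutine struct standing in for the three FdwRoutine members and a returned string standing in for elog(ERROR); none of these names come from the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*SketchXactCallback) (void *state);

/* Hypothetical stand-in for the transaction callbacks of FdwRoutine. */
typedef struct SketchFdwRoutine
{
    SketchXactCallback commit;
    SketchXactCallback rollback;
    SketchXactCallback prepare;
} SketchFdwRoutine;

/* Return NULL if the callback set is consistent, else a complaint. */
static const char *
sketch_check_routine(const SketchFdwRoutine *r)
{
    assert(r != NULL);

    /* Commit and rollback must be provided together or not at all. */
    if ((r->commit != NULL) != (r->rollback != NULL))
        return "must support both commit and rollback routines or neither";

    /* Prepare is only usable on top of the commit/rollback pair. */
    if (r->prepare != NULL && (r->commit == NULL || r->rollback == NULL))
        return "prepare routine requires both commit and rollback routines";

    return NULL;
}

/* Placeholder callback for trying the check out. */
static void
sketch_noop(void *state)
{
    (void) state;
}
```

Read this way, an FDW falls into one of three accepted shapes: no transaction callbacks at all, commit plus rollback (one-phase only), or all three (two-phase capable).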
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e6be2b7836..6bf5a59b42 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3991,6 +3991,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_DSM_FILL_ZERO_WRITE:
 			event_name = "DSMFillZeroWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ:
 			event_name = "LockFileAddToDataDirRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 959e3b8873..81e6cb9ca2 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,7 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index f21f61d5e1..67413d6630 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -167,6 +167,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..2d7191d3cd 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -267,6 +269,7 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 5aaeb6e2b5..cf1f782dcc 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -96,6 +96,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allProcs[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -184,11 +186,13 @@ typedef struct ComputeXidHorizonsResult
 	FullTransactionId latest_completed;
 
 	/*
-	 * The same for procArray->replication_slot_xmin and.
-	 * procArray->replication_slot_catalog_xmin.
+	 * The same for procArray->replication_slot_xmin,
+	 * procArray->replication_slot_catalog_xmin, and
+	 * procArray->fdwxact_unresolved_xmin.
 	 */
 	TransactionId slot_xmin;
 	TransactionId slot_catalog_xmin;
+	TransactionId fdwxact_unresolved_xmin;
 
 	/*
 	 * Oldest xid that any backend might still consider running. This needs to
@@ -207,8 +211,9 @@ typedef struct ComputeXidHorizonsResult
 	 * Oldest xid for which deleted tuples need to be retained in shared
 	 * tables.
 	 *
-	 * This includes the effects of replication slots. If that's not desired,
-	 * look at shared_oldest_nonremovable_raw;
+	 * This includes the effects of replication slots as well as unresolved
+	 * foreign transactions. If that's not desired, look at
+	 * shared_oldest_nonremovable_raw;
 	 */
 	TransactionId shared_oldest_nonremovable;
 
@@ -407,6 +412,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 		ShmemVariableCache->xactCompletionCount = 1;
 	}
 
@@ -1677,6 +1683,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	 */
 	h->slot_xmin = procArray->replication_slot_xmin;
 	h->slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	h->fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	for (int index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1795,6 +1802,12 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	h->data_oldest_nonremovable =
 		TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);
 
+	/*
+	 * Check whether there are unresolved distributed transactions requiring
+	 * an older xmin.
+	 */
+	h->shared_oldest_nonremovable =
+		TransactionIdOlder(h->shared_oldest_nonremovable, h->fdwxact_unresolved_xmin);
 	/*
 	 * The only difference between catalog / data horizons is that the slot's
 	 * catalog xmin is applied to the catalog one (so catalogs can be accessed
@@ -1850,6 +1863,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	Assert(!TransactionIdIsValid(h->slot_catalog_xmin) ||
 		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
 										 h->slot_catalog_xmin));
+	Assert(!TransactionIdIsValid(h->fdwxact_unresolved_xmin) ||
+		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+										 h->fdwxact_unresolved_xmin));
 
 	/* update approximate horizons with the computed horizons */
 	GlobalVisUpdateApply(h);
@@ -3741,6 +3757,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon, to prevent
+ * vacuum from truncating clog entries for transactions that are still needed
+ * to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
+
 /*
  * XidCacheRemoveRunningXids
  *
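
On the ComputeXidHorizons() change above: fdwxact_unresolved_xmin is an optional extra limit. When it is InvalidTransactionId it must not move the horizon, and when it is set it may only pull the horizon back, never forward. The sketch below shows that fold using a wraparound-aware comparison. It is written from my understanding of how TransactionIdPrecedes() and the TransactionIdOlder() helper behave; the Sketch* names and constants are stand-ins, not the real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;     /* stand-in for TransactionId */

#define SKETCH_INVALID_XID      ((SketchXid) 0)
#define SKETCH_FIRST_NORMAL_XID ((SketchXid) 3)

/* True if a is logically older than b, allowing for 32-bit wraparound. */
static bool
sketch_xid_precedes(SketchXid a, SketchXid b)
{
    /* Special (non-normal) XIDs compare as plain unsigned integers. */
    if (a < SKETCH_FIRST_NORMAL_XID || b < SKETCH_FIRST_NORMAL_XID)
        return a < b;

    /* Normal XIDs compare modulo 2^32. */
    return (int32_t) (a - b) < 0;
}

/* Return the older of two XIDs, ignoring one that is invalid (unset). */
static SketchXid
sketch_xid_older(SketchXid a, SketchXid b)
{
    if (a == SKETCH_INVALID_XID)
        return b;
    if (b == SKETCH_INVALID_XID)
        return a;
    return sketch_xid_precedes(a, b) ? a : b;
}

/* Fold an optional extra limit into an already computed horizon. */
static SketchXid
sketch_apply_limit(SketchXid horizon, SketchXid unresolved_xmin)
{
    assert(horizon != SKETCH_INVALID_XID);
    return sketch_xid_older(horizon, unresolved_xmin);
}
```

The point of the sketch is that the first argument has to be the horizon being limited. Passing a different horizon there would replace the value rather than restrict it.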
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..dc29a7ea6f 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+FdwXactLock							48
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 596bcb7b84..b468c5628c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -30,6 +30,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -2458,6 +2459,16 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..863e8ccc3a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -127,6 +127,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 118b282d1c..6ba10927f0 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -204,6 +204,7 @@ static const char *const subdirs[] = {
 	"pg_wal/archive_status",
 	"pg_commit_ts",
 	"pg_dynshmem",
+	"pg_fdwxact",
 	"pg_notify",
 	"pg_serial",
 	"pg_snapshots",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 3e00ac0f70..53bc3d82d7 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index cb6ef19182..1712b794c3 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 7ccd7b841c..6ba1a475fc 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -10,26 +10,114 @@
 #ifndef FDWXACT_H
 #define FDWXACT_H
 
+#include "access/fdwxact_xlog.h"
 #include "foreign/foreign.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/s_lock.h"
 
 /* Flag passed to FDW transaction management APIs */
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being aborted */
+} FdwXactStatus;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData *FdwXact;
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+
+	/* Information about the foreign transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset of inserting this entry
+									 * start */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been complete and written to
+								 * file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing FdwXact entry needs to hold FdwXactLock in exclusive mode,
+ * and iterating fdwXacts needs that in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
 /* State data for foreign transaction resolution, passed to FDW callbacks */
 typedef struct FdwXactRslvState
 {
 	TransactionId xid;
 
 	/* Foreign transaction information */
+	char		   *fdwxact_id;
 	ForeignServer *server;
 	UserMapping *usermapping;
 
 	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
 } FdwXactRslvState;
 
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+
 /* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+extern void RestoreFdwXactData(void);
+extern void RecoverFdwXacts(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
 
 #endif /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 6c15df7e70..986bc73566 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..e1b09a70d2 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -236,6 +236,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 06bed90c5e..ed6372d2e6 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f48f5fb4d9..52f71ccd17 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5999,6 +5999,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,bool,text}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,state,in_doubt,identifier}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 4db7ade9a3..89cec9aa96 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -171,6 +171,7 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
 
@@ -254,6 +255,7 @@ typedef struct FdwRoutine
 	/* Support functions for transaction management */
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
+	PrepareForeignTransaction_function PrepareForeignTransaction;
 } FdwRoutine;
 
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..d43dcce56f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -933,6 +933,9 @@ typedef enum
 	WAIT_EVENT_DATA_FILE_TRUNCATE,
 	WAIT_EVENT_DATA_FILE_WRITE,
 	WAIT_EVENT_DSM_FILL_ZERO_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index ea8a876ca4..0124c8c687 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -91,5 +91,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 
 #endif							/* PROCARRAY_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2a18dc423e..9e55fbeec8 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1352,6 +1352,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.state,
+    f.in_doubt,
+    f.identifier
+   FROM pg_foreign_xacts() f(xid, serverid, userid, state, in_doubt, identifier);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.23.0

v26-0002-Introduce-transaction-manager-for-foreign-transa.patch (application/octet-stream)
From d29d3724aeb1347159590c2660f2935997d47cb3 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 28 Aug 2020 22:25:38 +0900
Subject: [PATCH v26 02/11] Introduce transaction manager for foreign
 transactions.

Add both CommitForeignTransaction and RollbackForeignTransaction APIs.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/Makefile          |   4 +-
 src/backend/access/fdwxact/Makefile  |  17 ++
 src/backend/access/fdwxact/fdwxact.c | 230 +++++++++++++++++++++++++++
 src/backend/access/transam/xact.c    |  10 ++
 src/backend/foreign/foreign.c        |   6 +
 src/include/access/fdwxact.h         |  35 ++++
 src/include/foreign/fdwapi.h         |  12 ++
 7 files changed, 312 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/include/access/fdwxact.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..2372a1a690 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,7 +8,7 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+SUBDIRS	    = brin common fdwxact gin gist hash heap index nbtree rmgrdesc \
+			  spgist table tablesample transam
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..aacab1d729
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..f5a5c8c2e9
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,230 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module contains the code for managing transactions started on foreign
+ * servers.
+ *
+ * An FDW that implements both the commit and rollback APIs can register a
+ * foreign transaction with FdwXactRegisterXact() to add it to the group of
+ * participants.  Registered foreign transactions are identified by the OIDs
+ * of the server and user.  On commit and rollback, the global transaction
+ * manager calls the corresponding FDW API to end the transactions.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xlog.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "utils/memutils.h"
+
+/* Check whether the FdwXactParticipant provides a commit callback */
+#define ServerSupportTransactionCallback(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+
+/*
+ * Structure describing a foreign transaction participant.  This struct
+ * needs to live until the end of the transaction, where we cannot look at
+ * syscaches.  Therefore, it is allocated in TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Callbacks for foreign transaction */
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A member of
+ * participants must support both commit and rollback APIs.
+ */
+static List *FdwXactParticipants = NIL;
+
+static void ForgetAllFdwXactParticipants(void);
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
+											 bool commit);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+/*
+ * Register the given foreign transaction identified by the given arguments
+ * as a participant of the transaction.
+ */
+void
+FdwXactRegisterXact(Oid serverid, Oid userid)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Already registered */
+			return;
+		}
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/* Foreign server must implement both callbacks */
+	if (!(routine->CommitForeignTransaction && routine->RollbackForeignTransaction))
+		ereport(ERROR,
+				(errmsg("cannot register foreign server not supporting both commit and rollback callbacks")));
+
+	/*
+	 * The participant's information is also used at the end of a transaction,
+	 * where the system caches are not available.  Save it in
+	 * TopTransactionContext so that it lives until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Revert back the context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Remove the given foreign server from FdwXactParticipants */
+void
+FdwXactUnregisterXact(Oid serverid, Oid userid)
+{
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Remove the entry */
+			FdwXactParticipants =
+				foreach_delete_current(FdwXactParticipants, lc);
+			break;
+		}
+	}
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+
+	return fdw_part;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
+{
+	FdwXactRslvState state;
+
+	Assert(ServerSupportTransactionCallback(fdw_part));
+
+	state.server = fdw_part->server;
+	state.usermapping = fdw_part->usermapping;
+	state.flags = FDWXACT_FLAG_ONEPHASE;
+
+	if (commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Clear the FdwXactParticipants list.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	list_free_deep(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Commit or rollback all foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool is_commit)
+{
+	ListCell   *lc;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/* Commit or rollback foreign transactions in the participant list */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(ServerSupportTransactionCallback(fdw_part));
+		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Check if the local transaction has any foreign transaction.
+ */
+void
+PrePrepare_FdwXact(void)
+{
+	/* Preparing foreign transactions is not supported */
+	if (FdwXactParticipants != NIL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..0a8d1da4bd 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -2229,6 +2230,9 @@ CommitTransaction(void)
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT
 					  : XACT_EVENT_COMMIT);
 
+	/* Commit foreign transaction if any */
+	AtEOXact_FdwXact(true);
+
 	ResourceOwnerRelease(TopTransactionResourceOwner,
 						 RESOURCE_RELEASE_BEFORE_LOCKS,
 						 true, true);
@@ -2368,6 +2372,9 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Error out if any foreign transaction is involved */
+	PrePrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2755,6 +2762,9 @@ AbortTransaction(void)
 		else
 			CallXactCallbacks(XACT_EVENT_ABORT);
 
+		/* Rollback foreign transactions if any */
+		AtEOXact_FdwXact(false);
+
 		ResourceOwnerRelease(TopTransactionResourceOwner,
 							 RESOURCE_RELEASE_BEFORE_LOCKS,
 							 false, true);
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 61e48ca3f8..fb0d854940 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -328,6 +328,12 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* Sanity check for transaction management callbacks */
+	if ((routine->CommitForeignTransaction && !routine->RollbackForeignTransaction) ||
+		(!routine->CommitForeignTransaction && routine->RollbackForeignTransaction))
+		elog(ERROR,
+			 "foreign-data wrapper must support both commit and rollback routines or neither");
+
 	return routine;
 }
 
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..7ccd7b841c
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,35 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "foreign/foreign.h"
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	TransactionId xid;
+
+	/* Foreign transaction information */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* Function declarations */
+extern void AtEOXact_FdwXact(bool is_commit);
+extern void PrePrepare_FdwXact(void);
+
+#endif /* FDWXACT_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..4db7ade9a3 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "access/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
 
@@ -170,6 +171,9 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
  * function.  It provides pointers to the callback functions needed by the
@@ -246,6 +250,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for path reparameterization. */
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
+
+	/* Support functions for transaction management */
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
 } FdwRoutine;
 
 
@@ -259,4 +267,8 @@ extern bool IsImportableForeignTable(const char *tablename,
 									 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in fdwxact/fdwxact.c */
+extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
+
 #endif							/* FDWAPI_H */
-- 
2.23.0
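
For readers skimming the patch above, the flow it introduces (register a participant once per server and user, require both callbacks, then call one callback per participant at end of transaction) can be sketched roughly as follows. This is a simplified, self-contained model and not the patch's code: `MockRoutine`, `MockParticipant`, `register_participant`, `at_eoxact` and `count_call` are hypothetical stand-ins, and a fixed array replaces the `List` kept in `TopTransactionContext`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the patch's types; not the real definitions. */
typedef void (*xact_callback) (unsigned int serverid, unsigned int userid);

typedef struct MockRoutine
{
    xact_callback commit_cb;
    xact_callback rollback_cb;
} MockRoutine;

typedef struct MockParticipant
{
    unsigned int serverid;
    unsigned int userid;
    const MockRoutine *routine;
} MockParticipant;

#define MAX_PARTICIPANTS 8
static MockParticipant participants[MAX_PARTICIPANTS];
static int  nparticipants = 0;
static int  ncalls = 0;

/* Hypothetical callback that only counts how often it was invoked. */
static void
count_call(unsigned int serverid, unsigned int userid)
{
    (void) serverid;
    (void) userid;
    ncalls++;
}

/* Mirrors the "both or neither" sanity check added to GetFdwRoutine(). */
static bool
callbacks_consistent(const MockRoutine *r)
{
    return (r->commit_cb != NULL) == (r->rollback_cb != NULL);
}

/* Register once per (server, user); returns false if it cannot register. */
static bool
register_participant(unsigned int serverid, unsigned int userid,
                     const MockRoutine *r)
{
    for (int i = 0; i < nparticipants; i++)
        if (participants[i].serverid == serverid &&
            participants[i].userid == userid)
            return true;        /* already registered */

    if (r->commit_cb == NULL || r->rollback_cb == NULL ||
        nparticipants == MAX_PARTICIPANTS)
        return false;

    participants[nparticipants++] = (MockParticipant) {serverid, userid, r};
    return true;
}

/* End of transaction: call one callback per participant, then forget them. */
static int
at_eoxact(bool is_commit)
{
    int         ended = nparticipants;

    for (int i = 0; i < nparticipants; i++)
    {
        const MockRoutine *r = participants[i].routine;

        assert(r->commit_cb != NULL && r->rollback_cb != NULL);
        (is_commit ? r->commit_cb : r->rollback_cb) (participants[i].serverid,
                                                     participants[i].userid);
    }
    nparticipants = 0;
    return ended;
}
```

In the real patch the registration failure is an `ereport(ERROR, ...)` rather than a `false` return; the boolean is used here only to keep the model free of PostgreSQL headers.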

v26-0001-Recreate-RemoveForeignServerById.patch (application/octet-stream)
From b76bfd839aac2065807d7195fb5565c6b85868a3 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 12 Jun 2020 11:49:02 +0900
Subject: [PATCH v26 01/11] Recreate RemoveForeignServerById()

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/catalog/dependency.c   |  5 ++++-
 src/backend/commands/foreigncmds.c | 22 ++++++++++++++++++++++
 src/include/commands/defrem.h      |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index f515e2c308..82dbc988a3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1476,6 +1476,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_FOREIGN_SERVER:
+			RemoveForeignServerById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1490,7 +1494,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSDICT:
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
-		case OCLASS_FOREIGN_SERVER:
 		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index de31ddd1f3..c002a61794 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -1060,6 +1060,28 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop foreign server by OID
+ */
+void
+RemoveForeignServerById(Oid srvId)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(ForeignServerRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(srvId));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Common routine to check permission for user-mapping-related DDL
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 7a079ef07f..737a14a22a 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -128,6 +128,7 @@ extern ObjectAddress CreateForeignDataWrapper(CreateFdwStmt *stmt);
 extern ObjectAddress AlterForeignDataWrapper(AlterFdwStmt *stmt);
 extern ObjectAddress CreateForeignServer(CreateForeignServerStmt *stmt);
 extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
+extern void RemoveForeignServerById(Oid srvId);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
-- 
2.23.0

#148tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Ashutosh Bapat (#144)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>

The way I am looking at it is to put the parallelism in the resolution
worker and not in the FDW. If we use multiple resolution workers, they
can fire commit/abort on multiple foreign servers at a time.

From a single session's view, yes. However, the requests from multiple sessions are processed one at a time within each resolver, because the resolver has to call the synchronous FDW prepare/commit routines and wait for the response from the remote server. That's too limiting.
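
To make the cost concrete, here is a minimal sketch under simple assumptions: it is a toy model, not PostgreSQL code, and the function names and latencies are made up. Each array element stands for the round-trip time of one commit request queued at a single resolver. With synchronous routines the resolver pays the sum of the round trips; if the requests could all be in flight at once, it would ideally pay only the longest one.

```c
#include <assert.h>

/*
 * Hypothetical model: rtt_ms[i] is the round-trip time in milliseconds of
 * the i-th foreign commit request waiting at one resolver.
 */

/* Synchronous resolver: each request waits for the previous one to finish. */
static int
serial_resolution_ms(const int *rtt_ms, int n)
{
    int         total = 0;

    assert(rtt_ms != 0 && n >= 0);
    for (int i = 0; i < n; i++)
        total += rtt_ms[i];
    return total;
}

/* Idealized asynchronous resolver: all requests are in flight at once. */
static int
overlapped_resolution_ms(const int *rtt_ms, int n)
{
    int         longest = 0;

    assert(rtt_ms != 0 && n >= 0);
    for (int i = 0; i < n; i++)
        if (rtt_ms[i] > longest)
            longest = rtt_ms[i];
    return longest;
}
```

For example, with made-up round trips of 5, 8, 3 and 6 ms, the serial model gives 22 ms while the overlapped model gives 8 ms. The model ignores remote-side contention and connection setup, so it only bounds the possible gain.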

But if we want parallelism within a single resolution worker, we will
need separate FDW APIs for firing asynchronous commit/abort of prepared
transactions and for fetching their results, respectively. But given the
variety of FDWs, not all of them will support an asynchronous API, so we
have to support the synchronous API anyway, which is what can be targeted
in the first version.

I agree that most FDWs are unlikely to have asynchronous prepare/commit functions, as demonstrated by the fact that even Oracle and Db2 don't implement the XA asynchronous APIs. That's one problem with using FDW for Postgres scale-out. When we enhance FDW, we have to take care of other DBMSs to keep the FDW interface practical. OTOH, we want to make maximum use of Postgres features, such as the libpq asynchronous API, to make Postgres scale-out as performant as possible. But the scale-out design is bound by the FDW interface. I don't feel that accepting such a less performant design fits the attitude of this community, as people here are strict about even a 1 or 2 percent performance drop.

Thinking more about it, the core may support an API which accepts a
list of prepared transactions, their foreign servers and user mappings
and let FDW resolve all those either in parallel or one by one. So
parallelism is responsibility of FDW and not the core. But then we
lose parallelism across FDWs, which may not be a common case.

Hmm, I understand asynchronous FDW relation scan is being developed now, in the form of cooperation between the FDW and the executor. If we make just the FDW responsible for prepare/commit parallelism, the design becomes asymmetric. As you say, I'm not sure if the parallelism is wanted among different types, say, Postgres and Oracle. In fact, major DBMSs don't implement XA asynchronous API. But such lack of parallelism may be one cause of the bad reputation that 2PC (of XA) is slow.

Given the complications around this, I think we should go ahead
supporting synchronous API first and in second version introduce
optional asynchronous API.

How about the following?

* Add synchronous and asynchronous versions of prepare/commit/abort routines and a routine to wait for completion of asynchronous execution in FdwRoutine. They are optional.
postgres_fdw can implement the asynchronous routines using libpq asynchronous functions. Other DBMSs can implement XA asynchronous API for them in theory.

* The client backend uses asynchronous FDW routines if available:

/* Issue asynchronous prepare | commit | rollback to FDWs that support it */
foreach (per each foreign server used in the transaction)
{
    if (fdwroutine->{prepare | commit | rollback}_async_func)
        fdwroutine->{prepare | commit | rollback}_async_func(...);
}

/* Wait for completion of asynchronous prepare | commit | rollback */
foreach (per each foreign server used in the transaction)
{
    if (fdwroutine->{prepare | commit | rollback}_async_func)
        ret = fdwroutine->wait_for_completion(...);
}

/* Issue synchronous prepare | commit | rollback to FDWs that don't support asynchronous execution */
foreach (per each foreign server used in the transaction)
{
    if (fdwroutine->{prepare | commit | rollback}_async_func == NULL)
        ret = fdwroutine->{prepare | commit | rollback}_func(...);
}

* The client backend asks the resolver to commit or rollback the remote transaction only when the remote transaction fails (due to the failure of remote server or network.) That is, the resolver is not involved during normal operation.

This will not be complex, and can be included in the first version, if we really want to use FDW for Postgres scale-out.
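
To make the three passes above concrete, here is a minimal standalone C sketch of the commit case only. It does not use the real FdwRoutine: DemoFdwRoutine, DemoServer, the demo_* callbacks and demo_commit_all() are all hypothetical names invented for illustration, and the callbacks are mocks that flip flags instead of talking to a remote server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoServer DemoServer;

/* Hypothetical callback table for illustration; NOT the real FdwRoutine. */
typedef struct DemoFdwRoutine
{
    bool        (*commit_func) (DemoServer *srv);         /* synchronous commit */
    void        (*commit_async_func) (DemoServer *srv);   /* optional, may be NULL */
    bool        (*wait_for_completion) (DemoServer *srv); /* collects async result */
} DemoFdwRoutine;

struct DemoServer
{
    const DemoFdwRoutine *routine;
    bool        in_flight;      /* async request sent, reply not yet collected */
    bool        committed;
};

/* Mock callbacks standing in for real network I/O. */
static void
demo_commit_async(DemoServer *srv)
{
    srv->in_flight = true;
}

static bool
demo_wait(DemoServer *srv)
{
    assert(srv->in_flight);     /* waiting only makes sense after a send */
    srv->in_flight = false;
    srv->committed = true;
    return true;
}

static bool
demo_commit_sync(DemoServer *srv)
{
    srv->committed = true;
    return true;
}

const DemoFdwRoutine demo_async_routine = {demo_commit_sync, demo_commit_async, demo_wait};
const DemoFdwRoutine demo_sync_routine = {demo_commit_sync, NULL, NULL};

/* Commit on every server; returns how many commits failed. */
int
demo_commit_all(DemoServer *servers, size_t n)
{
    int         failed = 0;
    size_t      i;

    /* Pass 1: issue asynchronous commits to servers that support them. */
    for (i = 0; i < n; i++)
        if (servers[i].routine->commit_async_func != NULL)
            servers[i].routine->commit_async_func(&servers[i]);

    /* Pass 2: wait for completion of the asynchronous commits. */
    for (i = 0; i < n; i++)
        if (servers[i].routine->commit_async_func != NULL &&
            !servers[i].routine->wait_for_completion(&servers[i]))
            failed++;

    /* Pass 3: commit synchronously on servers without async support. */
    for (i = 0; i < n; i++)
        if (servers[i].routine->commit_async_func == NULL &&
            !servers[i].routine->commit_func(&servers[i]))
            failed++;

    return failed;
}
```

In this sketch a nonzero return value is the point at which the client backend would hand the failed servers over to the resolver; how that hand-off would actually work is not modeled here.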

Regards
Takayuki Tsunakawa

#149Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#146)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, 24 Sep 2020 at 17:23, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

So with your idea, I think we require FDW developers to not call
ereport(ERROR) as much as possible. If they need to use a function
including palloc, lappend etc that could call ereport(ERROR), they
need to use PG_TRY() and PG_CATCH() and return the control along with
the error message to the transaction manager rather than raising an
error. Then the transaction manager will emit the error message at an
error level lower than ERROR (e.g., WARNING), and call commit/rollback
API again. But normally we do some cleanup on error but in this case
the retrying commit/rollback is performed without any cleanup. Is that
right? I’m not sure it’s safe though.

Yes. It's legitimate to require the FDW commit routine to return control, because the prepare phase of 2PC is a promise to commit successfully. The second-phase commit should avoid doing anything that could fail. For example, if some memory is needed for commit, it should be allocated in prepare or before.

I don't think it's always possible to avoid raising errors in advance.
Considering how postgres_fdw can implement your idea, I think
postgres_fdw would need PG_TRY() and PG_CATCH() for its connection
management. It has a connection cache in the local memory using HTAB.
It needs to create an entry for the first time to connect (e.g., when
PREPARE and COMMIT PREPARED of a transaction are performed by different
processes) and it needs to re-connect to the foreign server when the
entry is invalidated. In both cases, ERROR could happen. I guess the
same is true for other FDW implementations. Possibly other FDWs might
need more work, for example cleanup or releasing resources. I think
the pros of your idea are that it makes the transaction manager simple,
since we don't need resolvers and a launcher, but the cons are that it
moves the complexity to the FDW implementation code instead. Also, IMHO
I don't think it's safe for the FDW to neither re-throw an error nor
abort the transaction when an error occurs.

Regarding the performance you're concerned about, I wonder if we can
somewhat eliminate the bottleneck if multiple resolvers are able to run
on one database in the future. For example, if we could launch as many
resolver processes as there are connections on the database, each
backend process could have its own resolver process. Since there would
be contention and inter-process communication, it still brings some
overhead, but it might be negligible compared to a network round trip.

Perhaps we can hear more opinions on that from other hackers to decide
the FDW transaction API design.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#150tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#149)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

I don't think it's always possible to avoid raising errors in advance.
Considering how postgres_fdw can implement your idea, I think
postgres_fdw would need PG_TRY() and PG_CATCH() for its connection
management. It has a connection cache in the local memory using HTAB.
It needs to create an entry for the first time to connect (e.g., when
prepare and commit prepared a transaction are performed by different
processes) and it needs to re-connect the foreign server when the
entry is invalidated. In both cases, ERROR could happen. I guess the
same is true for other FDW implementations. Possibly other FDWs might
need more work for example cleanup or releasing resources. I think

Why does the client backend have to create a new connection cache entry during PREPARE or COMMIT PREPARED? Doesn't the client backend naturally continue to use connections that it has used in its current transaction?

that the pros of your idea are to make the transaction manager simple
since we don't need resolvers and launcher but the cons are to bring
the complexity to FDW implementation codes instead. Also, IMHO I don't
think it's safe way that FDW does neither re-throwing an error nor
abort transaction when an error occurs.

No, I didn't say the resolver is unnecessary. The resolver takes care of terminating remote transactions when the client backend encountered an error during COMMIT/ROLLBACK PREPARED.

In terms of performance you're concerned, I wonder if we can somewhat
eliminate the bottleneck if multiple resolvers are able to run on one
database in the future. For example, if we could launch resolver
processes as many as connections on the database, individual backend
processes could have one resolver process. Since there would be
contention and inter-process communication it still brings some
overhead but it might be negligible comparing to network round trip.

Do you mean that if 200 concurrent clients each update data on two foreign servers, there are 400 resolvers? ...That's overuse of resources.

Regards
Takayuki Tsunakawa

#151Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#150)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 25 Sep 2020 at 18:21, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

I don't think it's always possible to avoid raising errors in advance.
Considering how postgres_fdw can implement your idea, I think
postgres_fdw would need PG_TRY() and PG_CATCH() for its connection
management. It has a connection cache in the local memory using HTAB.
It needs to create an entry for the first time to connect (e.g., when
prepare and commit prepared a transaction are performed by different
processes) and it needs to re-connect the foreign server when the
entry is invalidated. In both cases, ERROR could happen. I guess the
same is true for other FDW implementations. Possibly other FDWs might
need more work for example cleanup or releasing resources. I think

Why does the client backend have to create a new connection cache entry during PREPARE or COMMIT PREPARED? Doesn't the client backend naturally continue to use connections that it has used in its current transaction?

I think there are two cases: first, a process executes PREPARE TRANSACTION
and another process executes COMMIT PREPARED later; second, the
coordinator has cascaded foreign servers (i.e., a foreign server has
its own foreign server) and a temporary connection problem happens in the
intermediate node after PREPARE, so another process on the
intermediate node executes COMMIT PREPARED on its foreign server.

that the pros of your idea are to make the transaction manager simple
since we don't need resolvers and launcher but the cons are to bring
the complexity to FDW implementation codes instead. Also, IMHO I don't
think it's safe way that FDW does neither re-throwing an error nor
abort transaction when an error occurs.

No, I didn't say the resolver is unnecessary. The resolver takes care of terminating remote transactions when the client backend encountered an error during COMMIT/ROLLBACK PREPARED.

Understood. With your idea, we can remove at least the code of making
backend wait and inter-process communication between backends and
resolvers.

I think we need to consider whether it's really safe and what is needed
to achieve your idea safely.

In terms of performance you're concerned, I wonder if we can somewhat
eliminate the bottleneck if multiple resolvers are able to run on one
database in the future. For example, if we could launch resolver
processes as many as connections on the database, individual backend
processes could have one resolver process. Since there would be
contention and inter-process communication it still brings some
overhead but it might be negligible comparing to network round trip.

Do you mean that if 200 concurrent clients each update data on two foreign servers, there are 400 resolvers? ...That's overuse of resources.

I think we have 200 resolvers in this case, since there is one resolver
process per backend process. Another idea is that all processes queue
foreign transactions to resolve in a shared memory queue, and
resolver processes fetch and resolve them, instead of assigning one
distributed transaction to one resolver process. Using asynchronous
execution, a resolver process can process a bunch of foreign
transactions across distributed transactions, grouped by
foreign server, at once. It might be more complex than the current
approach, but having multiple resolver processes on one database would
increase throughput, especially when combined with asynchronous
execution.
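
As a rough illustration of the "grouped by foreign server" part of this idea, the following standalone C sketch sorts a queue of pending foreign transactions by server and reports where each per-server batch starts. It is only a sketch under assumptions: DemoPendingXact and demo_group_by_server() are hypothetical names, the queue is a plain array rather than shared memory, and no locking or asynchronous I/O is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical queue entry: one unresolved foreign transaction. */
typedef struct DemoPendingXact
{
    unsigned int dist_xid;      /* distributed transaction it belongs to */
    unsigned int server_id;     /* foreign server holding the prepared xact */
} DemoPendingXact;

/* Order by server first, then by distributed transaction id. */
static int
demo_cmp_server(const void *a, const void *b)
{
    const DemoPendingXact *x = a;
    const DemoPendingXact *y = b;

    if (x->server_id != y->server_id)
        return (x->server_id < y->server_id) ? -1 : 1;
    if (x->dist_xid != y->dist_xid)
        return (x->dist_xid < y->dist_xid) ? -1 : 1;
    return 0;
}

/*
 * Sort the queue so that entries for the same foreign server are adjacent,
 * and record the start index of each per-server batch in batch_start
 * (which must have room for n entries).  Returns the number of batches.
 */
size_t
demo_group_by_server(DemoPendingXact *queue, size_t n, size_t *batch_start)
{
    size_t      nbatches = 0;
    size_t      i;

    assert(n == 0 || (queue != NULL && batch_start != NULL));

    if (n > 0)
        qsort(queue, n, sizeof(DemoPendingXact), demo_cmp_server);

    for (i = 0; i < n; i++)
        if (i == 0 || queue[i].server_id != queue[i - 1].server_id)
            batch_start[nbatches++] = i;

    return nbatches;
}
```

A resolver could then walk the batches and send all COMMIT PREPARED requests for one server over a single connection before moving to the next server.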

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/

#152tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#151)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

On Fri, 25 Sep 2020 at 18:21, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Why does the client backend have to create a new connection cache entry
during PREPARE or COMMIT PREPARED? Doesn't the client backend naturally
continue to use connections that it has used in its current transaction?

I think there are two cases: a process executes PREPARE TRANSACTION
and another process executes COMMIT PREPARED later, and if the
coordinator has cascaded foreign servers (i.e., a foreign server has
its foreign server) and temporary connection problem happens in the
intermediate node after PREPARE then another process on the
intermediate node will execute COMMIT PREPARED on its foreign server.

Aren't both the cases failure cases, and thus handled by the resolver?

In terms of performance you're concerned, I wonder if we can somewhat
eliminate the bottleneck if multiple resolvers are able to run on one
database in the future. For example, if we could launch resolver
processes as many as connections on the database, individual backend
processes could have one resolver process. Since there would be
contention and inter-process communication it still brings some
overhead but it might be negligible comparing to network round trip.

Do you mean that if 200 concurrent clients each update data on two foreign
servers, there are 400 resolvers? ...That's overuse of resources.

I think we have 200 resolvers in this case since one resolver process
per backend process.

That does not parallelize prepare or commit for a single client, as each resolver can process only one prepare or commit synchronously at a time. Not to mention the resource usage is high.

Or another idea is that all processes queue
foreign transactions to resolve into the shared memory queue and
resolver processes fetch and resolve them instead of assigning one
distributed transaction to one resolver process. Using asynchronous
execution, the resolver process can process a bunch of foreign
transactions across distributed transactions and grouped by the
foreign server at once. It might be more complex than the current
approach but having multiple resolver processes on one database would
increase throughput well especially by combining with asynchronous
execution.

Yeah, that sounds complex. It's simpler and more natural for each client backend to use the connections it has used in its current transaction and issue prepare and commit to the foreign servers, while the resolver just takes care of failed commits and aborts behind the scenes. That's like how the walwriter takes care of writing WAL on behalf of client backends that commit asynchronously.

Regards
Takayuki Tsunakawa

#153Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#152)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, 28 Sep 2020 at 13:58, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

On Fri, 25 Sep 2020 at 18:21, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Why does the client backend have to create a new connection cache entry
during PREPARE or COMMIT PREPARED? Doesn't the client backend naturally
continue to use connections that it has used in its current transaction?

I think there are two cases: a process executes PREPARE TRANSACTION
and another process executes COMMIT PREPARED later, and if the
coordinator has cascaded foreign servers (i.e., a foreign server has
its foreign server) and temporary connection problem happens in the
intermediate node after PREPARE then another process on the
intermediate node will execute COMMIT PREPARED on its foreign server.

Aren't both the cases failure cases, and thus handled by the resolver?

No. Please imagine a case where a user executes PREPARE TRANSACTION on
the transaction that modified data on foreign servers. The backend
process prepares both the local transaction and foreign transactions.
But another client can execute COMMIT PREPARED on the prepared
transaction. In this case, another backend newly connects foreign
servers and commits prepared foreign transactions. Therefore, the new
connection cache entry can be created during COMMIT PREPARED which
could lead to an error but since the local prepared transaction is
already committed the backend must not fail with an error.

In the latter case, I assumed that the backend continues to retry
foreign transaction resolution until the user requests cancellation.
Please imagine the case where server-A connects to a foreign server
(say, server-B) and server-B connects to another foreign server (say,
server-C). The transaction initiated on server-A modified the data on
both the local server and server-B, which further modified the data on
server-C, and executed COMMIT. The backend process on server-A (say,
backend-A) sends PREPARE TRANSACTION to server-B, then the backend
process on server-B (say, backend-B) connected by backend-A prepares the
local transaction and further sends PREPARE TRANSACTION to server-C. Let’s
suppose a temporary connection failure happens between server-A and
server-B before backend-A sends COMMIT PREPARED (i.e., the 2nd phase
of 2PC). When backend-A attempts to send COMMIT PREPARED to
server-B, it realizes that the connection to server-B was lost, but
since the user hasn’t requested cancellation yet, backend-A retries
connecting to server-B and succeeds. Now that backend-A has
established a new connection to server-B, there is another backend
process on server-B (say, backend-B’). Since backend-B’ doesn’t
have a connection to server-C yet, it creates a new connection cache
entry, which could lead to an error. IOW, on server-B different
processes performed PREPARE TRANSACTION and COMMIT PREPARED, and the
later process created a connection cache entry.

In terms of performance you're concerned, I wonder if we can somewhat
eliminate the bottleneck if multiple resolvers are able to run on one
database in the future. For example, if we could launch resolver
processes as many as connections on the database, individual backend
processes could have one resolver process. Since there would be
contention and inter-process communication it still brings some
overhead but it might be negligible comparing to network round trip.

Do you mean that if 200 concurrent clients each update data on two foreign
servers, there are 400 resolvers? ...That's overuse of resources.

I think we have 200 resolvers in this case since one resolver process
per backend process.

That does not parallelize prepare or commit for a single client, as each resolver can process only one prepare or commit synchronously at a time. Not to mention the resource usage is high.

Well, I think we should discuss parallel (and/or asynchronous)
execution of prepare and commit separately from the discussion on
whether the resolver process is responsible for the 2nd phase of 2PC.
I've been suggesting that the first phase and the second phase of 2PC
should be performed by different processes for safety. And
having multiple resolvers on one database is my suggestion in response
to the concern you raised that one resolver process on one database
can be a bottleneck. Both parallel execution and asynchronous execution
are slightly related to this topic, but I think they should be discussed
separately.

Regarding parallel and asynchronous execution, I basically agree on
supporting asynchronous execution, as the XA specification also does,
although I think it's better not to include it in the first version
for simplicity.

Overall, my suggestion for the first version is to support synchronous
execution of prepare, commit, and rollback, have one resolver process
per database, and have resolver take 2nd phase of 2PC. As the next
step we can add APIs for asynchronous execution, have multiple
resolvers on one database and so on.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/

#154tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#153)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

No. Please imagine a case where a user executes PREPARE TRANSACTION on
the transaction that modified data on foreign servers. The backend
process prepares both the local transaction and foreign transactions.
But another client can execute COMMIT PREPARED on the prepared
transaction. In this case, another backend newly connects foreign
servers and commits prepared foreign transactions. Therefore, the new
connection cache entry can be created during COMMIT PREPARED which
could lead to an error but since the local prepared transaction is
already committed the backend must not fail with an error.

In the latter case, I assumed that the backend continues to retry
foreign transaction resolution until the user requests cancellation.
Please imagine the case where server-A connects to a foreign server
(say, server-B) and server-B connects to another foreign server (say,
server-C). The transaction initiated on server-A modified the data on
both the local server and server-B, which further modified the data on
server-C, and executed COMMIT. The backend process on server-A (say,
backend-A) sends PREPARE TRANSACTION to server-B, then the backend
process on server-B (say, backend-B) connected by backend-A prepares the
local transaction and further sends PREPARE TRANSACTION to server-C. Let’s
suppose a temporary connection failure happens between server-A and
server-B before backend-A sends COMMIT PREPARED (i.e., the 2nd phase
of 2PC). When backend-A attempts to send COMMIT PREPARED to
server-B, it realizes that the connection to server-B was lost, but
since the user hasn’t requested cancellation yet, backend-A retries
connecting to server-B and succeeds. Now that backend-A has
established a new connection to server-B, there is another backend
process on server-B (say, backend-B’). Since backend-B’ doesn’t
have a connection to server-C yet, it creates a new connection cache
entry, which could lead to an error. IOW, on server-B different
processes performed PREPARE TRANSACTION and COMMIT PREPARED, and the
later process created a connection cache entry.

Thank you, I understood the situation. I don't think it's a good design to not address practical performance during normal operation by fearing the rare error case.

The transaction manager (TM) or the FDW implementor can naturally do things like the following:

* Use palloc_extended(MCXT_ALLOC_NO_OOM) and hash_search(HASH_ENTER_NULL) to return control to the caller.

* Use PG_TRY(), as its overhead is negligible relative to connection establishment.

* If the commit fails, the TM asks the resolver to take care of committing the remote transaction, and returns success to the user.
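
The shape of the last two points can be sketched in standalone C. This is not PostgreSQL code: real PG_TRY()/PG_CATCH() are built on sigsetjmp(), whereas the sketch uses plain setjmp()/longjmp() as a stand-in, and demo_exception_stack, demo_raise_error(), DemoConn, demo_commit_no_throw() and demo_commit_or_hand_off() are all hypothetical names. It shows only the control flow (catch, do not re-throw, mark for the resolver) and says nothing about whether skipping the usual error cleanup is safe.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Innermost active handler; a stand-in for PostgreSQL's exception stack. */
static jmp_buf *demo_exception_stack = NULL;

/* Stand-in for ereport(ERROR, ...): unwind to the innermost handler. */
static void
demo_raise_error(void)
{
    if (demo_exception_stack == NULL)
        abort();                /* no handler installed */
    longjmp(*demo_exception_stack, 1);
}

typedef struct DemoConn
{
    bool        reachable;      /* mock: can we talk to the remote server? */
    bool        committed;
    bool        needs_resolver; /* set when the commit is left to the resolver */
} DemoConn;

/* Mock of sending COMMIT PREPARED; raises an error if the server is down. */
static void
demo_send_commit_prepared(DemoConn *conn)
{
    if (!conn->reachable)
        demo_raise_error();
    conn->committed = true;
}

/* FDW-side commit routine that always returns control to its caller. */
bool
demo_commit_no_throw(DemoConn *conn)
{
    jmp_buf     local_handler;
    jmp_buf    *saved_stack = demo_exception_stack;
    bool        ok;

    if (setjmp(local_handler) == 0)
    {
        demo_exception_stack = &local_handler;
        demo_send_commit_prepared(conn);
        ok = true;
    }
    else
        ok = false;             /* error caught; deliberately not re-thrown */

    demo_exception_stack = saved_stack;
    return ok;
}

/* TM side: failed commits are handed to the resolver, not reported as errors. */
size_t
demo_commit_or_hand_off(DemoConn *conns, size_t n)
{
    size_t      handed_off = 0;
    size_t      i;

    assert(conns != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        if (!demo_commit_no_throw(&conns[i]))
        {
            conns[i].needs_resolver = true;
            handed_off++;
        }
    }
    return handed_off;
}
```

The caller sees success either way; the return value only counts how many remote transactions the resolver would still have to commit.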

Regarding parallel and asynchronous execution, I basically agree on
supporting asynchronous execution as the XA specification also has,
although I think it's better not to include it in the first version
for simplicity.

Overall, my suggestion for the first version is to support synchronous
execution of prepare, commit, and rollback, have one resolver process
per database, and have resolver take 2nd phase of 2PC. As the next
step we can add APIs for asynchronous execution, have multiple
resolvers on one database and so on.

We don't have to rush to commit a patch that is likely to exhibit impractical performance, as we still have much time left for PG 14. The design needs more thought toward the ideal goal, and more refinement. By making the effort to work out the ideal design, we may be able to avoid rework and API inconsistency. As for the API, we haven't validated yet that an FDW implementor can use XA, have we?

Regards
Takayuki Tsunakawa

#155Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#154)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 29 Sep 2020 at 11:37, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

No. Please imagine a case where a user executes PREPARE TRANSACTION on
the transaction that modified data on foreign servers. The backend
process prepares both the local transaction and foreign transactions.
But another client can execute COMMIT PREPARED on the prepared
transaction. In this case, another backend newly connects foreign
servers and commits prepared foreign transactions. Therefore, the new
connection cache entry can be created during COMMIT PREPARED which
could lead to an error but since the local prepared transaction is
already committed the backend must not fail with an error.

In the latter case, I assumed that the backend continues to retry
foreign transaction resolution until the user requests cancellation.
Please imagine the case where server-A connects to a foreign server
(say, server-B) and server-B connects to another foreign server (say,
server-C). The transaction initiated on server-A modified the data on
both the local server and server-B, which further modified the data on
server-C, and executed COMMIT. The backend process on server-A (say,
backend-A) sends PREPARE TRANSACTION to server-B, then the backend
process on server-B (say, backend-B) connected by backend-A prepares the
local transaction and further sends PREPARE TRANSACTION to server-C. Let’s
suppose a temporary connection failure happens between server-A and
server-B before backend-A sends COMMIT PREPARED (i.e., the 2nd phase
of 2PC). When backend-A attempts to send COMMIT PREPARED to
server-B, it realizes that the connection to server-B was lost, but
since the user hasn’t requested cancellation yet, backend-A retries
connecting to server-B and succeeds. Now that backend-A has
established a new connection to server-B, there is another backend
process on server-B (say, backend-B’). Since backend-B’ doesn’t
have a connection to server-C yet, it creates a new connection cache
entry, which could lead to an error. IOW, on server-B different
processes performed PREPARE TRANSACTION and COMMIT PREPARED, and the
later process created a connection cache entry.

Thank you, I understood the situation. I don't think it's a good design to not address practical performance during normal operation by fearing the rare error case.

The transaction manager (TM) or the FDW implementor can naturally do things like the following:

* Use palloc_extended(MCXT_ALLOC_NO_OOM) and hash_search(HASH_ENTER_NULL) to return control to the caller.

* Use PG_TRY(), as its overhead is relatively negligible to connection establishment.

I suppose you mean that the FDW implementor uses PG_TRY() to catch an
error but does not do PG_RE_THROW(). I'm concerned about whether it's
safe to return control to the caller and continue trying to resolve
foreign transactions without either rethrowing the error or aborting
the transaction.

IMHO, a design along the lines of "high performance but
doesn't work fine in a rare failure case" is rather a bad one,
especially for a transaction management feature.

* If the commit fails, the TM asks the resolver to take care of committing the remote transaction, and returns success to the user.

Regarding parallel and asynchronous execution, I basically agree on
supporting asynchronous execution as the XA specification also has,
although I think it's better not to include it in the first version
for simplicity.

Overall, my suggestion for the first version is to support synchronous
execution of prepare, commit, and rollback, have one resolver process
per database, and have resolver take 2nd phase of 2PC. As the next
step we can add APIs for asynchronous execution, have multiple
resolvers on one database and so on.

We don't have to rush to commit a patch that is likely to exhibit non-practical performance, as we still have much time left for PG 14. The design needs to be more thought for the ideal goal and refined. By making efforts to sort through the ideal design, we may be able to avoid rework and API inconsistency. As for the API, we haven't validated yet that the FDW implementor can use XA, have we?

Yes, we still need to check if FDW implementor other than postgres_fdw
is able to support these APIs. I agree that we need more discussion on
the design. My suggestion is to start a small, simple feature as the
first step and not try to include everything in the first version.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/

#156Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Masahiko Sawada (#155)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 29 Sep 2020 at 15:03, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Tue, 29 Sep 2020 at 11:37, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

No. Please imagine a case where a user executes PREPARE TRANSACTION on
the transaction that modified data on foreign servers. The backend
process prepares both the local transaction and foreign transactions.
But another client can execute COMMIT PREPARED on the prepared
transaction. In this case, another backend newly connects foreign
servers and commits prepared foreign transactions. Therefore, the new
connection cache entry can be created during COMMIT PREPARED which
could lead to an error but since the local prepared transaction is
already committed the backend must not fail with an error.

In the latter case, I assumed that the backend continues to retry
foreign transaction resolution until the user requests cancellation.
Please imagine the case where server-A connects to a foreign server
(say, server-B) and server-B connects to another foreign server (say,
server-C). The transaction initiated on server-A modified the data on
both the local server and server-B, which further modified the data on
server-C, and executed COMMIT. The backend process on server-A (say,
backend-A) sends PREPARE TRANSACTION to server-B, then the backend
process on server-B (say, backend-B) connected by backend-A prepares the
local transaction and further sends PREPARE TRANSACTION to server-C. Let’s
suppose a temporary connection failure happens between server-A and
server-B before backend-A sends COMMIT PREPARED (i.e., the 2nd phase
of 2PC). When backend-A attempts to send COMMIT PREPARED to
server-B, it realizes that the connection to server-B was lost, but
since the user hasn’t requested cancellation yet, backend-A retries
connecting to server-B and succeeds. Now that backend-A has
established a new connection to server-B, there is another backend
process on server-B (say, backend-B’). Since backend-B’ doesn’t
have a connection to server-C yet, it creates a new connection cache
entry, which could lead to an error. IOW, on server-B different
processes performed PREPARE TRANSACTION and COMMIT PREPARED, and the
later process created a connection cache entry.

Thank you, I understood the situation. I don't think it's a good design to not address practical performance during normal operation by fearing the rare error case.

The transaction manager (TM) or the FDW implementor can naturally do things like the following:

* Use palloc_extended(MCXT_ALLOC_NO_OOM) and hash_search(HASH_ENTER_NULL) to return control to the caller.

* Use PG_TRY(), as its overhead is negligible relative to connection establishment.

I suppose you mean that the FDW implementor uses PG_TRY() to catch an
error but does not do PG_RE_THROW(). I'm concerned about whether it's
safe to return control to the caller and continue trying to resolve
foreign transactions with neither rethrowing the error nor aborting
the transaction.

IMHO, a design like "high performance but doesn't work correctly in a
rare failure case" is rather bad, especially for the transaction
management feature.

To avoid misunderstanding, I didn't mean to disregard performance.
I mean that, especially for the transaction management feature, it's
essential to work correctly even in failure cases. So I hope we have a
safe, robust, and probably simple design for the first version that
might have low performance at first but has the potential for
performance improvement, so that we can try to improve
performance later.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#157tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#156)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

To avoid misunderstanding, I didn't mean to disregard performance.
I mean that, especially for the transaction management feature, it's
essential to work correctly even in failure cases. So I hope we have a
safe, robust, and probably simple design for the first version that
might have low performance at first but has the potential for
performance improvement, so that we can try to improve
performance later.

Yes, correctness (safety?) is a basic premise. I understand that given the time left for PG 14, we haven't yet given up on a sound design that offers practical or normally expected performance. I don't think the design has been thought through well enough yet to tell whether it's simple or complex. At least, I don't believe doing the "send commit request, perform commit on a remote server, and wait for reply" sequence one transaction at a time in turn is what this community (and other DBMSs) would tolerate. A kid's tricycle is safe, but it's not safe to ride a tricycle on the road. Let's not rush to commit and do our best!

Regards
Takayuki Tsunakawa

#158Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#157)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, 30 Sep 2020 at 16:02, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

To avoid misunderstanding, I didn't mean to disregard performance.
I mean that, especially for the transaction management feature, it's
essential to work correctly even in failure cases. So I hope we have a
safe, robust, and probably simple design for the first version that
might have low performance at first but has the potential for
performance improvement, so that we can try to improve
performance later.

Yes, correctness (safety?) is a basic premise. I understand that given the time left for PG 14, we haven't yet given up on a sound design that offers practical or normally expected performance. I don't think the design has been thought through well enough yet to tell whether it's simple or complex. At least, I don't believe doing the "send commit request, perform commit on a remote server, and wait for reply" sequence one transaction at a time in turn is what this community (and other DBMSs) would tolerate. A kid's tricycle is safe, but it's not safe to ride a tricycle on the road. Let's not rush to commit and do our best!

Okay. I'd like to resolve the concern that I have repeatedly mentioned
and for which we haven't found a good solution yet. That is, how we
handle errors raised by FDW transaction callbacks while
committing/rolling back prepared foreign transactions. Actually, this
has already been discussed before[1] and we concluded at that time that
using a background worker to commit/roll back foreign prepared
transactions is the best way.

Anyway, let me summarize the discussion on this issue so far. With
your idea, after the local commit, the backend process directly calls
the FDW transaction API to commit the foreign prepared transactions.
However, an error (i.e., ereport(ERROR)) is likely to happen during
that for various reasons. It could be an OOM in memory allocation,
a connection error, or whatever. If an error happens while committing
prepared foreign transactions, the user will get the error but it's
too late. The local transaction and possibly other foreign prepared
transactions have already been committed. You proposed the first idea
to avoid such a situation: that the FDW implementor can write the code
while trying to reduce the possibility of errors as much as
possible, for example by using palloc_extended(MCXT_ALLOC_NO_OOM) and
hash_search(HASH_ENTER_NULL), but I think it's not a comprehensive
solution. They might miss it, not know about it, or use other functions
provided by the core that could lead to an error. Another idea is to use
PG_TRY() and PG_CATCH(). IIUC with this idea, FDW implementor catches
an error but ignores it rather than rethrowing by PG_RE_THROW() in
order to return the control to the core after an error. I’m really not
sure it’s a correct usage of those macros. In addition, after
returning to the core, it will retry to resolve the same or other
foreign transactions. That is, after ignoring an error, the core needs
to continue working and possibly call transaction callbacks of other
FDW implementations.

Regards,

[1]: /messages/by-id/CA+TgmoY=VkHrzXD=jw5DA+Pp-ePW_6_v5n+TJk40s5Q9VXY-Pw@mail.gmail.com

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#159tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#158)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

You proposed the first idea
to avoid such a situation: that the FDW implementor can write the code
while trying to reduce the possibility of errors as much as
possible, for example by using palloc_extended(MCXT_ALLOC_NO_OOM) and
hash_search(HASH_ENTER_NULL), but I think it's not a comprehensive
solution. They might miss it, not know about it, or use other functions
provided by the core that could lead to an error.

We can give the guideline in the manual, can't we? It should not be especially difficult for the FDW implementor compared to Postgres's other extensibility features that have their own rules -- table/index AM, user-defined C function, trigger function in C, user-defined data types, hooks, etc. And, the Postgres functions that the FDW implementor would use to implement their commit will be very limited, won't they? Because most of the commit processing is performed in the resource manager's library (e.g. the Oracle and MySQL client libraries).

(Before that, the developer of server-side modules is not given any information on what functions (like palloc) are available in the manual, is he?)

Another idea is to use
PG_TRY() and PG_CATCH(). IIUC with this idea, FDW implementor catches
an error but ignores it rather than rethrowing by PG_RE_THROW() in
order to return the control to the core after an error. I’m really not
sure it’s a correct usage of those macros. In addition, after
returning to the core, it will retry to resolve the same or other
foreign transactions. That is, after ignoring an error, the core needs
to continue working and possibly call transaction callbacks of other
FDW implementations.

No, not ignore the error. The FDW can emit a WARNING, LOG, or NOTICE message, and return an error code to TM. TM can also emit a message like:

WARNING: failed to commit part of a transaction on the foreign server 'XXX'
HINT: The server continues to try committing the remote transaction.

Then TM asks the resolver to take care of committing the remote transaction, and acknowledges the commit success to the client. The relevant return codes of xa_commit() are:

--------------------------------------------------
[XAER_RMERR]
An error occurred in committing the work performed on behalf of the transaction
branch and the branch’s work has been rolled back. Note that returning this error
signals a catastrophic event to a transaction manager since other resource
managers may successfully commit their work on behalf of this branch. This error
should be returned only when a resource manager concludes that it can never
commit the branch and that it cannot hold the branch’s resources in a prepared
state. Otherwise, [XA_RETRY] should be returned.

[XAER_RMFAIL]
An error occurred that makes the resource manager unavailable.
--------------------------------------------------
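
A minimal sketch of the flow described above, written as a Python toy model rather than backend C. The CommitResult enum, the second_phase function, and the plain deque standing in for the resolver's work queue are hypothetical names. Only the XA code names and the WARNING/HINT text come from the message. How XAER_RMERR should be treated is not settled in this thread, so recording such branches separately is an assumption.

```python
import enum
import logging
from collections import deque

log = logging.getLogger("tm_sketch")


class CommitResult(enum.Enum):
    OK = "XA_OK"
    RETRY = "XA_RETRY"
    RM_FAIL = "XAER_RMFAIL"
    RM_ERR = "XAER_RMERR"


def second_phase(servers, commit_prepared, resolver_queue):
    """Run after the local commit; commit_prepared returns a code, never raises."""
    unresolvable = []
    for server in servers:
        result = commit_prepared(server)
        if result is CommitResult.OK:
            continue
        log.warning(
            "failed to commit part of a transaction on the foreign server '%s'",
            server,
        )
        if result is CommitResult.RM_ERR:
            # Assumption: the branch can never commit, so retrying is pointless.
            unresolvable.append(server)
        else:
            log.warning(
                "HINT: The server continues to try committing the remote transaction."
            )
            resolver_queue.append(server)  # hand the branch to the resolver
    # The local commit is already done, so the client is acknowledged either way.
    return "COMMIT", unresolvable


queue = deque()
codes = {
    "server-B": CommitResult.OK,
    "server-C": CommitResult.RETRY,
    "server-D": CommitResult.RM_ERR,
}
ack, unresolvable = second_phase(list(codes), codes.get, queue)
```

In this run the client is acknowledged, server-C is queued for the resolver, and server-D is reported as unresolvable.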

Regards
Takayuki Tsunakawa

#160Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#159)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 2 Oct 2020 at 18:20, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

You proposed the first idea
to avoid such a situation: that the FDW implementor can write the code
while trying to reduce the possibility of errors as much as
possible, for example by using palloc_extended(MCXT_ALLOC_NO_OOM) and
hash_search(HASH_ENTER_NULL), but I think it's not a comprehensive
solution. They might miss it, not know about it, or use other functions
provided by the core that could lead to an error.

We can give the guideline in the manual, can't we? It should not be especially difficult for the FDW implementor compared to Postgres's other extensibility features that have their own rules -- table/index AM, user-defined C function, trigger function in C, user-defined data types, hooks, etc. And, the Postgres functions that the FDW implementor would use to implement their commit will be very limited, won't they? Because most of the commit processing is performed in the resource manager's library (e.g. the Oracle and MySQL client libraries).

Yeah, if we think FDW implementors properly implement these APIs while
following the guideline, giving the guideline is a good idea. But I’m
not sure all FDW implementors are able to do that and even if the user
uses an FDW whose transaction APIs don’t follow the guideline, the
user won’t realize it. IMO it’s better to design the feature while not
depending on external programs for reliability (correctness?) of this
feature, although I might be too worried.

Another idea is to use
PG_TRY() and PG_CATCH(). IIUC with this idea, FDW implementor catches
an error but ignores it rather than rethrowing by PG_RE_THROW() in
order to return the control to the core after an error. I’m really not
sure it’s a correct usage of those macros. In addition, after
returning to the core, it will retry to resolve the same or other
foreign transactions. That is, after ignoring an error, the core needs
to continue working and possibly call transaction callbacks of other
FDW implementations.

No, not ignore the error. The FDW can emit a WARNING, LOG, or NOTICE message, and return an error code to TM. TM can also emit a message like:

WARNING: failed to commit part of a transaction on the foreign server 'XXX'
HINT: The server continues to try committing the remote transaction.

Then TM asks the resolver to take care of committing the remote transaction, and acknowledges the commit success to the client.

It seems that if resolution fails, the backend would return an
acknowledgment of COMMIT to the client and the resolver process
would resolve the foreign prepared transactions in the background. So
we can ensure that the distributed transaction is completed at the
time when the client gets an acknowledgment of COMMIT if the 2nd phase
of 2PC is successfully completed on the first attempt. OTOH, if it
failed for whatever reason, there is no such guarantee. From an
optimistic perspective, i.e., assuming failures are unlikely to
happen, it will work well, but IMO it’s not uncommon to fail to
resolve foreign transactions due to network issues, especially in an
unreliable network environment such as a geo-distributed database. So
I think it will end up requiring the client to check whether preceding
distributed transactions are completed or not in order to see the
results of these transactions.

We could retry the foreign transaction resolution before leaving it to
the resolver process, but the problem remains that the core continues
trying to resolve foreign transactions, with neither aborting the
transaction nor rethrowing, even after an error.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#161Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Masahiko Sawada (#160)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Oct 6, 2020 at 7:22 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 2 Oct 2020 at 18:20, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

You proposed the first idea
to avoid such a situation: that the FDW implementor can write the code
while trying to reduce the possibility of errors as much as
possible, for example by using palloc_extended(MCXT_ALLOC_NO_OOM) and
hash_search(HASH_ENTER_NULL), but I think it's not a comprehensive
solution. They might miss it, not know about it, or use other functions
provided by the core that could lead to an error.

We can give the guideline in the manual, can't we? It should not be especially difficult for the FDW implementor compared to Postgres's other extensibility features that have their own rules -- table/index AM, user-defined C function, trigger function in C, user-defined data types, hooks, etc. And, the Postgres functions that the FDW implementor would use to implement their commit will be very limited, won't they? Because most of the commit processing is performed in the resource manager's library (e.g. the Oracle and MySQL client libraries).

Yeah, if we think FDW implementors properly implement these APIs while
following the guideline, giving the guideline is a good idea. But I’m
not sure all FDW implementors are able to do that and even if the user
uses an FDW whose transaction APIs don’t follow the guideline, the
user won’t realize it. IMO it’s better to design the feature while not
depending on external programs for reliability (correctness?) of this
feature, although I might be too worried.

+1 for that. I don't think it's even in the hands of implementers to
avoid throwing an error in all the conditions.

--
Best Wishes,
Ashutosh Bapat

#162Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#160)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Oct 6, 2020 at 10:52 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 2 Oct 2020 at 18:20, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

You proposed the first idea
to avoid such a situation: that the FDW implementor can write the code
while trying to reduce the possibility of errors as much as
possible, for example by using palloc_extended(MCXT_ALLOC_NO_OOM) and
hash_search(HASH_ENTER_NULL), but I think it's not a comprehensive
solution. They might miss it, not know about it, or use other functions
provided by the core that could lead to an error.

We can give the guideline in the manual, can't we? It should not be especially difficult for the FDW implementor compared to Postgres's other extensibility features that have their own rules -- table/index AM, user-defined C function, trigger function in C, user-defined data types, hooks, etc. And, the Postgres functions that the FDW implementor would use to implement their commit will be very limited, won't they? Because most of the commit processing is performed in the resource manager's library (e.g. the Oracle and MySQL client libraries).

Yeah, if we think FDW implementors properly implement these APIs while
following the guideline, giving the guideline is a good idea. But I’m
not sure all FDW implementors are able to do that and even if the user
uses an FDW whose transaction APIs don’t follow the guideline, the
user won’t realize it. IMO it’s better to design the feature while not
depending on external programs for reliability (correctness?) of this
feature, although I might be too worried.

After more thought on Tsunakawa-san’s idea, it seems to require the
following conditions:

* It is viable, at least for postgres_fdw, to implement these APIs
while guaranteeing that no error happens.
* A certain number of FDWs (or the majority of FDWs) can do that in a
similar way to postgres_fdw by using the guideline and probably
postgres_fdw as a reference.

These are necessary for FDW implementors to implement APIs while
following the guideline and for the core to trust them.

As far as postgres_fdw goes, what we need to do when resolving a
foreign transaction is to get a connection from the connection cache
(or create one and connect if not found), construct a SQL
query (COMMIT/ROLLBACK PREPARED with identifier) using a fixed-size
buffer, send the query, and get the result. The places that can
raise an error are limited. In case of failures such as a connection
error, the FDW can return false to the core along with a flag asking
the core to retry. Then the core will retry resolving the foreign
transactions after some sleep. OTOH, if the FDW determines that there
is no hope of resolving the foreign transaction, it could also return
false to the core along with another flag indicating to remove the
entry and not to retry. Also, the transaction resolution by the FDW
needs to be
cancellable (interruptible) but cannot use CHECK_FOR_INTERRUPTS().
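
The retry protocol sketched in the paragraph above could look roughly like this. It is a Python toy model, not postgres_fdw code. The Outcome enum folds the "false plus a flag" return value into one value, and resolve_with_retry, resolve_once, and cancel_requested are hypothetical names. Polling a cancel flag between attempts is just one assumed way to stay interruptible without CHECK_FOR_INTERRUPTS().

```python
import enum
import time


class Outcome(enum.Enum):
    RESOLVED = enum.auto()  # COMMIT/ROLLBACK PREPARED succeeded
    RETRY = enum.auto()     # e.g. connection error: ask the core to retry
    GIVE_UP = enum.auto()   # no hope: remove the entry and do not retry


def resolve_with_retry(resolve_once, cancel_requested,
                       retry_interval=1.0, sleep=time.sleep):
    """Core-side loop; resolve_once must report failure by return value only."""
    attempts = 0
    while True:
        attempts += 1
        outcome = resolve_once()
        if outcome is Outcome.RESOLVED:
            return "resolved", attempts
        if outcome is Outcome.GIVE_UP:
            return "removed", attempts
        # Outcome.RETRY: honor a pending cancel request before sleeping.
        if cancel_requested():
            return "cancelled", attempts
        sleep(retry_interval)


# Two connection errors followed by success.
outcomes = iter([Outcome.RETRY, Outcome.RETRY, Outcome.RESOLVED])
status = resolve_with_retry(lambda: next(outcomes), lambda: False,
                            sleep=lambda seconds: None)
print(status)
```

The loop never needs an exception to make progress, which is the property the conditions above ask every FDW to provide.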

Probably, as Tsunakawa-san also suggested, it’s not impossible to
implement these APIs in postgres_fdw while guaranteeing that no error
happens, although I’m not sure about the code complexity. So I think
the first condition may be true, but I’m not sure about the second one,
particularly about the interruptible part.

I thought we could support both ideas to get their pros; supporting
Tsunakawa-san's idea and then my idea if necessary, and FDW can choose
whether to ask the resolver process to perform 2nd phase of 2PC or
not. But it's not a good idea in terms of complexity.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#163tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#162)
RE: Transactions involving multiple postgres foreign servers, take 2

Sorry to be late to respond. (My PC is behaving strangely after upgrading Win10 2004)

From: Masahiko Sawada <sawada.mshk@gmail.com>

After more thought on Tsunakawa-san’s idea, it seems to require the
following conditions:

* It is viable, at least for postgres_fdw, to implement these APIs
while guaranteeing that no error happens.
* A certain number of FDWs (or the majority of FDWs) can do that in a
similar way to postgres_fdw by using the guideline and probably
postgres_fdw as a reference.

These are necessary for FDW implementors to implement APIs while
following the guideline and for the core to trust them.

As far as postgres_fdw goes, what we need to do when resolving a
foreign transaction is to get a connection from the connection cache
(or create one and connect if not found), construct a SQL
query (COMMIT/ROLLBACK PREPARED with identifier) using a fixed-size
buffer, send the query, and get the result. The places that can
raise an error are limited. In case of failures such as a connection
error, the FDW can return false to the core along with a flag asking
the core to retry. Then the core will retry resolving the foreign
transactions after some sleep. OTOH, if the FDW determines that there
is no hope of resolving the foreign transaction, it could also return
false to the core along with another flag indicating to remove the
entry and not to retry. Also, the transaction resolution by the FDW
needs to be
cancellable (interruptible) but cannot use CHECK_FOR_INTERRUPTS().

Probably, as Tsunakawa-san also suggested, it’s not impossible to
implement these APIs in postgres_fdw while guaranteeing that no error
happens, although I’m not sure about the code complexity. So I think
the first condition may be true, but I’m not sure about the second one,
particularly about the interruptible part.

Yeah, I expect the commit of the second phase should not be difficult for the FDW developer.

As for the cancellation during commit retry, I don't think we necessarily have to make the TM responsible for retrying the commits. Many DBMSs have their own timeout functionality such as connection timeout, socket timeout, and statement timeout. Users can set those parameters in the foreign server options based on how long the end user can wait. That is, TM calls FDW's commit routine just once.

If the TM makes efforts to retry commits, the duration would be from a few seconds to 30 seconds. Then, we can hold back the cancellation during that period.

I thought we could support both ideas to get their pros; supporting
Tsunakawa-san's idea and then my idea if necessary, and FDW can choose
whether to ask the resolver process to perform 2nd phase of 2PC or
not. But it's not a good idea in terms of complexity.

I don't feel the need for leaving the commit to the resolver during normal operation.

It seems that if resolution fails, the backend would return an
acknowledgment of COMMIT to the client and the resolver process
would resolve the foreign prepared transactions in the background. So
we can ensure that the distributed transaction is completed at the
time when the client gets an acknowledgment of COMMIT if the 2nd phase
of 2PC is successfully completed on the first attempt. OTOH, if it
failed for whatever reason, there is no such guarantee. From an
optimistic perspective, i.e., assuming failures are unlikely to
happen, it will work well, but IMO it’s not uncommon to fail to
resolve foreign transactions due to network issues, especially in an
unreliable network environment such as a geo-distributed database. So
I think it will end up requiring the client to check whether preceding
distributed transactions are completed or not in order to see the
results of these transactions.

That issue exists with any method, doesn't it?

Regards
Takayuki Tsunakawa

#164Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#163)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, 8 Oct 2020 at 18:05, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Sorry to be late to respond. (My PC is behaving strangely after upgrading Win10 2004)

From: Masahiko Sawada <sawada.mshk@gmail.com>

After more thought on Tsunakawa-san’s idea, it seems to require the
following conditions:

* It is viable, at least for postgres_fdw, to implement these APIs
while guaranteeing that no error happens.
* A certain number of FDWs (or the majority of FDWs) can do that in a
similar way to postgres_fdw by using the guideline and probably
postgres_fdw as a reference.

These are necessary for FDW implementors to implement APIs while
following the guideline and for the core to trust them.

As far as postgres_fdw goes, what we need to do when resolving a
foreign transaction is to get a connection from the connection cache
(or create one and connect if not found), construct a SQL
query (COMMIT/ROLLBACK PREPARED with identifier) using a fixed-size
buffer, send the query, and get the result. The places that can
raise an error are limited. In case of failures such as a connection
error, the FDW can return false to the core along with a flag asking
the core to retry. Then the core will retry resolving the foreign
transactions after some sleep. OTOH, if the FDW determines that there
is no hope of resolving the foreign transaction, it could also return
false to the core along with another flag indicating to remove the
entry and not to retry. Also, the transaction resolution by the FDW
needs to be
cancellable (interruptible) but cannot use CHECK_FOR_INTERRUPTS().

Probably, as Tsunakawa-san also suggested, it’s not impossible to
implement these APIs in postgres_fdw while guaranteeing that no error
happens, although I’m not sure about the code complexity. So I think
the first condition may be true, but I’m not sure about the second one,
particularly about the interruptible part.

Yeah, I expect the commit of the second phase should not be difficult for the FDW developer.

As for the cancellation during commit retry, I don't think we necessarily have to make the TM responsible for retrying the commits. Many DBMSs have their own timeout functionality such as connection timeout, socket timeout, and statement timeout.
Users can set those parameters in the foreign server options based on how long the end user can wait. That is, TM calls FDW's commit routine just once.

What about temporary network failures? I think there are users who
don't want to give up resolving foreign transactions that failed due
to a temporary network failure. Or they might even want to wait for
transaction completion until they send a cancel request. If we want to
call the commit routine only once and therefore want the FDW to retry
connecting to the foreign server within the call, it means we require
all FDW implementors to write retry-loop code that is interruptible
and guaranteed not to raise an error, which increases difficulty.

Also, what if the user sets the statement timeout to 60 sec and they
want to cancel the wait after 5 sec by pressing Ctrl-C? You mentioned
that client libraries of other DBMSs don't have asynchronous execution
functionality. If the SQL execution function is not interruptible, the
user will end up waiting for 60 sec, which doesn't seem good.

If the TM makes efforts to retry commits, the duration would be from a few seconds to 30 seconds. Then, we can hold back the cancellation during that period.

I thought we could support both ideas to get their pros; supporting
Tsunakawa-san's idea and then my idea if necessary, and FDW can choose
whether to ask the resolver process to perform 2nd phase of 2PC or
not. But it's not a good idea in terms of complexity.

I don't feel the need for leaving the commit to the resolver during normal operation.

I meant it's for FDWs that cannot guarantee that no error happens
during resolution.

It seems that if resolution fails, the backend would return an
acknowledgment of COMMIT to the client and the resolver process
would resolve the foreign prepared transactions in the background. So
we can ensure that the distributed transaction is completed at the
time when the client gets an acknowledgment of COMMIT if the 2nd phase
of 2PC is successfully completed on the first attempt. OTOH, if it
failed for whatever reason, there is no such guarantee. From an
optimistic perspective, i.e., assuming failures are unlikely to
happen, it will work well, but IMO it’s not uncommon to fail to
resolve foreign transactions due to network issues, especially in an
unreliable network environment such as a geo-distributed database. So
I think it will end up requiring the client to check whether preceding
distributed transactions are completed or not in order to see the
results of these transactions.

That issue exists with any method, doesn't it?

Yes, but if we don’t retry resolving foreign transactions at all in
an unreliable network environment, the user might end up requiring
every transaction to check the status of foreign transactions of the
previous distributed transaction before it starts. If we allow
retries, I guess we ease that somewhat.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#165tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#164)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

What about temporary network failures? I think there are users who
don't want to give up resolving foreign transactions that failed due
to a temporary network failure. Or they might even want to wait for
transaction completion until they send a cancel request. If we want to
call the commit routine only once and therefore want the FDW to retry
connecting to the foreign server within the call, it means we require
all FDW implementors to write retry-loop code that is interruptible
and guaranteed not to raise an error, which increases difficulty.

Yes, but if we don’t retry resolving foreign transactions at all in
an unreliable network environment, the user might end up requiring
every transaction to check the status of foreign transactions of the
previous distributed transaction before it starts. If we allow
retries, I guess we ease that somewhat.

OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too.

Then, we can have a commit retry timeout or retry count, as the following WebLogic manual says. (I couldn't quickly find the English manual, so the link below is to the Japanese one. I quoted some text that went through machine translation, which appears a bit strange.)

https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm
--------------------------------------------------
Abandon timeout
Specifies the maximum time (in seconds) that the transaction manager attempts to complete the second phase of a two-phase commit transaction.

In the second phase of a two-phase commit transaction, the transaction manager attempts to complete the transaction until all resource managers indicate that the transaction is complete. After the abort transaction timer expires, no attempt is made to resolve the transaction. If the transaction enters a ready state before it is destroyed, the transaction manager rolls back the transaction and releases the held lock on behalf of the destroyed transaction.
--------------------------------------------------
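
As a rough illustration of how an abandon timeout bounds the second phase, here is a small Python model. The function name complete_second_phase, its parameters, and the try_commit callback are hypothetical and are not WebLogic or PostgreSQL APIs. The clock and sleep functions are injectable only so the loop can be exercised without real waiting.

```python
import time


def complete_second_phase(pending, try_commit, abandon_timeout,
                          retry_interval=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Retry commits until every resource manager is done or the timer expires.

    Returns the resource managers that were still unresolved when the
    transaction manager gave up (empty if everything completed).
    """
    deadline = clock() + abandon_timeout
    remaining = list(pending)
    while remaining:
        # try_commit returns True once that resource manager has committed.
        remaining = [rm for rm in remaining if not try_commit(rm)]
        if not remaining or clock() >= deadline:
            break
        sleep(retry_interval)
    return remaining


# Fake time source: sleeping just advances the clock.
now = [0.0]
attempts = {}


def flaky_commit(rm):
    attempts[rm] = attempts.get(rm, 0) + 1
    return rm == "rm-a" or attempts[rm] >= 3  # rm-b succeeds on its 3rd try


abandoned = complete_second_phase(
    ["rm-a", "rm-b"], flaky_commit, abandon_timeout=10,
    clock=lambda: now[0],
    sleep=lambda seconds: now.__setitem__(0, now[0] + seconds),
)
print(abandoned)
```

What to do with the branches returned at the end (the quoted manual mentions rolling back) is outside this sketch.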

Also, what if the user sets the statement timeout to 60 sec and they
want to cancel the wait after 5 sec by pressing Ctrl-C? You mentioned
that client libraries of other DBMSs don't have asynchronous execution
functionality. If the SQL execution function is not interruptible, the
user will end up waiting for 60 sec, which doesn't seem good.

FDW functions can be uninterruptible in general, can't they? We experienced that odbc_fdw didn't allow cancellation of SQL execution.

Regards
Takayuki Tsunakawa

#166Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#165)
Re: Transactions involving multiple postgres foreign servers, take 2

At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

What about temporary network failures? I think there are users who
don't want to give up resolving foreign transactions that failed due
to a temporary network failure. Or they might even want to wait for
transaction completion until they send a cancel request. If we want to
call the commit routine only once and therefore want the FDW to retry
connecting to the foreign server within the call, it means we require
all FDW implementors to write retry-loop code that is interruptible
and guaranteed not to raise an error, which increases difficulty.

Yes, but if we don’t retry resolving foreign transactions at all in
an unreliable network environment, the user might end up requiring
every transaction to check the status of foreign transactions of the
previous distributed transaction before it starts. If we allow
retries, I guess we ease that somewhat.

OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too.

I must be missing something, though...

I don't understand why we hate ERRORs from fdw-2pc-commit routine so
much. I think remote-commits should be performed before local commit
passes the point-of-no-return and the v26-0002 actually places
AtEOXact_FdwXact() before the critical section.

(FWIW, I think remote commits should be performed by backends, not by
another process, because backends have to wait for all remote commits
to finish anyway and it is simpler. If we want to run multiple remote
commits in parallel, we could do that by adding some async-waiting
interface.)

Then, we can have a commit retry timeout or retry count like the following WebLogic manual says. (I couldn't quickly find the English manual, so below is in Japanese. I quoted some text that got through machine translation, which appears a bit strange.)

https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm
--------------------------------------------------
Abandon timeout
Specifies the maximum time (in seconds) that the transaction manager attempts to complete the second phase of a two-phase commit transaction.

In the second phase of a two-phase commit transaction, the transaction manager attempts to complete the transaction until all resource managers indicate that the transaction is complete. After the abort transaction timer expires, no attempt is made to resolve the transaction. If the transaction enters a ready state before it is destroyed, the transaction manager rolls back the transaction and releases the held lock on behalf of the destroyed transaction.
--------------------------------------------------

That's not a retry timeout but a timeout for the total time of all
2nd-phase commits. But I think it would be sufficient. Even if an
fdw could retry 2pc-commit, that's a matter for that fdw and the core
has nothing to do with it.

Also, what if the user sets the statement timeout to 60 sec and they
want to cancel the waits after 5 sec by pressing ctl-C? You mentioned
that client libraries of other DBMSs don't have asynchronous execution
functionality. If the SQL execution function is not interruptible, the
user will end up waiting for 60 sec, which seems not good.

I think fdw-2pc-commit can be interrupted safely as long as we run
the remote commits before entering the critical section of local commit.

FDW functions can be uninterruptible in general, aren't they? We experienced that odbc_fdw didn't allow cancellation of SQL execution.

At least postgres_fdw is interruptible while waiting for the remote server.

create view lt as select 1 as slp from (select pg_sleep(10)) t;
create foreign table ft(slp int) server sv1 options (table_name 'lt');
select * from ft;
^CCancel request sent
ERROR: canceling statement due to user request

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#167tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Kyotaro Horiguchi (#166)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

I don't understand why we hate ERRORs from fdw-2pc-commit routine so
much. I think remote-commits should be performed before local commit
passes the point-of-no-return and the v26-0002 actually places
AtEOXact_FdwXact() before the critical section.

I don't hate ERROR, but it would be simpler and easier to understand if the FDW commit routine just returned control to the caller (TM) and let the TM do whatever is appropriate (ask the resolver to handle the failed commit, and continue by requesting the next FDW to commit.)
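
The control flow described above can be sketched in Python as follows. This is only an illustration, not code from the patch: commit_all_prepared, CommitOutcome and the per-server callables are hypothetical names, and the sketch assumes each FDW commit routine reports failure through its return value rather than by raising.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CommitOutcome:
    committed: List[str] = field(default_factory=list)
    deferred: List[str] = field(default_factory=list)  # left for the resolver


def commit_all_prepared(commit_routines: Dict[str, Callable[[], bool]]) -> CommitOutcome:
    """Call each (hypothetical) FDW commit routine exactly once.

    A routine returns True on success and False on failure instead of
    raising, so one unreachable server does not stop the loop.
    """
    outcome = CommitOutcome()
    for server, commit_prepared in commit_routines.items():
        if commit_prepared():
            outcome.committed.append(server)
        else:
            # Hand the failed commit to the resolver and keep going.
            outcome.deferred.append(server)
    return outcome
```

With three servers where only the second one fails, the first and third end up in committed and the second in deferred, so the caller never sees an error in the middle of the loop.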

https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm
--------------------------------------------------
Abandon timeout
Specifies the maximum time (in seconds) that the transaction manager attempts to complete the second phase of a two-phase commit transaction.

In the second phase of a two-phase commit transaction, the transaction manager attempts to complete the transaction until all resource managers indicate that the transaction is complete. After the abort transaction timer expires, no attempt is made to resolve the transaction. If the transaction enters a ready state before it is destroyed, the transaction manager rolls back the transaction and releases the held lock on behalf of the destroyed transaction.

--------------------------------------------------

That's not a retry timeout but a timeout for total time of all
2nd-phase-commits. But I think it would be sufficient. Even if an
fdw could retry 2pc-commit, it's a matter of that fdw and the core has
nothing to do with.

Yeah, the WebLogic documentation doesn't say whether it performs retries during the timeout period. I just cited it as an example of a product that has a timeout parameter for the second phase of 2PC.

At least postgres_fdw is interruptible while waiting the remote.

create view lt as select 1 as slp from (select pg_sleep(10)) t;
create foreign table ft(slp int) server sv1 options (table_name 'lt');
select * from ft;
^CCancel request sent
ERROR: canceling statement due to user request

I'm afraid cancellation doesn't work while postgres_fdw is trying to connect to a server that is down. Also, the Postgres manual doesn't say anything about cancellation, so we cannot expect FDWs to respond to the user's cancel request.

Regards
Takayuki Tsunakawa

#168Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#165)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 9 Oct 2020 at 11:33, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

What about temporary network failures? I think there are users who
don't want to give up resolving foreign transactions failed due to a
temporary network failure. Or even they might want to wait for
transaction completion until they send a cancel request. If we want to
call the commit routine only once and therefore want FDW to retry
connecting the foreign server within the call, it means we require all
FDW implementors to write a retry loop code that is interruptible and
ensures not to raise an error, which increases difficulty.

Yes, but if we don’t retry to resolve foreign transactions at all on
an unreliable network environment, the user might end up requiring
every transaction to check the status of foreign transactions of the
previous distributed transaction before starts. If we allow to do
retry, I guess we ease that somewhat.

OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too.

Well, I agree that it's not mandatory. I think it's better if the user
can choose.

I also doubt how useful the per-foreign-server timeout setting you
mentioned before would be. For example, suppose the transaction
involves three foreign servers that have different timeout settings.
What if the backend fails to commit on the first server due to a
timeout? Does it attempt to commit on the other two servers? Or does
it give up and return control to the client? In the former case,
what if the backend fails again on one of the other two servers due
to a timeout? The backend might end up waiting for all the timeouts,
and in practice the user is not aware of how many servers are
involved in the transaction, for example in a sharded setup. So it
seems hard to predict the total timeout. In the latter case, the
backend might have succeeded in committing on the other two nodes.
Also, the timeout setting of the first foreign server is effectively
used as the timeout for the whole foreign transaction resolution.
However, the user cannot control the order of resolution. So again it
seems hard for the user to predict the timeout. So if we have a
timeout mechanism, I think it's better if the user can control the
timeout for each transaction. Probably the same is true for retries.
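
To illustrate the difference, here is a small Python simulation (a sketch only, with hypothetical names and simulated response times; nothing here comes from the patch). With per-server timeouts the worst-case wait is the sum of all the timeouts, while a single per-transaction budget bounds the total wait no matter how many servers are involved or in which order they are resolved.

```python
from typing import Dict, List, Tuple


def worst_case_per_server(timeouts: Dict[str, float]) -> float:
    """If every server times out in turn, the waits simply add up."""
    return sum(timeouts.values())


def resolve_with_budget(response_times: Dict[str, float],
                        total_timeout: float) -> Tuple[List[str], List[str], float]:
    """Simulate committing servers one by one under one shared budget.

    response_times holds the simulated seconds each server needs to answer.
    Returns (committed, unresolved, elapsed_seconds).
    """
    elapsed = 0.0
    committed: List[str] = []
    unresolved: List[str] = []
    for server, needed in response_times.items():
        remaining = total_timeout - elapsed
        if remaining <= 0:
            unresolved.append(server)   # budget already used up
        elif needed <= remaining:
            elapsed += needed
            committed.append(server)
        else:
            elapsed += remaining        # wait out the rest of the budget
            unresolved.append(server)
    return committed, unresolved, elapsed
```

For example, per-server timeouts of 5, 30 and 10 seconds give a worst case of 45 seconds, whereas a 10-second budget never makes the client wait longer than 10 seconds; anything still unresolved at that point would be left to the resolver.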

Then, we can have a commit retry timeout or retry count like the following WebLogic manual says. (I couldn't quickly find the English manual, so below is in Japanese. I quoted some text that got through machine translation, which appears a bit strange.)

https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm
--------------------------------------------------
Abandon timeout
Specifies the maximum time (in seconds) that the transaction manager attempts to complete the second phase of a two-phase commit transaction.

In the second phase of a two-phase commit transaction, the transaction manager attempts to complete the transaction until all resource managers indicate that the transaction is complete. After the abort transaction timer expires, no attempt is made to resolve the transaction. If the transaction enters a ready state before it is destroyed, the transaction manager rolls back the transaction and releases the held lock on behalf of the destroyed transaction.
--------------------------------------------------

Yeah, a per-transaction timeout for the 2nd phase of 2PC seems like a good idea.

Also, what if the user sets the statement timeout to 60 sec and they
want to cancel the waits after 5 sec by pressing ctl-C? You mentioned
that client libraries of other DBMSs don't have asynchronous execution
functionality. If the SQL execution function is not interruptible, the
user will end up waiting for 60 sec, which seems not good.

FDW functions can be uninterruptible in general, aren't they? We experienced that odbc_fdw didn't allow cancellation of SQL execution.

For example, postgres_fdw executes SQL asynchronously using
PQsendQuery(), PQconsumeInput(), PQgetResult() and so on
(see do_sql_command() and pgfdw_get_result()). Therefore, if the user
presses ctl-C, the remote query is canceled and an ERROR is raised.
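
The general shape of such an interruptible wait can be sketched in Python. This is a simulation only: wait_for_result, result_ready, send_cancel and QueryCanceled are hypothetical stand-ins and are not libpq or postgres_fdw functions. The point is just that the waiter polls for the result and checks for a pending cancel request between polls, instead of blocking in a single call.

```python
import threading
import time
from typing import Callable, Optional


class QueryCanceled(Exception):
    """Stands in for the ERROR raised on a user cancel request."""


def wait_for_result(result_ready: Callable[[], bool],
                    cancel_requested: threading.Event,
                    send_cancel: Optional[Callable[[], None]] = None,
                    poll_interval: float = 0.01) -> None:
    """Poll until the result is ready, honoring cancel requests promptly."""
    while not result_ready():
        if cancel_requested.is_set():
            if send_cancel is not None:
                send_cancel()  # ask the remote side to stop the query
            raise QueryCanceled("canceling statement due to user request")
        time.sleep(poll_interval)
```

A client library that only offers a blocking execute call gives the caller no place to put the cancel check, which is the situation described for odbc_fdw.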

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#169Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Kyotaro Horiguchi (#166)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

What about temporary network failures? I think there are users who
don't want to give up resolving foreign transactions failed due to a
temporary network failure. Or even they might want to wait for
transaction completion until they send a cancel request. If we want to
call the commit routine only once and therefore want FDW to retry
connecting the foreign server within the call, it means we require all
FDW implementors to write a retry loop code that is interruptible and
ensures not to raise an error, which increases difficulty.

Yes, but if we don’t retry to resolve foreign transactions at all on
an unreliable network environment, the user might end up requiring
every transaction to check the status of foreign transactions of the
previous distributed transaction before starts. If we allow to do
retry, I guess we ease that somewhat.

OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too.

I must be missing something, though...

I don't understand why we hate ERRORs from fdw-2pc-commit routine so
much. I think remote-commits should be performed before local commit
passes the point-of-no-return and the v26-0002 actually places
AtEOXact_FdwXact() before the critical section.

So you're thinking the following sequence?

1. Prepare all foreign transactions.
2. Commit all the prepared foreign transactions.
3. Commit the local transaction.

Suppose we have the backend process call the commit routine. What if
one of the FDWs raises an ERROR while committing its foreign
transaction after other foreign transactions have been committed? The
transaction will end up aborted, but some foreign transactions are
already committed. Also, what if the backend process fails to commit
the local transaction? Since it has already committed all foreign
transactions, it cannot ensure global atomicity in this case either.
Therefore, I think we should commit the distributed transactions in
the following sequence:

1. Prepare all foreign transactions.
2. Commit the local transaction.
3. Commit all the prepared foreign transactions.

But this is still not a perfect solution. If we have the backend
process call the commit routine and an error happens while executing
the commit routine of an FDW (i.e., at step 3), it's too late to report
an error to the client because we have already committed the local
transaction. So the current solution is to have a background process
commit the foreign transactions so that the backend can just wait
without the possibility of errors.
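
The division of labor described above can be sketched as a small Python simulation. It is not the patch's implementation: ForeignServer, backend_commit and resolver_run are hypothetical names, the local commit is assumed to succeed, and a temporary network failure is modeled as a counter of failing attempts. The backend performs steps 1 and 2 and only queues step 3; the resolver retries step 3 until every prepared transaction is committed.

```python
from collections import deque
from typing import Deque, List


class ForeignServer:
    def __init__(self, name: str, transient_failures: int = 0) -> None:
        self.name = name
        self.state = "active"
        self._failures_left = transient_failures

    def prepare(self) -> None:
        self.state = "prepared"

    def commit_prepared(self) -> None:
        if self._failures_left > 0:
            self._failures_left -= 1
            raise ConnectionError(f"{self.name}: temporary network failure")
        self.state = "committed"


def backend_commit(servers: List[ForeignServer],
                   queue: Deque[ForeignServer]) -> str:
    for server in servers:          # step 1: prepare all foreign transactions
        server.prepare()
    local_state = "committed"       # step 2: local commit (assumed to succeed)
    queue.extend(servers)           # step 3 is delegated to the resolver
    return local_state


def resolver_run(queue: Deque[ForeignServer], max_rounds: int = 10) -> int:
    """Retry COMMIT PREPARED until the queue drains; return rounds used."""
    rounds = 0
    while queue and rounds < max_rounds:
        rounds += 1
        for _ in range(len(queue)):
            server = queue.popleft()
            try:
                server.commit_prepared()
            except ConnectionError:
                queue.append(server)  # try again in the next round
    return rounds
```

Errors from commit_prepared() are caught inside the resolver loop, so they never surface in the backend after the local commit has happened.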

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#170tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#168)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

I also doubt how useful the per-foreign-server timeout setting you
mentioned before. For example, suppose the transaction involves with
three foreign servers that have different timeout setting, what if the
backend failed to commit on the first one of the server due to
timeout? Does it attempt to commit on the other two servers? Or does
it give up and return the control to the client? In the former case,
what if the backend failed again on one of the other two servers due
to timeout? The backend might end up waiting for all timeouts and in
practice the user is not aware of how many servers are involved with
the transaction, for example in a sharding. So It seems to be hard to
predict the total timeout. In the latter case, the backend might
succeed to commit on the other two nodes. Also, the timeout setting of
the first foreign server virtually is used as the whole foreign
transaction resolution timeout. However, the user cannot control the
order of resolution. So again it seems to be hard for the user to
predict the timeout. So If we have a timeout mechanism, I think it's
better if the user can control the timeout for each transaction.
Probably the same is true for the retry.

I agree that the user should be able to control the timeout per transaction, not per FDW. I was just not sure whether the Postgres core can define the timeout parameter and have the FDWs follow its setting. However, JTA defines a transaction timeout API (not a commit timeout, though), and each RM can choose whether to implement it. So I think we can likewise define the parameter and/or routines for the timeout in core.

--------------------------------------------------
public interface javax.transaction.xa.XAResource

int getTransactionTimeout() throws XAException
This method returns the transaction timeout value set for this XAResource instance. If
XAResource.setTransactionTimeout was not used prior to invoking this method, the return value is the
default timeout set for the resource manager; otherwise, the value used in the previous
setTransactionTimeout call is returned.

Throws: XAException
An error has occurred. Possible exception values are: XAER_RMERR, XAER_RMFAIL.

Returns:
The transaction timeout value in seconds.

boolean setTransactionTimeout(int seconds) throws XAException
This method sets the transaction timeout value for this XAResource instance. Once set, this timeout value
is effective until setTransactionTimeout is invoked again with a different value. To reset the timeout
value to the default value used by the resource manager, set the value to zero.

If the timeout operation is performed successfully, the method returns true; otherwise false. If a resource
manager does not support transaction timeout value to be set explicitly, this method returns false.

Parameters:

seconds
A positive integer specifying the timeout value in seconds. Zero resets the transaction timeout
value to the default one used by the resource manager. A negative value results in XAException
to be thrown with XAER_INVAL error code.

Returns:
true if transaction timeout value is set successfully; otherwise false.

Throws: XAException
An error has occurred. Possible exception values are: XAER_RMERR, XAER_RMFAIL, or
XAER_INVAL.
--------------------------------------------------
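
The semantics quoted above can be mimicked in a few lines of Python. This is an illustrative model, not the JTA API itself: XAResourceSketch is a hypothetical class, ValueError stands in for an XAException with XAER_INVAL, and checking for a negative value before checking for support is an assumption, since the quoted text does not specify the order.

```python
class XAResourceSketch:
    """Toy model of the quoted get/setTransactionTimeout contract."""

    def __init__(self, default_timeout: int, supports_timeout: bool = True) -> None:
        self._default = default_timeout
        self._current = default_timeout
        self._supports_timeout = supports_timeout

    def get_transaction_timeout(self) -> int:
        # Returns the RM default until a value has been set explicitly.
        return self._current

    def set_transaction_timeout(self, seconds: int) -> bool:
        if seconds < 0:
            # Stand-in for XAException with the XAER_INVAL error code.
            raise ValueError("XAER_INVAL: negative timeout")
        if not self._supports_timeout:
            return False  # RM does not allow setting the timeout explicitly
        # Zero resets the timeout to the resource manager's default.
        self._current = self._default if seconds == 0 else seconds
        return True
```

A core-defined timeout parameter could follow the same pattern: core passes the value down, and an FDW that cannot honor it reports that through the return value.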

For example in postgres_fdw, it executes a SQL in asynchronous manner
using by PQsendQuery(), PQconsumeInput() and PQgetResult() and so on
(see do_sql_command() and pgfdw_get_result()). Therefore it the user
pressed ctl-C, the remote query would be canceled and raise an ERROR.

Yeah, as I replied to Horiguchi-san, postgres_fdw can cancel queries. But postgres_fdw is not ready to cancel connection establishment, is it? At present, the user needs to set the connect_timeout parameter on the foreign server to a reasonably short time so that it can respond quickly to cancellation requests. Alternatively, we can modify postgres_fdw to use libpq's asynchronous connect functions.

Another issue is that the Postgres manual does not stipulate anything about cancellation of FDW processing. That's why I said that the current FDW does not support cancellation in general. Of course, I think we can stipulate the ability to cancel processing in the FDW interface.

Regards
Takayuki Tsunakawa

#171Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#170)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, 12 Oct 2020 at 11:08, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

I also doubt how useful the per-foreign-server timeout setting you
mentioned before. For example, suppose the transaction involves with
three foreign servers that have different timeout setting, what if the
backend failed to commit on the first one of the server due to
timeout? Does it attempt to commit on the other two servers? Or does
it give up and return the control to the client? In the former case,
what if the backend failed again on one of the other two servers due
to timeout? The backend might end up waiting for all timeouts and in
practice the user is not aware of how many servers are involved with
the transaction, for example in a sharding. So It seems to be hard to
predict the total timeout. In the latter case, the backend might
succeed to commit on the other two nodes. Also, the timeout setting of
the first foreign server virtually is used as the whole foreign
transaction resolution timeout. However, the user cannot control the
order of resolution. So again it seems to be hard for the user to
predict the timeout. So If we have a timeout mechanism, I think it's
better if the user can control the timeout for each transaction.
Probably the same is true for the retry.

I agree that the user can control the timeout per transaction, not per FDW. I was just not sure if the Postgres core can define the timeout parameter and the FDWs can follow its setting. However, JTA defines a transaction timeout API (not commit timeout, though), and each RM can choose to implement them. So I think we can define the parameter and/or routines for the timeout in core likewise.

I was thinking of having a GUC timeout parameter like statement_timeout.
The backend would wait up to that value when resolving foreign
transactions. But this idea seems different. FDW can set its timeout
via a transaction timeout API, is that right? But even if FDW can set
the timeout using a transaction timeout API, the problem that client
libraries for some DBMSs don't support interruptible functions still
remains. The user can set the timeout to a short value, but that also
leads to unnecessary timeouts. Thoughts?

--------------------------------------------------
public interface javax.transaction.xa.XAResource

int getTransactionTimeout() throws XAException
This method returns the transaction timeout value set for this XAResource instance. If
XAResource.setTransactionTimeout was not used prior to invoking this method, the return value is the
default timeout set for the resource manager; otherwise, the value used in the previous
setTransactionTimeout call is returned.

Throws: XAException
An error has occurred. Possible exception values are: XAER_RMERR, XAER_RMFAIL.

Returns:
The transaction timeout value in seconds.

boolean setTransactionTimeout(int seconds) throws XAException
This method sets the transaction timeout value for this XAResource instance. Once set, this timeout value
is effective until setTransactionTimeout is invoked again with a different value. To reset the timeout
value to the default value used by the resource manager, set the value to zero.

If the timeout operation is performed successfully, the method returns true; otherwise false. If a resource
manager does not support transaction timeout value to be set explicitly, this method returns false.

Parameters:

seconds
A positive integer specifying the timeout value in seconds. Zero resets the transaction timeout
value to the default one used by the resource manager. A negative value results in XAException
to be thrown with XAER_INVAL error code.

Returns:
true if transaction timeout value is set successfully; otherwise false.

Throws: XAException
An error has occurred. Possible exception values are: XAER_RMERR, XAER_RMFAIL, or
XAER_INVAL.
--------------------------------------------------

For example in postgres_fdw, it executes a SQL in asynchronous manner
using by PQsendQuery(), PQconsumeInput() and PQgetResult() and so on
(see do_sql_command() and pgfdw_get_result()). Therefore it the user
pressed ctl-C, the remote query would be canceled and raise an ERROR.

Yeah, as I replied to Horiguchi-san, postgres_fdw can cancel queries. But postgres_fdw is not ready to cancel connection establishment, is it? At present, the user needs to set connect_timeout parameter on the foreign server to a reasonable short time so that it can respond quickly to cancellation requests. Alternately, we can modify postgres_fdw to use libpq's asynchronous connect functions.

Yes, I think using asynchronous connect functions seems like a good idea.

Another issue is that the Postgres manual does not stipulate anything about cancellation of FDW processing. That's why I said that the current FDW does not support cancellation in general. Of course, I think we can stipulate the ability to cancel processing in the FDW interface.

Yeah, it's the FDW developer's responsibility to write interruptible
code for executing the remote SQL. +1 for adding that to
the doc.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#172tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#171)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

I was thinking to have a GUC timeout parameter like statement_timeout.
The backend waits for the setting value when resolving foreign
transactions.

Me too.

But this idea seems different. FDW can set its timeout
via a transaction timeout API, is that right?

I'm not perfectly sure how the TM (application server) works, but probably not. The TM has a configuration parameter for the transaction timeout, and the TM calls XAResource.setTransactionTimeout() with that value or a smaller one as the argument.

But even if FDW can set
the timeout using a transaction timeout API, the problem that client
libraries for some DBMS don't support interruptible functions still
remains. The user can set a short time to the timeout but it also
leads to unnecessary timeouts. Thoughts?

Unfortunately, I'm afraid we can do nothing about it. If the DBMS's client library doesn't support cancellation (e.g. doesn't respond to Ctrl+C or provide a function that cancels processing in progress), then the Postgres user just finds that he can't cancel queries (just like we experienced with odbc_fdw.)

Regards
Takayuki Tsunakawa

#173Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#169)
Re: Transactions involving multiple postgres foreign servers, take 2

At Fri, 9 Oct 2020 21:45:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

What about temporary network failures? I think there are users who
don't want to give up resolving foreign transactions failed due to a
temporary network failure. Or even they might want to wait for
transaction completion until they send a cancel request. If we want to
call the commit routine only once and therefore want FDW to retry
connecting the foreign server within the call, it means we require all
FDW implementors to write a retry loop code that is interruptible and
ensures not to raise an error, which increases difficulty.

Yes, but if we don’t retry to resolve foreign transactions at all on
an unreliable network environment, the user might end up requiring
every transaction to check the status of foreign transactions of the
previous distributed transaction before starts. If we allow to do
retry, I guess we ease that somewhat.

OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too.

I must be missing something, though...

I don't understand why we hate ERRORs from fdw-2pc-commit routine so
much. I think remote-commits should be performed before local commit
passes the point-of-no-return and the v26-0002 actually places
AtEOXact_FdwXact() before the critical section.

So you're thinking the following sequence?

1. Prepare all foreign transactions.
2. Commit the all prepared foreign transactions.
3. Commit the local transaction.

Suppose we have the backend process call the commit routine, what if
one of FDW raises an ERROR during committing the foreign transaction
after committing other foreign transactions? The transaction will end
up with an abort but some foreign transactions are already committed.

OK, I understand what you are aiming at.

That is apparently outside the scope of the two-phase commit
protocol. Each FDW server can try to keep the contract as far as it
is able, but in the end that kind of failure is inevitable. Even if
we require FDW developers not to return until a 2pc-commit succeeds,
that just causes the whole FDW cluster to freeze, even in cases that
are not extremely bad.

We have no choice other than shutting the server down (so that the
subsequent server start removes the garbage commits) or continuing to
work while leaving some information in system storage (or reverting
the garbage commits). What we can do in that case is to provide an
automated way to resolve the inconsistency.

Also, what if the backend process failed to commit the local
transaction? Since it already committed all foreign transactions it
cannot ensure the global atomicity in this case too. Therefore, I
think we should commit the distributed transactions in the following
sequence:

Ditto. It's outside the scope of 2pc. Using 2pc for the local
transaction could reduce that kind of failure, but I'm not sure. 3pc,
4pc ... npc could reduce the probability but can't eliminate the
failure cases.

1. Prepare all foreign transactions.
2. Commit the local transaction.
3. Commit the all prepared foreign transactions.

But this is still not a perfect solution. If we have the backend

2pc is not a perfect solution in the first place. Attaching a similar
phase to it cannot make it "perfect".

process call the commit routine and an error happens during executing
the commit routine of an FDW (i.e., at step 3) it's too late to report
an error to the client because we already committed the local
transaction. So the current solution is to have a background process
commit the foreign transactions so that the backend can just wait
without the possibility of errors.

Whatever process tries to complete a transaction, the client must wait
for the transaction to end, and anyway that's just a freeze from the
client's point of view, unless you intend to respond to the local
commit before all participants complete.

I don't think most client applications would wait for a frozen
server forever. We have the same issue when the client decides
to give up the transaction, or the leader session is killed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#174Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Kyotaro Horiguchi (#173)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 13 Oct 2020 at 10:00, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Fri, 9 Oct 2020 21:45:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

What about temporary network failures? I think there are users who
don't want to give up resolving foreign transactions failed due to a
temporary network failure. Or even they might want to wait for
transaction completion until they send a cancel request. If we want to
call the commit routine only once and therefore want FDW to retry
connecting the foreign server within the call, it means we require all
FDW implementors to write a retry loop code that is interruptible and
ensures not to raise an error, which increases difficulty.

Yes, but if we don’t retry to resolve foreign transactions at all on
an unreliable network environment, the user might end up requiring
every transaction to check the status of foreign transactions of the
previous distributed transaction before starts. If we allow to do
retry, I guess we ease that somewhat.

OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too.

I must be missing something, though...

I don't understand why we hate ERRORs from fdw-2pc-commit routine so
much. I think remote-commits should be performed before local commit
passes the point-of-no-return and the v26-0002 actually places
AtEOXact_FdwXact() before the critical section.

So you're thinking the following sequence?

1. Prepare all foreign transactions.
2. Commit the all prepared foreign transactions.
3. Commit the local transaction.

Suppose we have the backend process call the commit routine, what if
one of FDW raises an ERROR during committing the foreign transaction
after committing other foreign transactions? The transaction will end
up with an abort but some foreign transactions are already committed.

Ok, I understand what you are aiming.

It is apparently out of the focus of the two-phase commit
protocol. Each FDW server can try to keep the contract as far as its
ability reaches, but in the end such kind of failure is
inevitable. Even if we require FDW developers not to respond until a
2pc-commit succeeds, that just leads the whole FDW-cluster to freeze
even not in an extremely bad case.

We have no choice other than shutting the server down (so that the
subsequent server start removes the garbage commits) or continuing to
work while leaving some information in system storage (or reverting
the garbage commits). What we can do in that case is to provide an
automated way to resolve the inconsistency.

Also, what if the backend process failed to commit the local
transaction? Since it already committed all foreign transactions it
cannot ensure the global atomicity in this case too. Therefore, I
think we should commit the distributed transactions in the following
sequence:

Ditto. It's out of the range of 2pc. Using 2pc for the local transaction
could reduce that kind of failure, but I'm not sure. 3pc, 4pc ... npc
could reduce the probability but can't eliminate failure cases.

IMO the problems I mentioned arise from the fact that the above
sequence doesn't really follow the 2pc protocol in the first place.

We can think of committing the local transaction without
preparation, while preparing the foreign transactions, as using
2pc with the last resource transaction optimization (or last agent
optimization)[1]. That is, we prepare all foreign transactions first
and the local node is always the last resource to process. At this
point, the outcome of the distributed transaction depends entirely on
the fate of the last resource (i.e., the local transaction). If it
fails, the distributed transaction must be aborted by rolling back the
prepared foreign transactions. OTOH, if it succeeds, all prepared
foreign transactions must be committed. Therefore, we don’t need to
prepare the last resource and can commit it directly. So if we want
to commit the local transaction without preparation, the local
transaction must be committed last. But since the above sequence
doesn’t follow this protocol, we will have such problems. I think if
we follow 2pc properly, such basic failures don't happen.

1. Prepare all foreign transactions.
2. Commit the local transaction.
3. Commit all the prepared foreign transactions.
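
For illustration only (this is not code from the patch), the three steps above can be sketched in Python. Participant, commit_distributed() and the state names are hypothetical stand-ins for a foreign server and its commit routines, not PostgreSQL or FDW APIs. The sketch assumes that rollback itself does not fail.

```python
class Participant:
    """Toy stand-in for a foreign server (hypothetical, not a PostgreSQL API)."""

    def __init__(self, name, fail_on=None):
        self.name = name
        self.fail_on = fail_on      # name of the operation that should raise
        self.state = "active"

    def _do(self, op, new_state):
        if self.fail_on == op:
            raise RuntimeError(f"{self.name}: {op} failed")
        self.state = new_state

    def prepare(self):
        self._do("prepare", "prepared")

    def commit_prepared(self):
        self._do("commit_prepared", "committed")

    def rollback(self):
        # In this toy model rollback works for active and prepared transactions.
        self._do("rollback", "aborted")


def commit_distributed(local_commit, foreign):
    """Prepare foreign -> commit local -> commit prepared foreign.

    Returns the outcome, which is decided by the local commit alone, and
    the names of foreign transactions that are left in the prepared state.
    """
    prepared = []
    try:
        for p in foreign:              # step 1: prepare everything remote
            p.prepare()
            prepared.append(p)
        local_commit()                 # step 2: the last resource decides
    except Exception:
        for p in foreign:              # nothing is committed yet: undo all
            p.rollback()
        return "aborted", []

    unresolved = []
    for p in prepared:                 # step 3: outcome is already "committed"
        try:
            p.commit_prepared()
        except Exception:
            unresolved.append(p.name)  # must be resolved later, not rolled back
    return "committed", unresolved
```

A failure in step 1 or step 2 still leaves every foreign transaction able to roll back. A failure in step 3 cannot change the outcome, so the sketch only records which prepared transactions remain to be resolved.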

But this is still not a perfect solution. If we have the backend

2pc is not a perfect solution in the first place. Attaching a similar
phase to it cannot make it "perfect".

process call the commit routine and an error happens while executing
the commit routine of an FDW (i.e., at step 3), it's too late to report
an error to the client because we already committed the local
transaction. So the current solution is to have a background process
commit the foreign transactions so that the backend can just wait
without the possibility of errors.

Whatever process tries to complete a transaction, the client must wait
for the transaction to end, and anyway that's just a freeze from the
client's view, unless you intended to respond to the local commit before
all participants complete.

Yes, but the point of using a separate process is that even if FDW
code raises an error, the client waiting for transaction resolution
doesn't get it, and the wait is interruptible.

[1]: https://docs.oracle.com/cd/E13222_01/wls/docs91/jta/llr.html

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#175Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#174)
Re: Transactions involving multiple postgres foreign servers, take 2

At Tue, 13 Oct 2020 11:56:51 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Tue, 13 Oct 2020 at 10:00, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Fri, 9 Oct 2020 21:45:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

What about temporary network failures? I think there are users who
don't want to give up resolving foreign transactions failed due to a
temporary network failure. Or even they might want to wait for
transaction completion until they send a cancel request. If we want to
call the commit routine only once and therefore want FDW to retry
connecting the foreign server within the call, it means we require all
FDW implementors to write a retry loop code that is interruptible and
ensures not to raise an error, which increases difficulty.

Yes, but if we don’t retry to resolve foreign transactions at all on
an unreliable network environment, the user might end up requiring
every transaction to check the status of foreign transactions of the
previous distributed transaction before starts. If we allow to do
retry, I guess we ease that somewhat.

OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too.

I should missing something, though...

I don't understand why we hate ERRORs from fdw-2pc-commit routine so
much. I think remote-commits should be performed before local commit
passes the point-of-no-return and the v26-0002 actually places
AtEOXact_FdwXact() before the critical section.

So you're thinking the following sequence?

1. Prepare all foreign transactions.
2. Commit the all prepared foreign transactions.
3. Commit the local transaction.

Suppose we have the backend process call the commit routine, what if
one of FDW raises an ERROR during committing the foreign transaction
after committing other foreign transactions? The transaction will end
up with an abort but some foreign transactions are already committed.

Ok, I understand what you are aiming.

It is apparently out of the focus of the two-phase commit
protocol. Each FDW server can try to keep the contract as far as its
ability reaches, but in the end such kind of failure is
inevitable. Even if we require FDW developers not to respond until a
2pc-commit succeeds, that just leads the whole FDW-cluster to freeze
even not in an extremely bad case.

We have no other choices than shutting the server down (then the
succeeding server start removes the garbage commits) or continueing
working leaving some information in a system storage (or reverting the
garbage commits). What we can do in that case is to provide a
automated way to resolve the inconsistency.

Also, what if the backend process failed to commit the local
transaction? Since it already committed all foreign transactions it
cannot ensure the global atomicity in this case too. Therefore, I
think we should commit the distributed transactions in the following
sequence:

Ditto. It's out of the range of 2pc. Using p2c for local transaction
could reduce that kind of failure but I'm not sure. 3pc, 4pc ...npc
could reduce the probability but can't elimite failure cases.

IMO the problems I mentioned arise from the fact that the above
sequence doesn't really follow the 2pc protocol in the first place.

We can think of the fact that we commit the local transaction without
preparation while preparing foreign transactions as that we’re using
the 2pc with last resource transaction optimization (or last agent
optimization)[1]. That is, we prepare all foreign transactions first
and the local node is always the last resource to process. At this
time, the outcome of the distributed transaction completely depends on
the fate of the last resource (i.g., the local transaction). If it
fails, the distributed transaction must be abort by rolling back
prepared foreign transactions. OTOH, if it succeeds, all prepared
foreign transaction must be committed. Therefore, we don’t need to
prepare the last resource and can commit it. In this way, if we want

There are cases of commit-failure of a local transaction caused by
too-many notifications or by serialization failure.

to commit the local transaction without preparation, the local
transaction must be committed at last. But since the above sequence
doesn’t follow this protocol, we will have such problems. I think if
we follow the 2pc properly, such basic failures don't happen.

True. But I haven't suggested that sequence.

1. Prepare all foreign transactions.
2. Commit the local transaction.
3. Commit the all prepared foreign transactions.

But this is still not a perfect solution. If we have the backend

2pc is not a perfect solution in the first place. Attaching a similar
phase to it cannot make it "perfect".

process call the commit routine and an error happens during executing
the commit routine of an FDW (i.g., at step 3) it's too late to report
an error to the client because we already committed the local
transaction. So the current solution is to have a background process
commit the foreign transactions so that the backend can just wait
without the possibility of errors.

Whatever process tries to complete a transaction, the client must wait
for the transaction to end and anyway that's just a freeze in the
client's view, unless you intended to respond to local commit before
all participant complete.

Yes, but the point of using a separate process is that even if FDW
code raises an error, the client wanting for transaction resolution
doesn't get it and it's interruptible.

[1] https://docs.oracle.com/cd/E13222_01/wls/docs91/jta/llr.html

I don't get the point. If FDW-commit is called on the same process, an
error from FDW-commit outright leads to the failure of the current
commit. Isn't "the client wanting for transaction resolution" the
client of the leader process of the 2pc-commit in the same-process
model?

I must be missing something, but postgres_fdw allows query cancellation
at commit time. (But I think it depends on timing whether the
remote commit is completed or aborted.) Perhaps the feature was
introduced after the project started?

commit ae9bfc5d65123aaa0d1cca9988037489760bdeae
Author: Robert Haas <rhaas@postgresql.org>
Date: Wed Jun 7 15:14:55 2017 -0400

postgres_fdw: Allow cancellation of transaction control commands.

I thought that we were discussing fdw-errors during the 2pc-commit
phase.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#176Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Kyotaro Horiguchi (#175)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, 14 Oct 2020 at 10:16, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Tue, 13 Oct 2020 11:56:51 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Tue, 13 Oct 2020 at 10:00, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Fri, 9 Oct 2020 21:45:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

What about temporary network failures? I think there are users who
don't want to give up resolving foreign transactions failed due to a
temporary network failure. Or even they might want to wait for
transaction completion until they send a cancel request. If we want to
call the commit routine only once and therefore want FDW to retry
connecting the foreign server within the call, it means we require all
FDW implementors to write a retry loop code that is interruptible and
ensures not to raise an error, which increases difficulty.

Yes, but if we don’t retry to resolve foreign transactions at all on
an unreliable network environment, the user might end up requiring
every transaction to check the status of foreign transactions of the
previous distributed transaction before starts. If we allow to do
retry, I guess we ease that somewhat.

OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too.

I should missing something, though...

I don't understand why we hate ERRORs from fdw-2pc-commit routine so
much. I think remote-commits should be performed before local commit
passes the point-of-no-return and the v26-0002 actually places
AtEOXact_FdwXact() before the critical section.

So you're thinking the following sequence?

1. Prepare all foreign transactions.
2. Commit the all prepared foreign transactions.
3. Commit the local transaction.

Suppose we have the backend process call the commit routine, what if
one of FDW raises an ERROR during committing the foreign transaction
after committing other foreign transactions? The transaction will end
up with an abort but some foreign transactions are already committed.

Ok, I understand what you are aiming.

It is apparently out of the focus of the two-phase commit
protocol. Each FDW server can try to keep the contract as far as its
ability reaches, but in the end such kind of failure is
inevitable. Even if we require FDW developers not to respond until a
2pc-commit succeeds, that just leads the whole FDW-cluster to freeze
even not in an extremely bad case.

We have no other choices than shutting the server down (then the
succeeding server start removes the garbage commits) or continueing
working leaving some information in a system storage (or reverting the
garbage commits). What we can do in that case is to provide a
automated way to resolve the inconsistency.

Also, what if the backend process failed to commit the local
transaction? Since it already committed all foreign transactions it
cannot ensure the global atomicity in this case too. Therefore, I
think we should commit the distributed transactions in the following
sequence:

Ditto. It's out of the range of 2pc. Using p2c for local transaction
could reduce that kind of failure but I'm not sure. 3pc, 4pc ...npc
could reduce the probability but can't elimite failure cases.

IMO the problems I mentioned arise from the fact that the above
sequence doesn't really follow the 2pc protocol in the first place.

We can think of the fact that we commit the local transaction without
preparation while preparing foreign transactions as that we’re using
the 2pc with last resource transaction optimization (or last agent
optimization)[1]. That is, we prepare all foreign transactions first
and the local node is always the last resource to process. At this
time, the outcome of the distributed transaction completely depends on
the fate of the last resource (i.g., the local transaction). If it
fails, the distributed transaction must be abort by rolling back
prepared foreign transactions. OTOH, if it succeeds, all prepared
foreign transaction must be committed. Therefore, we don’t need to
prepare the last resource and can commit it. In this way, if we want

There are cases of commit-failure of a local transaction caused by
too-many notifications or by serialization failure.

Yes, even if that happens we are still able to roll back all foreign
transactions.

to commit the local transaction without preparation, the local
transaction must be committed at last. But since the above sequence
doesn’t follow this protocol, we will have such problems. I think if
we follow the 2pc properly, such basic failures don't happen.

True. But I haven't suggested that sequence.

Okay, I might have missed your point. Could you elaborate on the idea
you mentioned before, "I think remote-commits should be performed
before local commit passes the point-of-no-return"?

1. Prepare all foreign transactions.
2. Commit the local transaction.
3. Commit the all prepared foreign transactions.

But this is still not a perfect solution. If we have the backend

2pc is not a perfect solution in the first place. Attaching a similar
phase to it cannot make it "perfect".

process call the commit routine and an error happens during executing
the commit routine of an FDW (i.g., at step 3) it's too late to report
an error to the client because we already committed the local
transaction. So the current solution is to have a background process
commit the foreign transactions so that the backend can just wait
without the possibility of errors.

Whatever process tries to complete a transaction, the client must wait
for the transaction to end and anyway that's just a freeze in the
client's view, unless you intended to respond to local commit before
all participant complete.

Yes, but the point of using a separate process is that even if FDW
code raises an error, the client wanting for transaction resolution
doesn't get it and it's interruptible.

[1] https://docs.oracle.com/cd/E13222_01/wls/docs91/jta/llr.html

I don't get the point. If FDW-commit is called on the same process, an
error from FDW-commit outright leads to the failure of the current
commit. Isn't "the client wanting for transaction resolution" the
client of the leader process of the 2pc-commit in the same-process
model?

I should missing something, but postgres_fdw allows query cancelation
at commit time. (But I think it is depends on timing whether the
remote commit is completed or aborted.). Perhaps the feature was
introduced after the project started?

commit ae9bfc5d65123aaa0d1cca9988037489760bdeae
Author: Robert Haas <rhaas@postgresql.org>
Date: Wed Jun 7 15:14:55 2017 -0400

postgres_fdw: Allow cancellation of transaction control commands.

I thought that we are discussing on fdw-errors during the 2pc-commit
phase.

Yes, I'm also discussing fdw-errors during the 2pc-commit phase,
which happens after committing the local transaction.

Even if FDW-commit raises an error due to the user's cancel request or
for whatever reason while committing the prepared foreign transactions,
it's too late. The client will get an error like "ERROR: canceling
statement due to user request" and would think the transaction was
aborted, but that's not true: the local transaction is already committed.
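
As a rough illustration of this point (a toy model, not the patch's code), the Python sketch below compares what the client is told with what actually happened locally. All names are hypothetical, and the step where the backend waits for the resolver is left out.

```python
import queue


class Cancelled(Exception):
    """Stand-in for 'canceling statement due to user request'."""


def cancelled_commit():
    # Simulates an FDW commit routine interrupted by a cancel request.
    raise Cancelled("canceling statement due to user request")


def commit_in_backend(local, commit_foreign):
    local["committed"] = True     # local commit: cannot be undone
    commit_foreign()              # may raise after the point of no return
    return "COMMIT"


def commit_via_resolver(local, commit_foreign, pending):
    local["committed"] = True
    pending.put(commit_foreign)   # handed off to a separate resolver process
    return "COMMIT"


def client_view(run_commit):
    """Return (what the client is told, whether the local commit happened)."""
    local = {"committed": False}
    try:
        reply = run_commit(local)
    except Cancelled as exc:
        reply = f"ERROR: {exc}"
    return reply, local["committed"]
```

With commit_in_backend() the client sees an ERROR although the local transaction is committed. With commit_via_resolver() the error never reaches the client, and the foreign commit stays queued for the resolver.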

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#177Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#176)
Re: Transactions involving multiple postgres foreign servers, take 2

At Wed, 14 Oct 2020 12:09:34 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Wed, 14 Oct 2020 at 10:16, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

There are cases of commit-failure of a local transaction caused by
too-many notifications or by serialization failure.

Yes, even if that happens we are still able to rollback all foreign
transactions.

Mmm. I'm confused. If this is about the 2pc-commit-request (or prepare)
phase, we can roll back the remote transactions. But I think we're
focusing on the 2pc-commit phase. Remote transactions that have already
been 2pc-committed can no longer be rolled back.

to commit the local transaction without preparation, the local
transaction must be committed at last. But since the above sequence
doesn’t follow this protocol, we will have such problems. I think if
we follow the 2pc properly, such basic failures don't happen.

True. But I haven't suggested that sequence.

Okay, I might have missed your point. Could you elaborate on the idea
you mentioned before, "I think remote-commits should be performed
before local commit passes the point-of-no-return"?

It is simply the condition under which we can ERROR-out from
CommitTransaction. I thought that when you said "we cannot
ERROR-out" you meant "since that is raised to FATAL", but it seems to
me that both of you are looking at another aspect.

If the aspect is "what to do to complete the all-prepared 2pc transaction
at all costs", I'd say "there's a fundamental limitation". Although
I'm not sure exactly what you mean by prohibiting errors from fdw
routines, if that meant "the API can fail, but must not raise an
exception", that policy is enforced by setting a critical
section. However, if it were "the API mustn't fail", that cannot be
realized, I believe.

I thought that we are discussing on fdw-errors during the 2pc-commit
phase.

Yes, I'm also discussing on fdw-errors during the 2pc-commit phase
that happens after committing the local transaction.

Even if FDW-commit raises an error due to the user's cancel request or
whatever reason during committing the prepared foreign transactions,
it's too late. The client will get an error like "ERROR: canceling
statement due to user request" and would think the transaction is
aborted but it's not true, the local transaction is already committed.

By the way, I found that I misread the patch. In v26-0002,
AtEOXact_FdwXact() is actually called after the
point-of-no-return. What is the reason for placing it there? We can
error-out before changing the state to TRANS_COMMIT.

And if any of the remotes ended with a 2pc-commit (not prepare phase)
failure, consistency of the commit is no longer guaranteed, so we have
no choice other than shutting down the server or continuing to run
while allowing the inconsistency. What do we want in that case?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#178Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Kyotaro Horiguchi (#177)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, 14 Oct 2020 at 13:19, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Wed, 14 Oct 2020 12:09:34 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Wed, 14 Oct 2020 at 10:16, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

There are cases of commit-failure of a local transaction caused by
too-many notifications or by serialization failure.

Yes, even if that happens we are still able to rollback all foreign
transactions.

Mmm. I'm confused. If this is about 2pc-commit-request(or prepare)
phase, we can rollback the remote transactions. But I think we're
focusing 2pc-commit phase. remote transaction that has already
2pc-committed, they can be no longer rollback'ed.

You mentioned a failure of local commit, right? With the current
approach, we prepare all foreign transactions first and then commit
the local transaction. After committing the local transaction we
commit the prepared foreign transactions. So if a serialization
failure happens while committing the local transaction, we are still
able to roll back the foreign transactions. The check for serialization
failure of the foreign transactions has already been done at the
prepare phase.

to commit the local transaction without preparation, the local
transaction must be committed at last. But since the above sequence
doesn’t follow this protocol, we will have such problems. I think if
we follow the 2pc properly, such basic failures don't happen.

True. But I haven't suggested that sequence.

Okay, I might have missed your point. Could you elaborate on the idea
you mentioned before, "I think remote-commits should be performed
before local commit passes the point-of-no-return"?

It is simply the condition that we can ERROR-out from
CommitTransaction. I thought that when you say like "we cannot
ERROR-out" you meant "since that is raised to FATAL", but it seems to
me that both of you are looking another aspect.

If the aspect is "what to do complete the all-prepared p2c transaction
at all costs", I'd say "there's a fundamental limitaion". Although
I'm not sure what you mean exactly by prohibiting errors from fdw
routines , if that meant "the API can fail, but must not raise an
exception", that policy is enforced by setting a critical
section. However, if it were "the API mustn't fail", that cannot be
realized, I believe.

When I say "we cannot error-out", I mean it's too late. What I'd like
to prevent is the backend process returning an error to the client
after committing the local transaction, because that will mislead the
user.

I thought that we are discussing on fdw-errors during the 2pc-commit
phase.

Yes, I'm also discussing on fdw-errors during the 2pc-commit phase
that happens after committing the local transaction.

Even if FDW-commit raises an error due to the user's cancel request or
whatever reason during committing the prepared foreign transactions,
it's too late. The client will get an error like "ERROR: canceling
statement due to user request" and would think the transaction is
aborted but it's not true, the local transaction is already committed.

By the way I found that I misread the patch. in v26-0002,
AtEOXact_FdwXact() is actually called after the
point-of-no-return. What is the reason for the place? We can
error-out before changing the state to TRANS_COMMIT.

Are you referring to
v26-0002-Introduce-transaction-manager-for-foreign-transa.patch? If
so, the patch doesn't implement 2pc. I think we can commit the foreign
transaction before changing the state to TRANS_COMMIT but in any case
it cannot ensure atomic commit. It just adds both commit and rollback
transaction APIs so that an FDW can control transactions by using these
APIs, not by XactCallback.

And if any of the remotes ended with 2pc-commit (not prepare phase)
failure, consistency of the commit is no longer guaranteed so we have
no choice other than shutting down the server, or continuing running
allowing the incosistency. What do we want in that case?

I think it depends on the failure. If 2pc-commit failed due to a network
connection failure or a server crash, we would need to try again
later. We normally expect that a prepared transaction can be
committed with no issue, but in case it cannot, I think we can leave
the choice to the user: resolve it manually after recovery, give up,
etc.
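
One possible shape for that policy, as a minimal sketch with hypothetical names (resolve_prepared, make_flaky) and an assumed rule that only connection errors are worth retrying. A real resolver would also wait between attempts.

```python
def resolve_prepared(commit_prepared, max_attempts=3,
                     is_transient=lambda exc: isinstance(exc, ConnectionError)):
    """Try to commit one prepared foreign transaction.

    Returns (status, attempts_used). Transient failures are retried.
    Anything else is left for the user to resolve manually.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            commit_prepared()
            return "committed", attempt
        except Exception as exc:
            if not is_transient(exc):
                return "needs manual resolution", attempt
    # Still prepared on the remote side, so a later pass can try again.
    return "still prepared", max_attempts


def make_flaky(failures, exc_type=ConnectionError):
    """Build a commit routine that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def commit_prepared():
        if state["left"] > 0:
            state["left"] -= 1
            raise exc_type("simulated failure")

    return commit_prepared
```

For example, resolve_prepared(make_flaky(2)) succeeds on the third attempt, while a routine that keeps failing leaves the transaction prepared so that either a later retry or the user can resolve it.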

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#179Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#178)
Re: Transactions involving multiple postgres foreign servers, take 2

(v26 fails on the current master)

At Wed, 14 Oct 2020 13:52:49 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Wed, 14 Oct 2020 at 13:19, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Wed, 14 Oct 2020 12:09:34 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Wed, 14 Oct 2020 at 10:16, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

There are cases of commit-failure of a local transaction caused by
too-many notifications or by serialization failure.

Yes, even if that happens we are still able to rollback all foreign
transactions.

Mmm. I'm confused. If this is about 2pc-commit-request(or prepare)
phase, we can rollback the remote transactions. But I think we're
focusing 2pc-commit phase. remote transaction that has already
2pc-committed, they can be no longer rollback'ed.

Did you mention a failure of local commit, right? With the current
approach, we prepare all foreign transactions first and then commit
the local transaction. After committing the local transaction we
commit the prepared foreign transactions. So suppose a serialization
failure happens during committing the local transaction, we still are
able to roll back foreign transactions. The check of serialization
failure of the foreign transactions has already been done at the
prepare phase.

Understood.

to commit the local transaction without preparation, the local
transaction must be committed at last. But since the above sequence
doesn’t follow this protocol, we will have such problems. I think if
we follow the 2pc properly, such basic failures don't happen.

True. But I haven't suggested that sequence.

Okay, I might have missed your point. Could you elaborate on the idea
you mentioned before, "I think remote-commits should be performed
before local commit passes the point-of-no-return"?

It is simply the condition that we can ERROR-out from
CommitTransaction. I thought that when you say like "we cannot
ERROR-out" you meant "since that is raised to FATAL", but it seems to
me that both of you are looking another aspect.

If the aspect is "what to do complete the all-prepared p2c transaction
at all costs", I'd say "there's a fundamental limitaion". Although
I'm not sure what you mean exactly by prohibiting errors from fdw
routines , if that meant "the API can fail, but must not raise an
exception", that policy is enforced by setting a critical
section. However, if it were "the API mustn't fail", that cannot be
realized, I believe.

When I say "we cannot error-out" it means it's too late. What I'd like
to prevent is that the backend process returns an error to the client
after committing the local transaction. Because it will mislead the
user.

Anyway, we don't do anything that can fail after changing the state to
TRANS_COMMIT. So we cannot run fdw-2pc-commit after that, since it
cannot be failure-proof. If we do them before that point, we cannot
ERROR-out after the local commit completes.

I thought that we are discussing on fdw-errors during the 2pc-commit
phase.

Yes, I'm also discussing on fdw-errors during the 2pc-commit phase
that happens after committing the local transaction.

Even if FDW-commit raises an error due to the user's cancel request or
whatever reason during committing the prepared foreign transactions,
it's too late. The client will get an error like "ERROR: canceling
statement due to user request" and would think the transaction is
aborted but it's not true, the local transaction is already committed.

By the way I found that I misread the patch. in v26-0002,
AtEOXact_FdwXact() is actually called after the
point-of-no-return. What is the reason for the place? We can
error-out before changing the state to TRANS_COMMIT.

Are you referring to
v26-0002-Introduce-transaction-manager-for-foreign-transa.patch? If
so, the patch doesn't implement 2pc. I think we can commit the foreign

Ah, I guessed that the trigger points of PREPARE and COMMIT that are
inserted by 0002 won't be moved by the following patches. So the
direction of my discussion isn't changed by that fact.

transaction before changing the state to TRANS_COMMIT but in any case
it cannot ensure atomic commit. It just adds both commit and rollback

I guess that you have the local-commit-failure case in mind? Couldn't
we internally prepare the local transaction and then follow the correct
2pc protocol involving the local transaction? (I'm looking at v26-0008)

transaction APIs so that FDW can control transactions by using these
API, not by XactCallback.

And if any of the remotes ended with 2pc-commit (not prepare phase)
failure, consistency of the commit is no longer guaranteed so we have
no choice other than shutting down the server, or continuing running
allowing the incosistency. What do we want in that case?

I think it depends on the failure. If 2pc-commit failed due to network
connection failure or the server crash, we would need to try again
later. We normally expect the prepared transaction is able to be
committed with no issue but in case it could not, I think we can leave
the choice for the user: resolve it manually after recovered, give up
etc.

Understood.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#180Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Kyotaro Horiguchi (#179)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, 14 Oct 2020 at 17:11, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

(v26 fails on the current master)

Thanks, I'll update the patch.

At Wed, 14 Oct 2020 13:52:49 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Wed, 14 Oct 2020 at 13:19, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Wed, 14 Oct 2020 12:09:34 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Wed, 14 Oct 2020 at 10:16, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

There are cases of commit-failure of a local transaction caused by
too-many notifications or by serialization failure.

Yes, even if that happens we are still able to rollback all foreign
transactions.

Mmm. I'm confused. If this is about 2pc-commit-request(or prepare)
phase, we can rollback the remote transactions. But I think we're
focusing 2pc-commit phase. remote transaction that has already
2pc-committed, they can be no longer rollback'ed.

You mentioned a failure of the local commit, right? With the current
approach, we prepare all foreign transactions first and then commit
the local transaction. After committing the local transaction we
commit the prepared foreign transactions. So if a serialization
failure happens while committing the local transaction, we are still
able to roll back the foreign transactions. The check for serialization
failure of the foreign transactions has already been done in the
prepare phase.
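
The ordering described above can be sketched as a small, self-contained
C simulation. This is only an illustration under stated assumptions:
FxState and commit_distributed() are hypothetical names, not part of the
patch or of PostgreSQL, and the local_commit_ok flag stands in for a
local commit that fails (for example with a serialization failure).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical participant states; not PostgreSQL's actual data structures. */
typedef enum { FX_ACTIVE, FX_PREPARED, FX_COMMITTED, FX_ABORTED } FxState;

/*
 * Prepare every foreign transaction, then commit locally, then commit the
 * prepared foreign transactions.  Returns true if the distributed
 * transaction committed.  local_commit_ok == false simulates a failing
 * local commit.
 */
static bool
commit_distributed(FxState *foreign, size_t n, bool local_commit_ok,
                   bool *local_committed)
{
    size_t i;

    assert(foreign != NULL || n == 0);
    assert(local_committed != NULL);
    *local_committed = false;

    /* Phase 1: prepare all foreign transactions. */
    for (i = 0; i < n; i++)
        foreign[i] = FX_PREPARED;

    /* The local commit runs after all prepares, before any foreign commit. */
    if (!local_commit_ok)
    {
        /* Nothing has committed yet, so every participant can roll back. */
        for (i = 0; i < n; i++)
            foreign[i] = FX_ABORTED;
        return false;
    }
    *local_committed = true;

    /* Phase 2: commit the prepared foreign transactions. */
    for (i = 0; i < n; i++)
        foreign[i] = FX_COMMITTED;
    return true;
}
```

Calling commit_distributed() with local_commit_ok set to false leaves
every foreign entry in FX_ABORTED, which is the "still able to roll
back" case; the sketch deliberately ignores failures in phase 2.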

Understood.

to commit the local transaction without preparation, the local
transaction must be committed last. But since the above sequence
doesn’t follow this protocol, we will have such problems. I think if
we follow 2PC properly, such basic failures don't happen.

True. But I haven't suggested that sequence.

Okay, I might have missed your point. Could you elaborate on the idea
you mentioned before, "I think remote-commits should be performed
before local commit passes the point-of-no-return"?

It is simply the condition that we can ERROR-out from
CommitTransaction. I thought that when you said something like "we cannot
ERROR-out" you meant "since that is raised to FATAL", but it seems to
me that both of you are looking at another aspect.

If the aspect is "what to do to complete the all-prepared 2PC transaction
at all costs", I'd say "there's a fundamental limitation". Although
I'm not sure what you mean exactly by prohibiting errors from fdw
routines, if that meant "the API can fail, but must not raise an
exception", that policy is enforced by setting a critical
section. However, if it were "the API mustn't fail", that cannot be
realized, I believe.

When I say "we cannot error-out" it means it's too late. What I'd like
to prevent is that the backend process returns an error to the client
after committing the local transaction. Because it will mislead the
user.

Anyway, we don't do anything that can fail after changing state to
TRANS_COMMIT. So we cannot run fdw-2pc-commit after that since it
cannot be failure-proof. If we do them before that point, we cannot
ERROR-out after local commit completes.

I thought that we are discussing fdw-errors during the 2pc-commit
phase.

Yes, I'm also discussing fdw-errors during the 2pc-commit phase
that happens after committing the local transaction.

Even if FDW-commit raises an error due to the user's cancel request or
for whatever reason while committing the prepared foreign transactions,
it's too late. The client will get an error like "ERROR: canceling
statement due to user request" and would think the transaction is
aborted, but that's not true; the local transaction is already committed.

By the way, I found that I misread the patch. In v26-0002,
AtEOXact_FdwXact() is actually called after the
point-of-no-return. What is the reason for that placement? We can
error-out before changing the state to TRANS_COMMIT.

Are you referring to
v26-0002-Introduce-transaction-manager-for-foreign-transa.patch? If
so, the patch doesn't implement 2pc. I think we can commit the foreign

Ah, I guessed that the trigger points of PREPARE and COMMIT that are
inserted by 0002 won't be moved by the following patches. So the
direction of my discussion doesn't change by the fact.

transaction before changing the state to TRANS_COMMIT but in any case
it cannot ensure atomic commit. It just adds both commit and rollback

I guess that you have the local-commit-failure case in mind? Couldn't
we internally prepare the local transaction and then follow the correct
2PC protocol involving the local transaction? (I'm looking at v26-0008)

Yes, we could. But as I mentioned before if we always commit the local
transaction last, we don't necessarily need to prepare the local
transaction. If we prepared the local transaction, I think we would be
able to allow FDW's commit routine to raise an error even during
2pc-commit, but only for the first time. Once we committed any one of
the involved transactions including the local transaction and foreign
transactions, the commit routine must not raise an error during
2pc-commit for the same reason; it's too late.
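
The "too late" rule above can be written down as a tiny decision
function. This is a sketch only: the Report enum and
report_for_commit_failure() are hypothetical, and downgrading the
failure to a warning (leaving the prepared transaction to be resolved
later) is an assumption about one possible behavior, not something the
patch is confirmed to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome reported to the client for one commit step. */
typedef enum { REPORT_OK, REPORT_ERROR, REPORT_WARNING } Report;

/*
 * step_failed: the current commit step (local or foreign) failed.
 * committed_participants: how many participants, local or foreign, have
 * already committed before this step.
 */
static Report
report_for_commit_failure(bool step_failed, int committed_participants)
{
    assert(committed_participants >= 0);

    if (!step_failed)
        return REPORT_OK;

    /* Before the first commit, everything can still be rolled back. */
    if (committed_participants == 0)
        return REPORT_ERROR;

    /*
     * Too late: an ERROR would make the client believe the whole
     * transaction aborted, although part of it is already committed.
     */
    return REPORT_WARNING;
}
```

With zero committed participants a failure may still surface as an
error; once the count is non-zero the same failure must not.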

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#181Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#172)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, 12 Oct 2020 at 17:19, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

I was thinking of having a GUC timeout parameter like statement_timeout.
The backend waits for up to the configured value when resolving foreign
transactions.

Me too.

But this idea seems different. FDW can set its timeout
via a transaction timeout API, is that right?

I'm not perfectly sure about how the TM (application server) works, but probably no. The TM has a configuration parameter for transaction timeout, and the TM calls XAResource.setTransactionTimeout() with that or a smaller value for the argument.

But even if FDW can set
the timeout using a transaction timeout API, the problem that client
libraries for some DBMSs don't support interruptible functions still
remains. The user can set the timeout to a short value, but that also
leads to unnecessary timeouts. Thoughts?

Unfortunately, I'm afraid we can do nothing about it. If the DBMS's client library doesn't support cancellation (e.g. doesn't respond to Ctrl+C or provide a function that cancels processing in progress), then the Postgres user just finds that he can't cancel queries (just like we experienced with odbc_fdw.)

So the idea of using another process to commit prepared foreign
transactions seems better also in terms of this point. Even if a DBMS
client library doesn’t support query cancellation, the transaction
commit can return control to the client when the user presses Ctrl+C,
as the backend process is just sleeping using WaitLatch() (it’s
similar to synchronous replication).
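
A rough, self-contained sketch of that idea follows. It does not use
the real latch API: WaitSlot, wait_for_resolution() and the two
simulate_* helpers are hypothetical, and the callback merely stands in
for sleeping on a latch until some event wakes the backend.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

typedef enum { WAIT_RESOLVED, WAIT_CANCELED } WaitResult;

/* Hypothetical state shared between the backend and a resolver process. */
typedef struct
{
    volatile sig_atomic_t resolved; /* set when the resolver finished */
    volatile sig_atomic_t cancel;   /* set when the user pressed Ctrl+C */
} WaitSlot;

/*
 * The backend never calls the foreign client library itself; it only
 * sleeps and re-checks two flags, so it can always react to a cancel.
 */
static WaitResult
wait_for_resolution(WaitSlot *slot, void (*sleep_until_wakeup)(WaitSlot *))
{
    assert(slot != NULL && sleep_until_wakeup != NULL);

    for (;;)
    {
        if (slot->resolved)
            return WAIT_RESOLVED;
        if (slot->cancel)
            return WAIT_CANCELED;
        sleep_until_wakeup(slot);   /* stand-in for a latch wait */
    }
}

/* Simulated wake-up events used to exercise the loop. */
static void simulate_resolver_done(WaitSlot *slot) { slot->resolved = 1; }
static void simulate_ctrl_c(WaitSlot *slot) { slot->cancel = 1; }
```

Passing simulate_ctrl_c as the wake-up makes the wait return
WAIT_CANCELED even though nothing was ever resolved, which is the
property argued for above; the prepared transaction would then remain
for the resolver to finish.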

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#182tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#181)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

Unfortunately, I'm afraid we can do nothing about it. If the DBMS's client
library doesn't support cancellation (e.g. doesn't respond to Ctrl+C or provide a
function that cancels processing in progress), then the Postgres user just finds
that he can't cancel queries (just like we experienced with odbc_fdw.)

So the idea of using another process to commit prepared foreign
transactions seems better also in terms of this point. Even if a DBMS
client library doesn’t support query cancellation, the transaction
commit can return control to the client when the user presses Ctrl+C,
as the backend process is just sleeping using WaitLatch() (it’s
similar to synchronous replication).

I have to say that's nitpicking. I believe almost nobody does, or cares about, canceling commits, at the expense of impractical performance due to non-parallelism, serial execution in each resolver, and context switches.

Also, FDW is not cancellable in general. It makes no sense to care only about commit.

(Fortunately, postgres_fdw is cancellable anyway.)

Regards
Takayuki Tsunakawa

#183Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#182)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, 19 Oct 2020 at 14:39, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

Unfortunately, I'm afraid we can do nothing about it. If the DBMS's client
library doesn't support cancellation (e.g. doesn't respond to Ctrl+C or provide a
function that cancels processing in progress), then the Postgres user just finds
that he can't cancel queries (just like we experienced with odbc_fdw.)

So the idea of using another process to commit prepared foreign
transactions seems better also in terms of this point. Even if a DBMS
client library doesn’t support query cancellation, the transaction
commit can return control to the client when the user presses Ctrl+C,
as the backend process is just sleeping using WaitLatch() (it’s
similar to synchronous replication).

I have to say that's nitpicking. I believe almost nobody does, or cares about, canceling commits,

Really? I don’t think so. I think it’s terrible that the query gets
stuck for a long time and we cannot do anything other than wait until a
crashed foreign server is restored. We can have a timeout, but I don’t
think every user wants to use the timeout, or the user might want to
set the timeout to a relatively large value out of concern about
misdetection. I guess synchronous replication had similar concerns so
it has a similar mechanism.

at the expense of impractical performance due to non-parallelism, serial execution in each resolver, and context switches.

I have never said that we’re going to live with serial execution in
each resolver and non-parallelism. I've been repeatedly saying that it
would be possible to improve this feature over the releases to
get good performance even if we use a separate background process.
Using a background process to commit is the only option to support
interruptible foreign transaction resolution for now, whereas there are
some ideas for performance improvements. I think we haven't had enough
discussion on how we can improve the idea of using a separate process,
how much performance will improve, and how feasible it is. It's not
too late to reject that idea after the discussion.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#184tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#183)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

On Mon, 19 Oct 2020 at 14:39, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

I have to say that's nitpicking. I believe almost nobody does, or cares about,
canceling commits,

Really? I don’t think so. I think it’s terrible that the query gets
stuck for a long time and we cannot do anything other than wait until a
crashed foreign server is restored. We can have a timeout, but I don’t
think every user wants to use the timeout, or the user might want to
set the timeout to a relatively large value out of concern about
misdetection. I guess synchronous replication had similar concerns so
it has a similar mechanism.

Really. I thought we were talking about canceling commits with Ctrl + C as you referred to, right? I couldn't imagine, in production environments where many sessions are running transactions concurrently, how the user (DBA) would want to, or could, cancel each session stuck during commit one by one with Ctrl + C by hand. I haven't seen a feature that enables the user (administrator) to cancel running processing with Ctrl + C from the side exist or be considered crucial.

Rather, setting an appropriate timeout is the current sound system design, isn't it? It spans many areas - TCP/IP, heartbeats of load balancers and clustering software, requests and responses to application servers and database servers, etc. I sympathize with your concern that users may not be confident about their settings. But that's the current practice unfortunately.

at the expense of impractical performance due to non-parallelism, serial
execution in each resolver, and context switches.

I have never said that we’re going to live with serial execution in
each resolver and non-parallelism. I've been repeatedly saying that it
would be possible to improve this feature over the releases to
get good performance even if we use a separate background process.

IIRC, I haven't seen a reasonable design based on a separate process that handles commits during normal operation. What I heard is to launch as many resolvers as the client sessions, but that consumes too many resources, as I said.

Using a background process to commit is the only option to support
interruptible foreign transaction resolution for now, whereas there are
some ideas for performance improvements.

A practical solution is the timeout for the FDW in general, as in application servers. postgres_fdw can benefit from Ctrl + C as well.

I think we haven't had enough
discussion on how we can improve the idea of using a separate process,
how much performance will improve, and how feasible it is. It's not
too late to reject that idea after the discussion.

Yeah, I agree that discussion is not enough yet. In other words, the design has not reached the quality for the first release yet. We should try to avoid using "Hopefully, we should be able to improve in the next release (I haven't seen the design in light, though)" as an excuse for getting a half-baked patch committed that does not offer practical quality. I saw many developers' patches were rejected because of insufficient performance, e.g. even 0.8% performance impact. (I'm one of those developers, actually...) I have been feeling this community is rigorous about performance. We have to be sincere.

Regards
Takayuki Tsunakawa

#185Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#184)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, Oct 19, 2020 at 2:37 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Really. I thought we were talking about canceling commits with Ctrl + C as you referred to, right? I couldn't imagine, in production environments where many sessions are running transactions concurrently, how the user (DBA) would want to, or could, cancel each session stuck during commit one by one with Ctrl + C by hand. I haven't seen a feature that enables the user (administrator) to cancel running processing with Ctrl + C from the side exist or be considered crucial.

Using pg_cancel_backend() and pg_terminate_backend() a DBA can cancel
a running query from any backend or terminate a backend. For either to
work the backend needs to be interruptible. IIRC, Robert had made an
effort to make postgres_fdw interruptible a few years back.

--
Best Wishes,
Ashutosh Bapat

#186Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Ashutosh Bapat (#185)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, 19 Oct 2020 at 20:37, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Mon, Oct 19, 2020 at 2:37 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Really. I thought we were talking about canceling commits with Ctrl + C as you referred to, right? I couldn't imagine, in production environments where many sessions are running transactions concurrently, how the user (DBA) would want to, or could, cancel each session stuck during commit one by one with Ctrl + C by hand. I haven't seen a feature that enables the user (administrator) to cancel running processing with Ctrl + C from the side exist or be considered crucial.

Using pg_cancel_backend() and pg_terminate_backend() a DBA can cancel
a running query from any backend or terminate a backend. For either to
work the backend needs to be interruptible. IIRC, Robert had made an
effort to make postgres_fdw interruptible a few years back.

Right. Also, we discussed having a timeout on the core side but I'm
concerned that the timeout also might not work if it's not
interruptible.

While using the timeout is a good idea, I have to think there is also
a certain number of users who don't use this timeout, just as there is
a certain number of users who don't use timeouts such as
statement_timeout. We must not ignore such users, and it might not be
advisable to design a feature that ignores them.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#187tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Ashutosh Bapat (#185)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>

Using pg_cancel_backend() and pg_terminate_backend() a DBA can cancel
a running query from any backend or terminate a backend. For either to
work the backend needs to be interruptible. IIRC, Robert had made an
effort to make postgres_fdw interruptible a few years back.

Yeah, I know those functions. Sawada-san was talking about Ctrl + C, so I responded accordingly.

Also, how can the DBA find sessions to run those functions against? Can he tell if a session is connected to or running SQL on a given foreign server? Can he terminate or cancel, with one SQL command, all sessions that are stuck accessing a particular foreign server?

Furthermore, FDW is not cancellable in general. So, I don't see a point in trying hard to make only commit be cancelable.

Regards
Takayuki Tsunakawa

#188Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#187)
Re: Transactions involving multiple postgres foreign servers, take 2

At Tue, 20 Oct 2020 02:44:09 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in

From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>

Using pg_cancel_backend() and pg_terminate_backend() a DBA can cancel
a running query from any backend or terminate a backend. For either to
work the backend needs to be interruptible. IIRC, Robert had made an
effort to make postgres_fdw interruptible a few years back.

Yeah, I know those functions. Sawada-san was talking about Ctrl + C, so I responded accordingly.

Also, how can the DBA find sessions to run those functions against? Can he tell if a session is connected to or running SQL on a given foreign server? Can he terminate or cancel, with one SQL command, all sessions that are stuck accessing a particular foreign server?

I don't think the inability to cancel all sessions at once can be a
reason not to allow operators to cancel a stuck session.

Furthermore, FDW is not cancellable in general. So, I don't see a point in trying hard to make only commit be cancelable.

I think that it is quite important that operators can cancel any
process that has been stuck for a long time. Furthermore, postgres_fdw
is more likely to be stuck since network is involved so the usefulness
of that feature would be higher.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#189tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Kyotaro Horiguchi (#188)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

I don't think the inability to cancel all sessions at once can be a
reason not to allow operators to cancel a stuck session.

Yeah, I didn't mean to discount the ability to cancel queries. I just want to confirm how the user can use the cancellation in practice. I didn't see how the user can use the cancellation in the FDW framework, so I asked about it. We have to think about the user's context if we regard canceling commits as important.

Furthermore, FDW is not cancellable in general. So, I don't see a point in
trying hard to make only commit be cancelable.

I think that it is quite important that operators can cancel any
process that has been stuck for a long time. Furthermore, postgres_fdw
is more likely to be stuck since network is involved so the usefulness
of that feature would be higher.

But lower than practical performance during normal operation.

BTW, speaking of network, how can postgres_fdw respond quickly to a cancel request when libpq is waiting for a reply from a down foreign server? Can the user continue to use that session after cancellation?

Regards
Takayuki Tsunakawa

#190Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#189)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 20 Oct 2020 at 13:23, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

I don't think the inability to cancel all sessions at once can be a
reason not to allow operators to cancel a stuck session.

Yeah, I didn't mean to discount the ability to cancel queries. I just want to confirm how the user can use the cancellation in practice. I didn't see how the user can use the cancellation in the FDW framework, so I asked about it. We have to think about the user's context if we regard canceling commits as important.

I think it doesn't matter whether it's in the FDW framework or not. The user
normally doesn't care which backend processes are connecting to foreign
servers. They will attempt to cancel the query as usual if they
realize that a backend is stuck. There are surely plenty of users
who use query cancellation.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#191Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#190)
Re: Transactions involving multiple postgres foreign servers, take 2

At Tue, 20 Oct 2020 15:53:29 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Tue, 20 Oct 2020 at 13:23, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

I don't think the inability to cancel all sessions at once can be a
reason not to allow operators to cancel a stuck session.

Yeah, I didn't mean to discount the ability to cancel queries. I just want to confirm how the user can use the cancellation in practice. I didn't see how the user can use the cancellation in the FDW framework, so I asked about it. We have to think about the user's context if we regard canceling commits as important.

I think it doesn't matter whether it's in the FDW framework or not. The user
normally doesn't care which backend processes are connecting to foreign
servers. They will attempt to cancel the query as usual if they
realize that a backend is stuck. There are surely plenty of users
who use query cancellation.

The most serious impact of being unable to cancel a query on a
certain session is that a server restart is required to end such a
session.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#192Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#189)
Re: Transactions involving multiple postgres foreign servers, take 2

At Tue, 20 Oct 2020 04:23:12 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

Furthermore, FDW is not cancellable in general. So, I don't see a point in
trying hard to make only commit be cancelable.

I think that it is quite important that operators can cancel any
process that has been stuck for a long time. Furthermore, postgres_fdw
is more likely to be stuck since network is involved so the usefulness
of that feature would be higher.

But lower than practical performance during normal operation.

BTW, speaking of network, how can postgres_fdw respond quickly to a cancel request when libpq is waiting for a reply from a down foreign server? Can the user continue to use that session after cancellation?

It seems to respond to a statement-cancel signal immediately while
waiting for an incoming byte. However, it seems to wait forever while
waiting for space in the send buffer. (Does that mean the session will be
stuck if it sends a large chunk of bytes while the network is down?)

After receiving a signal, it closes the problem connection. So the
local session is usable after that, but the failed remote sessions are
closed and a new one is created at the next use.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#193tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Kyotaro Horiguchi (#191)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

At Tue, 20 Oct 2020 15:53:29 +0900, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote in

I think it doesn't matter whether it's in the FDW framework or not. The user
normally doesn't care which backend processes are connecting to foreign
servers. They will attempt to cancel the query as usual if they
realize that a backend is stuck. There are surely plenty of users
who use query cancellation.

The most serious impact of being unable to cancel a query on a
certain session is that a server restart is required to end such a
session.

OK, as I may be repeating, I didn't deny the need for cancellation. Let's organize the argument.

* FDW in general
My understanding is that the FDW feature does not stipulate anything about cancellation. In fact, odbc_fdw was uncancelable. What do we do about this?

* postgres_fdw
Fortunately, it is (should be?) cancelable whatever method we choose for 2PC. So no problem.
But is it really cancellable now? What if the libpq call is waiting for a response when the foreign server or network is down?

"Inability to cancel requires database server restart" feels a bit exaggerated, as libpq has keepalives* and tcp_user_timeout connection parameters, and even without setting them, TCP timeout works.

Regards
Takayuki Tsunakawa

#194tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Kyotaro Horiguchi (#192)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

It seems to respond to a statement-cancel signal immediately while
waiting for an incoming byte. However, it seems to wait forever while
waiting for space in the send buffer. (Does that mean the session will be
stuck if it sends a large chunk of bytes while the network is down?)

What part makes you worried about that? libpq's send processing?

I've just examined pgfdw_cancel_query(), too. As below, it uses a hidden 30 second timeout. After all, postgres_fdw also relies on timeout already.

/*
 * If it takes too long to cancel the query and discard the result, assume
 * the connection is dead.
 */
endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), 30000);

After receiving a signal, it closes the problem connection. So the
local session is usable after that, but the failed remote sessions are
closed and a new one is created at the next use.

I couldn't see that the problematic connection is closed when the cancellation fails... Am I looking at a wrong place?

/*
 * If connection is already unsalvageable, don't touch it
 * further.
 */
if (entry->changing_xact_state)
    break;

Regards
Takayuki Tsunakawa

#195Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#193)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 20 Oct 2020 at 16:54, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

At Tue, 20 Oct 2020 15:53:29 +0900, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote in

I think it doesn't matter whether it's in the FDW framework or not. The user
normally doesn't care which backend processes are connecting to foreign
servers. They will attempt to cancel the query as usual if they
realize that a backend is stuck. There are surely plenty of users
who use query cancellation.

The most serious impact of being unable to cancel a query on a
certain session is that a server restart is required to end such a
session.

OK, as I may be repeating, I didn't deny the need for cancellation.

So what's your opinion?

Let's organize the argument.

* FDW in general
My understanding is that the FDW feature does not stipulate anything about cancellation. In fact, odbc_fdw was uncancelable. What do we do about this?

* postgres_fdw
Fortunately, it is (should be?) cancelable whatever method we choose for 2PC. So no problem.
But is it really cancellable now? What if the libpq call is waiting for response when the foreign server or network is down?

I don’t think we need to stipulate the query cancellation. Anyway, I
guess that neither the fact that we don’t stipulate anything about query
cancellation now nor the fact that postgres_fdw might not be cancellable in
some situations now is a reason for not supporting query
cancellation. If it's a desirable behavior and users want it, we need
to make an effort to support it as much as possible, like we’ve done in
postgres_fdw. Some FDWs unfortunately might not be able to support it
with their own functionality alone, but it would be good if we can achieve
that with a combination of PostgreSQL and FDW plugins.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#196Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#194)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, 20 Oct 2020 at 17:56, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

It seems to respond to a statement-cancel signal immediately while
waiting for an incoming byte. However, it seems to wait forever while
waiting for space in the send buffer. (Does that mean the session will be
stuck if it sends a large chunk of bytes while the network is down?)

What part makes you worried about that? libpq's send processing?

I've just examined pgfdw_cancel_query(), too. As below, it uses a hidden 30 second timeout. After all, postgres_fdw also relies on timeout already.

It uses the timeout but it's also cancellable before the timeout. See
that we call CHECK_FOR_INTERRUPTS() in pgfdw_get_cleanup_result().
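
To make the combination of "deadline plus interrupt check" concrete,
here is a simplified, self-contained model of such a wait loop. It is
not the postgres_fdw code: wait_for_cleanup() and its parameters are
hypothetical, time is simulated with plain millisecond counters, and a
negative reply_at_ms or interrupt_at_ms means "never happens".

```c
#include <assert.h>

typedef enum
{
    CLEANUP_DONE,        /* the remote side answered in time */
    CLEANUP_TIMED_OUT,   /* deadline passed: treat the connection as dead */
    CLEANUP_INTERRUPTED  /* a cancel request arrived before the deadline */
} CleanupResult;

/*
 * Poll in steps of step_ms until the deadline.  On every iteration the
 * pending-interrupt check runs first, so a cancel request is honored
 * without waiting for the full timeout.
 */
static CleanupResult
wait_for_cleanup(long now_ms, long timeout_ms, long step_ms,
                 long reply_at_ms, long interrupt_at_ms)
{
    long endtime_ms = now_ms + timeout_ms;

    assert(step_ms > 0);

    while (now_ms < endtime_ms)
    {
        /* Stand-in for checking pending interrupts. */
        if (interrupt_at_ms >= 0 && now_ms >= interrupt_at_ms)
            return CLEANUP_INTERRUPTED;

        /* Stand-in for "a result is available on the connection". */
        if (reply_at_ms >= 0 && now_ms >= reply_at_ms)
            return CLEANUP_DONE;

        now_ms += step_ms;      /* stand-in for one bounded sleep */
    }
    return CLEANUP_TIMED_OUT;
}
```

With a 30000 ms timeout and no reply, the loop ends in
CLEANUP_TIMED_OUT; if an interrupt is scheduled at 2000 ms it returns
CLEANUP_INTERRUPTED well before the deadline.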

After receiving a signal, it closes the problem connection. So the
local session is usable after that, but the failed remote sessions are
closed and a new one is created at the next use.

I couldn't see that the problematic connection is closed when the cancellation fails... Am I looking at a wrong place?

/*
 * If connection is already unsalvageable, don't touch it
 * further.
 */
if (entry->changing_xact_state)
    break;

I guess Horiguchi-san referred to the following code in pgfdw_xact_callback():

/*
 * If the connection isn't in a good idle state, discard it to
 * recover. Next GetConnection will open a new connection.
 */
if (PQstatus(entry->conn) != CONNECTION_OK ||
    PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
    entry->changing_xact_state)
{
    elog(DEBUG3, "discarding connection %p", entry->conn);
    disconnect_pg_server(entry);
}

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#197Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#196)
Re: Transactions involving multiple postgres foreign servers, take 2

At Tue, 20 Oct 2020 21:22:31 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in

On Tue, 20 Oct 2020 at 17:56, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

It seems to respond to a statement-cancel signal immediately while
waiting for an incoming byte. However, it seems to wait forever while
waiting for space in the send buffer. (Does that mean the session will be
stuck if it sends a large chunk of bytes while the network is down?)

What part makes you worried about that? libpq's send processing?

I've just examined pgfdw_cancel_query(), too. As below, it uses a hidden 30 second timeout. After all, postgres_fdw also relies on timeout already.

It uses the timeout but it's also cancellable before the timeout. See
that we call CHECK_FOR_INTERRUPTS() in pgfdw_get_cleanup_result().

Yes. And as Sawada-san mentioned, it doesn't matter whether a specific FDW
module accepts cancellation or not. It's sufficient that we have one
example. Other FDWs will follow postgres_fdw if needed.

After receiving a signal, it closes the problem connection. So the
local session is usable after that, but the failed remote sessions are
closed, and a new one is created at the next use.

I couldn't see that the problematic connection is closed when the cancellation fails... Am I looking at the wrong place?

...

I guess Horiguchi-san referred to the following code in pgfdw_xact_callback():

    /*
     * If the connection isn't in a good idle state, discard it to
     * recover. Next GetConnection will open a new connection.
     */
    if (PQstatus(entry->conn) != CONNECTION_OK ||
        PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
        entry->changing_xact_state)
    {
        elog(DEBUG3, "discarding connection %p", entry->conn);
        disconnect_pg_server(entry);
    }

Right. Although it's not directly relevant to this discussion, to be
precise, that part is not reached immediately after the remote "COMMIT
TRANSACTION" fails. If that commit fails or is canceled, an exception
is raised while entry->changing_xact_state = true. Then the function
is called again within AbortCurrentTransaction() and reaches the above
code.
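
To make that sequence easier to follow, here is a minimal toy model in C. It is not postgres_fdw code: ToyConnEntry, toy_xact_callback() and toy_finish_transaction() are hypothetical names, a boolean return value stands in for the exception raised by a failed remote commit, and only the changing_xact_state part of the discard condition is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cached connection; only two fields are modeled. */
typedef struct ToyConnEntry
{
	bool		connected;		/* stands in for entry->conn being open */
	bool		changing_xact_state;	/* set while a remote COMMIT is in flight */
} ToyConnEntry;

typedef enum
{
	TOY_EVENT_COMMIT,
	TOY_EVENT_ABORT
} ToyXactEvent;

/*
 * Toy transaction callback.  Returns false to model "an exception was
 * raised", in which case the cleanup at the bottom is skipped.
 */
static bool
toy_xact_callback(ToyConnEntry *entry, ToyXactEvent event,
				  bool remote_commit_ok)
{
	assert(entry != NULL);

	if (event == TOY_EVENT_COMMIT)
	{
		entry->changing_xact_state = true;
		if (!remote_commit_ok)
			return false;		/* error: flag stays set, cleanup not reached */
		entry->changing_xact_state = false;
	}
	/* On abort, an entry with the flag still set is not touched further. */

	/* Cleanup: discard a connection that is not in a good state. */
	if (entry->connected && entry->changing_xact_state)
		entry->connected = false;

	return true;
}

/* Commit, and if that "throws", run the callback again for the abort. */
static void
toy_finish_transaction(ToyConnEntry *entry, bool remote_commit_ok)
{
	if (!toy_xact_callback(entry, TOY_EVENT_COMMIT, remote_commit_ok))
		(void) toy_xact_callback(entry, TOY_EVENT_ABORT, remote_commit_ok);
}
```

In this model a failed commit leaves the entry disconnected only after the second (abort) call, which mirrors the two-step path described above; a successful commit keeps the connection.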

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#198tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Kyotaro Horiguchi (#197)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

    if (PQstatus(entry->conn) != CONNECTION_OK ||
        PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
        entry->changing_xact_state)
    {
        elog(DEBUG3, "discarding connection %p", entry->conn);
        disconnect_pg_server(entry);
    }

Right. Although it's not directly relevant to this discussion, to be
precise, that part is not reached immediately after the remote "COMMIT
TRANSACTION" fails. If that commit fails or is canceled, an exception
is raised while entry->changing_xact_state = true. Then the function
is called again within AbortCurrentTransaction() and reaches the above
code.

Ah, then the connection to the foreign server is closed after failing to cancel the query. Thanks.

Regards
Takayuki Tsunakawa

#199tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#195)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

So what's your opinion?

My opinion is simple and has not changed. Let's clarify and refine the design first in the following areas (others may have pointed out something else too, but I don't remember), before going deeper into the code review.

* FDW interface
New functions that other FDWs can really implement. Currently, XA seems to be the only model we can rely on to validate the FDW interface.
What FDW function would call what XA function(s)? What should be the arguments for the FDW functions?

* Performance
Parallel prepare and commits on the client backend. The current implementation is intolerable and should not be considered first-release quality. I proposed the idea.
(If you insist you don't want to do anything about this, I have to think you're just rushing for the patch commit. I want to keep Postgres's reputation.)
As part of this, I'd like to see the 2PC's message flow and disk writes (via email and/or on the following wiki.) That helps evaluate the 2PC performance, because it's hard to figure it out in the code of a large patch set. I'm simply imagining what is typically written in database textbooks and research papers. I'm asking this because I saw some discussion in this thread that some new WAL records are added. I was worried that transactions have to write WAL records other than prepare and commit unlike textbook implementations.

Atomic Commit of Distributed Transactions
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

* Query cancellation
As you showed, there's no problem with postgres_fdw?
The cancelability of FDW in general remains a problem, but that can be a separate undertaking.

* Global visibility
This is what Amit-san suggested several times -- "design it before reviewing the current patch." I'm a bit optimistic about this and think this FDW 2PC can be implemented separately as a pure enhancement of FDW. But I also understand his concern. If your (our?) aim is to use this FDW 2PC for sharding, we may have to design the combination of 2PC and visibility first.

I don’t think we need to stipulate the query cancellation. Anyway, I
guess that neither the fact that we don’t stipulate anything about
query cancellation now nor the fact that postgres_fdw might not be
cancellable in some situations now is a reason for not supporting
query cancellation. If it's a desirable behavior and users want it, we
need to put in the effort to support it as much as possible, like
we’ve done in postgres_fdw. Some FDWs unfortunately might not be able
to support it with their own functionality alone, but it would be good
if we can achieve that by a combination of PostgreSQL and FDW plugins.

Let me comment on this a bit; this is a somewhat dangerous idea, I'm afraid. We need to pay attention to the FDW interface and its documentation so that FDW developers can implement what we consider important -- query cancellation in your discussion. "postgres_fdw is OK, so the interface is good" can create interfaces that other FDW developers can't use. That's what Tomas Vondra pointed out several years ago.

Regards
Takayuki Tsunakawa

#200Amit Kapila
amit.kapila16@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#199)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, Oct 21, 2020 at 3:03 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

So what's your opinion?

* Global visibility
This is what Amit-san suggested several times -- "design it before reviewing the current patch." I'm a bit optimistic about this and think this FDW 2PC can be implemented separately as a pure enhancement of FDW. But I also understand his concern. If your (our?) aim is to use this FDW 2PC for sharding,

As far as I understand, that is the goal toward which this is a
step. For example, see the wiki [1]. I understand that the wiki is not the
final thing, but I have seen other places as well where there is a
mention of FDW-based sharding, and I feel this is the reason why many
people are trying to improve this area. That is why I suggested having
an upfront design of global visibility and a deadlock detector along
with this work.

[1]: https://wiki.postgresql.org/wiki/WIP_PostgreSQL_Sharding

--
With Regards,
Amit Kapila.

#201Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#199)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, 21 Oct 2020 at 18:33, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

So what's your opinion?

My opinion is simple and has not changed. Let's clarify and refine the design first in the following areas (others may have pointed out something else too, but I don't remember), before going deeper into the code review.

* FDW interface
New functions that other FDWs can really implement. Currently, XA seems to be the only model we can rely on to validate the FDW interface.
What FDW function would call what XA function(s)? What should be the arguments for the FDW functions?

I guess that, since the FDW interfaces may be affected by the feature's
architecture, we can discuss them later.

* Performance
Parallel prepare and commits on the client backend. The current implementation is intolerable and should not be considered first-release quality. I proposed the idea.
(If you insist you don't want to do anything about this, I have to think you're just rushing for the patch commit. I want to keep Postgres's reputation.)

What do you have in mind regarding the implementation of parallel prepare
and commit? Given that some FDW plugins don't support asynchronous
execution, I guess we need to use parallel workers or something. That
is, the backend process launches parallel workers to
prepare/commit/rollback foreign transactions in parallel. I don't deny
this approach, but it'll definitely make the feature complex and need
more code.

My point is to start small and keep the first version simple. Even
if we need one or more years for this feature, I think that
introducing simple, minimal functionality as the first version
to the core still has benefits. We will have the
opportunity to get real feedback from users and to fix bugs in the
main infrastructure before making it complex. In this sense, the patch
having the backend return without waiting for resolution after the local
commit would be a good start as the first version (i.e., up to
applying v26-0006 patch). Anyway, the architecture should be
extensible enough for future improvements.

For the performance improvements, we will be able to support
asynchronous and/or parallel prepare/commit/rollback. Moreover, having multiple
resolver processes on one database would also help get better
throughput. For the user who needs much better throughput, the user
can also choose not to wait for resolution after the local commit,
like synchronous_commit = 'local' in replication.

As part of this, I'd like to see the 2PC's message flow and disk writes (via email and/or on the following wiki.) That helps evaluate the 2PC performance, because it's hard to figure it out in the code of a large patch set. I'm simply imagining what is typically written in database textbooks and research papers. I'm asking this because I saw some discussion in this thread that some new WAL records are added. I was worried that transactions have to write WAL records other than prepare and commit unlike textbook implementations.

Atomic Commit of Distributed Transactions
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

Understood. I'll add an explanation about the message flow and disk
writes to the wiki page.

We also need to consider error handling while resolving foreign
transactions.

I don’t think we need to stipulate the query cancellation. Anyway, I
guess that neither the fact that we don’t stipulate anything about
query cancellation now nor the fact that postgres_fdw might not be
cancellable in some situations now is a reason for not supporting
query cancellation. If it's a desirable behavior and users want it, we
need to put in the effort to support it as much as possible, like
we’ve done in postgres_fdw. Some FDWs unfortunately might not be able
to support it with their own functionality alone, but it would be good
if we can achieve that by a combination of PostgreSQL and FDW plugins.

Let me comment on this a bit; this is a somewhat dangerous idea, I'm afraid. We need to pay attention to the FDW interface and its documentation so that FDW developers can implement what we consider important -- query cancellation in your discussion. "postgres_fdw is OK, so the interface is good" can create interfaces that other FDW developers can't use. That's what Tomas Vondra pointed out several years ago.

I suspect the story is somewhat different. libpq fortunately supports
asynchronous execution, but when it comes to canceling the foreign
transaction resolution, I think basically all FDW plugins are in the
same situation at this time. We can choose whether to make it
cancellable or not. According to the discussion so far, it completely
depends on the architecture of this feature. So my point is whether
it's worth having this functionality for users and whether users want
it, not whether postgres_fdw is ok.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#202Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#201)
11 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, Oct 22, 2020 at 10:39 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Wed, 21 Oct 2020 at 18:33, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

So what's your opinion?

My opinion is simple and has not changed. Let's clarify and refine the design first in the following areas (others may have pointed out something else too, but I don't remember), before going deeper into the code review.

* FDW interface
New functions that other FDWs can really implement. Currently, XA seems to be the only model we can rely on to validate the FDW interface.
What FDW function would call what XA function(s)? What should be the arguments for the FDW functions?

I guess that, since the FDW interfaces may be affected by the feature's
architecture, we can discuss them later.

* Performance
Parallel prepare and commits on the client backend. The current implementation is intolerable and should not be considered first-release quality. I proposed the idea.
(If you insist you don't want to do anything about this, I have to think you're just rushing for the patch commit. I want to keep Postgres's reputation.)

What do you have in mind regarding the implementation of parallel prepare
and commit? Given that some FDW plugins don't support asynchronous
execution, I guess we need to use parallel workers or something. That
is, the backend process launches parallel workers to
prepare/commit/rollback foreign transactions in parallel. I don't deny
this approach, but it'll definitely make the feature complex and need
more code.

My point is to start small and keep the first version simple. Even
if we need one or more years for this feature, I think that
introducing simple, minimal functionality as the first version
to the core still has benefits. We will have the
opportunity to get real feedback from users and to fix bugs in the
main infrastructure before making it complex. In this sense, the patch
having the backend return without waiting for resolution after the local
commit would be a good start as the first version (i.e., up to
applying v26-0006 patch). Anyway, the architecture should be
extensible enough for future improvements.

For the performance improvements, we will be able to support
asynchronous and/or parallel prepare/commit/rollback. Moreover, having multiple
resolver processes on one database would also help get better
throughput. For the user who needs much better throughput, the user
can also choose not to wait for resolution after the local commit,
like synchronous_commit = 'local' in replication.

As part of this, I'd like to see the 2PC's message flow and disk writes (via email and/or on the following wiki.) That helps evaluate the 2PC performance, because it's hard to figure it out in the code of a large patch set. I'm simply imagining what is typically written in database textbooks and research papers. I'm asking this because I saw some discussion in this thread that some new WAL records are added. I was worried that transactions have to write WAL records other than prepare and commit unlike textbook implementations.

Atomic Commit of Distributed Transactions
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

Understood. I'll add an explanation about the message flow and disk
writes to the wiki page.

Done.

We also need to consider error handling while resolving foreign
transactions.

I don’t think we need to stipulate the query cancellation. Anyway, I
guess that neither the fact that we don’t stipulate anything about
query cancellation now nor the fact that postgres_fdw might not be
cancellable in some situations now is a reason for not supporting
query cancellation. If it's a desirable behavior and users want it, we
need to put in the effort to support it as much as possible, like
we’ve done in postgres_fdw. Some FDWs unfortunately might not be able
to support it with their own functionality alone, but it would be good
if we can achieve that by a combination of PostgreSQL and FDW plugins.

Let me comment on this a bit; this is a somewhat dangerous idea, I'm afraid. We need to pay attention to the FDW interface and its documentation so that FDW developers can implement what we consider important -- query cancellation in your discussion. "postgres_fdw is OK, so the interface is good" can create interfaces that other FDW developers can't use. That's what Tomas Vondra pointed out several years ago.

I suspect the story is somewhat different. libpq fortunately supports
asynchronous execution, but when it comes to canceling the foreign
transaction resolution, I think basically all FDW plugins are in the
same situation at this time. We can choose whether to make it
cancellable or not. According to the discussion so far, it completely
depends on the architecture of this feature. So my point is whether
it's worth having this functionality for users and whether users want
it, not whether postgres_fdw is ok.

I've thought again about the idea that once the backend fails to
resolve a foreign transaction it leaves it to a resolver process. With
this idea, the backend process performs the 2nd phase of 2PC only once.
If an error happens during resolution, it leaves the transaction to a
resolver process and returns an error to the client. We used this idea
in the previous patches and it has been discussed several times.

First of all, this idea doesn’t resolve the problem of error handling:
the transaction could return an error to the client in spite of
having committed the local transaction. There is an argument that
this behavior could also happen even in a single server environment,
but I guess the situation is slightly different. Basically what the
transaction does after the commit is cleanup. An error could happen
during cleanup, but if it happens it’s likely due to a bug or
something wrong inside PostgreSQL or the OS. On the other hand, during and
after resolution the transaction does major work such as connecting to a
foreign server, sending an SQL command, getting the result, and writing a WAL
record to remove the entry. An error is more likely to happen in these steps.

Also, with this idea, the client needs to check whether the error it got from
the server is really true, because the local transaction might have
been committed. Although this could happen even in a single server
environment, how many users check that in practice? If a server
crashes, subsequent transactions end up failing due to a network
connection error, but it seems hard to distinguish between such a real
error and the fake error.

Moreover, it’s questionable in terms of extensibility. We would not
be able to support continuing to wait for distributed transactions to
complete even if an error happens, like synchronous replication. The
user might want to wait in cases where the failure is temporary, such as
a temporary network disconnection. Trying resolution only once seems to
have the cons of both asynchronous and synchronous resolution.

So I’m thinking that with this idea users will need to change their
applications to check whether the error they got is really true,
which is cumbersome for users. Also, it seems to me we need to
carefully discuss whether this idea could weaken extensibility.

Anyway, according to the discussion, it seems to me that we have reached a
consensus so far that the backend process prepares all foreign
transactions and a resolver process is necessary to resolve in-doubt
transactions in the background. So I’ve changed the patch set as follows.
With all of these patches applied, we can support asynchronous foreign
transaction resolution. That is, at transaction commit the backend
process prepares all foreign transactions, and then commits the local
transaction. After that, it returns OK of commit to the client while
leaving the prepared foreign transactions to a resolver process. A
resolver process fetches the foreign transactions to resolve and
resolves them in the background. Since the 2nd phase of 2PC is performed
asynchronously, a transaction that wants to see the previous
transaction's result needs to check its status.
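
The flow described above can be sketched as a small toy model in C. This is only an illustration under stated assumptions, not code from the patch set: ToyParticipant, toy_backend_commit() and toy_resolver_run() are hypothetical names, the local commit is reduced to a return value, and the resolver only ever commits because in this model prepared entries survive only when the local commit succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State of one foreign transaction in the toy model. */
typedef enum
{
	TOY_FX_INITIAL,
	TOY_FX_PREPARED,
	TOY_FX_COMMITTED,
	TOY_FX_ABORTED
} ToyFxState;

typedef struct ToyParticipant
{
	ToyFxState	state;
	bool		prepare_fails;	/* simulate a failing PREPARE on this server */
} ToyParticipant;

/*
 * Backend side: prepare every participant, then "commit locally" and return
 * true (OK to the client).  The participants are left PREPARED on purpose.
 * If any prepare fails, everything is aborted and false is returned.
 */
static bool
toy_backend_commit(ToyParticipant *parts, int nparts)
{
	int			i;

	assert(parts != NULL && nparts >= 0);

	for (i = 0; i < nparts; i++)
	{
		if (parts[i].prepare_fails)
		{
			int			j;

			for (j = 0; j < nparts; j++)
				parts[j].state = TOY_FX_ABORTED;
			return false;
		}
		parts[i].state = TOY_FX_PREPARED;
	}

	/* The local commit would happen here; 2nd phase is left to the resolver. */
	return true;
}

/* Resolver side: commit whatever is still PREPARED; returns how many. */
static int
toy_resolver_run(ToyParticipant *parts, int nparts)
{
	int			i;
	int			resolved = 0;

	assert(parts != NULL && nparts >= 0);

	for (i = 0; i < nparts; i++)
	{
		if (parts[i].state == TOY_FX_PREPARED)
		{
			parts[i].state = TOY_FX_COMMITTED;
			resolved++;
		}
	}
	return resolved;
}
```

Between the return of toy_backend_commit() and the next toy_resolver_run(), the participants are still in the prepared state, which is the window in which a later transaction would have to check the status.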

Here is a brief explanation of each patch:

v27-0001-Introduce-transaction-manager-for-foreign-transa.patch

This commit adds the basic foreign transaction manager and the
CommitForeignTransaction and RollbackForeignTransaction APIs. These
APIs support only one-phase commit. With this change, an FDW is able to
control its transactions using the foreign transaction manager instead of
XactCallback.

v27-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch

This commit implements both the CommitForeignTransaction and
RollbackForeignTransaction APIs in postgres_fdw. Note that since
PREPARE TRANSACTION is still not supported, there is nothing new that
the user is able to do.

v27-0003-Recreate-RemoveForeignServerById.patch

This commit recreates RemoveForeignServerById that was removed by
b1d32d3e3. This is necessary because we need to check if there is a
foreign transaction involved with the foreign server that is about to
be removed.

v27-0004-Add-PrepareForeignTransaction-API.patch

This commit adds prepared foreign transaction support, including WAL
logging and recovery, and the PrepareForeignTransaction API. With this
change, the user is able to run the 'PREPARE TRANSACTION' and
'COMMIT/ROLLBACK PREPARED' commands on a transaction that involves
foreign servers. But note that COMMIT/ROLLBACK PREPARED ends only the
local transaction. It doesn't do anything for foreign transactions.
Therefore, the user needs to resolve foreign transactions manually by
executing the pg_resolve_foreign_xacts() SQL function, which is also
introduced by this commit.

v27-0005-postgres_fdw-supports-prepare-API.patch

This commit implements the PrepareForeignTransaction API and makes
CommitForeignTransaction and RollbackForeignTransaction support
two-phase commit.

v27-0006-Add-GetPrepareId-API.patch

This commit adds the GetPrepareId API.
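
As a hedged illustration of what an FDW-side callback with this shape might look like, here is a standalone sketch. It mirrors only the GetPrepareId_function signature from the attached patch; everything else is an assumption made so the example compiles on its own: the TransactionId and Oid typedefs are local stand-ins, TOY_FDWXACT_ID_MAX_LEN is an assumed limit, the "toyfdw_" prefix is invented, and malloc() is used where a real FDW would presumably use palloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Local stand-ins for the PostgreSQL typedefs (assumptions for this sketch). */
typedef uint32_t TransactionId;
typedef unsigned int Oid;

/* Assumed upper bound; the real FDWXACT_ID_MAX_LEN is defined by the patch. */
#define TOY_FDWXACT_ID_MAX_LEN 200

/*
 * Hypothetical callback: build an identifier from the transaction id, server
 * and user, and report its length through prep_id_len.  Returns NULL if the
 * identifier cannot be built within the limit.
 */
static char *
toy_get_prepare_id(TransactionId xid, Oid serverid, Oid userid,
				   int *prep_id_len)
{
	char	   *id;
	int			len;

	assert(prep_id_len != NULL);

	id = malloc(TOY_FDWXACT_ID_MAX_LEN);
	if (id == NULL)
		return NULL;

	len = snprintf(id, TOY_FDWXACT_ID_MAX_LEN, "toyfdw_%u_%u_%u",
				   (unsigned int) xid, serverid, userid);
	if (len < 0 || len >= TOY_FDWXACT_ID_MAX_LEN)
	{
		/* Truncated or failed: do not hand back a partial identifier. */
		free(id);
		return NULL;
	}

	*prep_id_len = len;
	return id;
}
```

The sketch keeps the identifier strictly shorter than the limit and always null-terminated, so a caller that writes a terminator at index prep_id_len stays inside the buffer.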

v27-0007-Introduce-foreign-transaction-launcher-and-resol.patch

This commit introduces foreign transaction resolver and launcher
processes. With this change, the user doesn’t need to manually execute
pg_resolve_foreign_xacts() function to resolve foreign transactions
prepared by PREPARE TRANSACTION and left by COMMIT/ROLLBACK PREPARED.
Instead, a resolver process automatically resolves them in background.

v27-0008-Prepare-foreign-transactions-at-commit-time.patch

With this commit, the transaction prepares foreign transactions marked
as modified at transaction commit if foreign_twophase_commit is
'required'. Previously the user needed to run PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED to use 2PC, but this commit makes 2PC
transparent to the user. Note that the transaction returns OK of commit to
the client after committing the local transaction and notifying the
resolver process, without waiting. Foreign transactions are
asynchronously resolved by the resolver process.

v27-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch

With this commit, the transactions started via postgres_fdw are marked
as modified, which is necessary to use 2PC.

v27-0010-Documentation-update.patch
v27-0011-Add-regression-tests-for-foreign-twophase-commit.patch

Documentation update and regression tests.

The missing piece from the previous version of the patch is synchronous
transaction resolution. In the previous patch, foreign transactions
are synchronously resolved by a resolver process. But since it's under
discussion whether this is a good approach and I'm considering
optimizing the logic, it’s not included in the current patch set.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

Attachments:

v27-0006-Add-GetPrepareId-API.patch (application/x-patch)
From 4d61925f00a7dbfb2f3dc62db5c3d23f83cbf6c7 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 4 Nov 2020 14:41:53 +0900
Subject: [PATCH v27 06/11] Add GetPrepareId API

---
 src/backend/access/fdwxact/fdwxact.c | 54 +++++++++++++++++++++++-----
 src/include/foreign/fdwapi.h         |  3 ++
 2 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 3caf904370..7b3a2f1fba 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -143,6 +143,7 @@ typedef struct FdwXactParticipant
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
 	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
 } FdwXactParticipant;
 
 /*
@@ -347,6 +348,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
 
 	return fdw_part;
 }
@@ -414,9 +416,10 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 }
 
 /*
- * Return a null-terminated foreign transaction identifier.  We generate an
- * unique identifier with in the form of
- * "fx_<random number>_<xid>_<serverid>_<userid> whose length is
+ * Return a null-terminated foreign transaction identifier.  If the given
+ * foreign server's FDW provides getPrepareId callback we return the identifier
+ * returned from it. Otherwise we generate an unique identifier with in the
+ * form of "fx_<random number>_<xid>_<serverid>_<userid> whose length is
  * less than FDWXACT_ID_MAX_LEN.
  *
  * Returned string value is used to identify foreign transaction. The
@@ -431,13 +434,48 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 static char *
 get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
 {
-	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+	char *id;
+	int	id_len;
 
-	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
-			 Abs(random()), xid, fdw_part->server->serverid,
-			 fdw_part->usermapping->userid);
+	/*
+	 * If FDW doesn't provide the callback function, generate an unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get an unique identifier from callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len > FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
 
-	return pstrdup(buf);
+	id[id_len] = '\0';
+	return pstrdup(id);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 89cec9aa96..91db4f5bfc 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -174,6 +174,8 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -256,6 +258,7 @@ typedef struct FdwRoutine
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
 	PrepareForeignTransaction_function PrepareForeignTransaction;
+	GetPrepareId_function GetPrepareId;
 } FdwRoutine;
 
 
-- 
2.27.0

v27-0011-Add-regression-tests-for-foreign-twophase-commit.patch (application/x-patch)
From 136e94eb64d325582849f3f2c3a05a8d0b9f4e5a Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v27 11/11] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 +
 .../test_fdwxact/expected/test_fdwxact.out    | 200 +++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 185 +++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 110 ++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 524 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 ++++++
 src/test/regress/pg_regress.c                 |  13 +-
 src/tools/msvc/Mkvcbuild.pm                   |   3 +-
 14 files changed, 1294 insertions(+), 6 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index a6d2ffbf9e..106f3b2ff2 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS =1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..ca8a90f3e5
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,200 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..40b774e5d0
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,185 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..52e4971aed
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,110 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql, $wait_until) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+	$wait_until = 0 unless defined $wait_until;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	$node->poll_query_until('postgres',
+							"SELECT count(*) FROM pg_foreign_xacts",
+							$wait_until);
+
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the failure case of PREPARE TRANSACTION. We prepare the distributed
+# transaction with the same identifier.  The second attempt will fail when preparing
+# the local transaction, which is performed after preparing the foreign transaction
+# on srv_2pc_1. Therefore the transaction should rollback the prepared foreign
+# transaction.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into prepare phase on srv_2pc_1. The transaction fails during
+# preparing the foreign transaction on srv_2pc_1. Then, we try to both 'rollback' and
+# 'rollback prepared' the foreign transaction, and rollback another foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..45958e8f5a
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,524 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only
+ * commit and rollback API and the third supports all transaction API including
+ * prepare.
+ *
+ * Also, this module can inject an error at the prepare callback or the
+ * commit callback using the test_inject_error() SQL function. The injected
+ * error's information is stored in shared memory so that backend processes
+ * and resolver processes can see it.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xact.h"
+#include "access/reloptions.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static void testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo,
+												   List *fdw_private,
+												   int subplan_index,
+												   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactRslvState *state);
+static void testCommitForeignTransaction(FdwXactRslvState *state);
+static void testRollbackForeignTransaction(FdwXactRslvState *state);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+/* Register the foreign transaction */
+static void
+testRegisterFdwXact(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					bool modified)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	RangeTblEntry	*rte;
+	ForeignTable *table;
+	Oid		userid;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
+						mtstate->ps.state);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+	table = GetForeignTable(RelationGetRelid(rel));
+	FdwXactRegisterXact(table->serverid, userid, modified);
+}
+
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static void
+testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo,
+									   List *fdw_private,
+									   int subplan_index,
+									   int eflags)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo,
+						(eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0);
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo, true);
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 state->fdwxact_id,
+							 state->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+	TransactionId xid = GetTopTransactionId();
+
+	if (check_event(state->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	TransactionId xid = GetTopTransactionId();
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+/*
+ * Check if an event is set at the phase on the server. If so, set
+ * *elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Only "error" and "panic" are supported; anything else means error */
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+	else
+		*elevel = ERROR;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strncpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strncpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strncpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Set up master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check if we can commit and roll back the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check if we can commit and roll back the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately before checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 23d7d0beb2..d49a292cca 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2352,9 +2352,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2369,7 +2372,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 90594bd41b..e46d3344e7 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -50,7 +50,8 @@ my @contrib_excludes = (
 	'pgcrypto',         'sepgsql',
 	'brin',             'test_extensions',
 	'test_misc',        'test_pg_dump',
-	'snapshot_too_old', 'unsafe_tests');
+	'snapshot_too_old', 'unsafe_tests',
+	'test_fdwxact');
 
 # Set of variables for frontend modules
 my $frontend_defines = { 'initdb' => 'FRONTEND' };
-- 
2.27.0
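
The commit and rollback callbacks in test_fdwxact choose between the one-phase and the prepared log message by looking at FDWXACT_FLAG_ONEPHASE in state->flags. As a minimal sketch of that kind of check (the DEMO_* names and values are hypothetical and are not taken from the patch), testing a single flag bit needs the bitwise operator, because a logical AND is true for any nonzero flags word:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_FLAG_ONEPHASE	0x01
#define DEMO_FLAG_OTHER		0x02

/* True only when the one-phase bit is set in flags. */
static bool
demo_is_onephase(uint32_t flags)
{
	return (flags & DEMO_FLAG_ONEPHASE) != 0;
}

/* What a logical AND computes instead: true whenever flags is nonzero. */
static bool
demo_logical_and(uint32_t flags)
{
	return flags && DEMO_FLAG_ONEPHASE;
}
```

With only DEMO_FLAG_OTHER set, demo_is_onephase() reports false while demo_logical_and() reports true, which is why the two spellings are not interchangeable.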

Attachment: v27-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch (application/x-patch)
From 089db35d8f84d726a7184a21cb3075ec6bd703d2 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 2 Nov 2020 14:32:10 +0900
Subject: [PATCH v27 09/11] postgres_fdw marks foreign transaction as modified
 on modification.

This commit enables postgres_fdw to execute the two-phase commit protocol
on transaction commit (without explicitly executing PREPARE TRANSACTION).

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c   | 19 ++++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c |  2 ++
 contrib/postgres_fdw/postgres_fdw.h |  1 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 747be681b8..eff2f2da3e 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -58,6 +58,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		modified;		/* true if data on the foreign server is modified */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -289,6 +290,7 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 	entry->have_error = false;
 	entry->changing_xact_state = false;
 	entry->invalidated = false;
+	entry->modified = false;
 	entry->server_hashvalue =
 		GetSysCacheHashValue1(FOREIGNSERVEROID,
 							  ObjectIdGetDatum(server->serverid));
@@ -303,6 +305,20 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 		 entry->conn, server->servername, user->umid, user->userid);
 }
 
+void
+MarkConnectionModified(UserMapping *user)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
+	if (entry && !entry->modified)
+	{
+		FdwXactRegisterXact(user->serverid, user->userid, true);
+		entry->modified = true;
+	}
+}
+
 /*
  * Connect to remote server using specified server and user mapping properties.
  */
@@ -574,7 +590,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 			 entry->conn);
 
 		/* Register the foreign server to the transaction */
-		FdwXactRegisterXact(user->serverid, user->userid);
+		FdwXactRegisterXact(user->serverid, user->userid, false);
 
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
@@ -583,6 +599,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 		entry->changing_xact_state = true;
 		do_sql_command(entry->conn, sql);
 		entry->xact_depth = 1;
+		entry->modified = false;
 		entry->changing_xact_state = false;
 	}
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index e3fccc6050..1a8b6fa673 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2379,6 +2379,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * establish new connection if necessary.
 	 */
 	dmstate->conn = GetConnection(user, false);
+	MarkConnectionModified(user);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -3573,6 +3574,7 @@ create_foreign_modify(EState *estate,
 
 	/* Open connection; report that we'll create a prepared statement. */
 	fmstate->conn = GetConnection(user, true);
+	MarkConnectionModified(user);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 659222b97a..12cd55258f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -132,6 +132,7 @@ extern void reset_transmission_modes(int nestlevel);
 /* in connection.c */
 extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
 extern void ReleaseConnection(PGconn *conn);
+extern void MarkConnectionModified(UserMapping *user);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
-- 
2.27.0
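
MarkConnectionModified() in the patch above registers the foreign server as modified only the first time it is called for a connection, and begin_remote_xact() clears the flag when a new remote transaction starts. A rough sketch of that register-once pattern, using hypothetical stand-ins (DemoConnEntry and demo_register_modified are not part of postgres_fdw):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of a connection cache entry. */
typedef struct DemoConnEntry
{
	bool		modified;	/* has this transaction written to the server? */
} DemoConnEntry;

/* Counts registrations so the register-once behavior can be observed. */
static int	demo_register_calls = 0;

/* Hypothetical stand-in for registering the server as modified. */
static void
demo_register_modified(void)
{
	demo_register_calls++;
}

/* Register the modification only on the first call per transaction. */
static void
demo_mark_modified(DemoConnEntry *entry)
{
	if (entry != NULL && !entry->modified)
	{
		demo_register_modified();
		entry->modified = true;
	}
}

/* Reset at the start of the next remote transaction. */
static void
demo_begin_xact(DemoConnEntry *entry)
{
	entry->modified = false;
}
```

Calling demo_mark_modified() from every modify path is then cheap, since repeated calls within one transaction do nothing after the first.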

Attachment: v27-0005-postgres_fdw-supports-prepare-API.patch (application/x-patch)
From 676d40df7c21a3403c67fa403f304aa63b4fc7be Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:00:21 +0900
Subject: [PATCH v27 05/11] postgres_fdw supports prepare API.

This commit also enables postgres_fdw to commit and roll back prepared foreign transactions.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 137 +++++++++++++++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  13 --
 contrib/postgres_fdw/postgres_fdw.c           |   1 +
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   7 -
 5 files changed, 135 insertions(+), 24 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index c7da528dfb..747be681b8 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -96,6 +96,8 @@ static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 static bool UserMappingPasswordRequired(UserMapping *user);
 static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
 static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+									char *fdwxact_id, bool is_commit);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -1158,12 +1160,19 @@ void
 postgresCommitForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry;
+	bool		is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	PGresult   *res;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
 
+	if (!is_onephase)
+	{
+		/* COMMIT PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->usermapping,
+								frstate->fdwxact_id, true);
+		return;
+	}
+
 	Assert(entry->conn);
 
 	/*
@@ -1209,16 +1218,24 @@ void
 postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry = NULL;
+	bool is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	bool abort_cleanup_failure = false;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	/*
 	 * In simple rollback case, we must have a connection to the foreign server
 	 * because the foreign transaction is not closed yet. We get the connection
 	 * entry from the cache.
 	 */
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	if (!is_onephase)
+	{
+		/* ROLLBACK PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->usermapping,
+								frstate->fdwxact_id, false);
+		return;
+	}
+
 	Assert(entry);
 
 	/*
@@ -1295,6 +1312,46 @@ postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 	return;
 }
 
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", frstate->fdwxact_id);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   frstate->server->servername, frstate->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 frstate->server->servername, frstate->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
 /* Cleanup at main-transaction end */
 static void
 pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
@@ -1321,3 +1378,75 @@ pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
 	/* Also reset cursor numbering for next transaction */
 	cursor_number = 0;
 }
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+						char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	/*
+	 * Check the connection status in case the previous attempt
+	 * failed.
+	 */
+	if (entry->conn && PQstatus(entry->conn) != CONNECTION_OK)
+		disconnect_pg_server(entry);
+
+	/*
+	 * In the two-phase commit case, since the transaction is about to be
+	 * resolved by a different process than the one that prepared it,
+	 * we might not have a connection yet.
+	 */
+	if (!entry->conn)
+		make_new_connection(entry, usermapping);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	/*
+	 * Once the transaction is prepared, further transaction callback is not
+	 * called even when an error occurred during resolving it.  Therefore, we
+	 * don't need to set changing_xact_state here.  On failure the new connection
+	 * will be established either when the new transaction is started or when
+	 * checking the connection status above.
+	 */
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index fefb7e6de2..a750ace025 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8974,19 +8974,6 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
- count 
--------
-   822
-(1 row)
-
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
-ROLLBACK;
-WARNING:  there is no transaction in progress
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 473f94c929..e3fccc6050 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -562,6 +562,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for foreign transactions */
 	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
 	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
 
 	PG_RETURN_POINTER(routine);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index e3b2897495..659222b97a 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -140,6 +140,7 @@ extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
 extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
 extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7581c5417b..ece57de1b1 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2647,13 +2647,6 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ROLLBACK;
-
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
-- 
2.27.0
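
pgfdw_end_prepared_xact() above builds a COMMIT PREPARED or ROLLBACK PREPARED command and then tolerates one specific failure, an undefined-object SQLSTATE, since the prepared transaction may already have been resolved on the foreign server. The sketch below mirrors that decision in standalone C. The DEMO_* macros are modeled on the six-bit SQLSTATE packing used by PostgreSQL's MAKE_SQLSTATE, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Six-bit packing of a five-character SQLSTATE, modeled on MAKE_SQLSTATE. */
#define DEMO_SIXBIT(ch)	(((ch) - '0') & 0x3F)
#define DEMO_MAKE_SQLSTATE(c1, c2, c3, c4, c5) \
	(DEMO_SIXBIT(c1) + (DEMO_SIXBIT(c2) << 6) + (DEMO_SIXBIT(c3) << 12) + \
	 (DEMO_SIXBIT(c4) << 18) + (DEMO_SIXBIT(c5) << 24))

/* 42704 = undefined_object, 08006 = connection_failure */
#define DEMO_UNDEFINED_OBJECT	DEMO_MAKE_SQLSTATE('4', '2', '7', '0', '4')
#define DEMO_CONNECTION_FAILURE	DEMO_MAKE_SQLSTATE('0', '8', '0', '0', '6')

/*
 * Build "COMMIT PREPARED 'id'" or "ROLLBACK PREPARED 'id'".  The identifier
 * is not escaped here, so this assumes it contains no single quotes.
 */
static int
demo_build_command(char *buf, size_t buflen, const char *id, bool is_commit)
{
	assert(buf != NULL && id != NULL);
	return snprintf(buf, buflen, "%s PREPARED '%s'",
					is_commit ? "COMMIT" : "ROLLBACK", id);
}

/*
 * Decide whether a failed command should be reported as an error.  A missing
 * SQLSTATE is treated as a connection failure; an undefined-object error
 * means the prepared transaction is already gone and is accepted.
 */
static bool
demo_should_report(const char *diag_sqlstate)
{
	int			sqlstate;

	if (diag_sqlstate != NULL && strlen(diag_sqlstate) == 5)
		sqlstate = DEMO_MAKE_SQLSTATE(diag_sqlstate[0], diag_sqlstate[1],
									  diag_sqlstate[2], diag_sqlstate[3],
									  diag_sqlstate[4]);
	else
		sqlstate = DEMO_CONNECTION_FAILURE;

	return sqlstate != DEMO_UNDEFINED_OBJECT;
}
```

Accepting the undefined-object case keeps resolution idempotent: a retry after a lost acknowledgement can find the transaction already committed or rolled back.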

Attachment: v27-0010-Documentation-update.patch (application/x-patch)
From fdfd5a077ccf369eda3de31acce5deae80d53295 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v27 10/11] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 ++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 158 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 254 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    | 147 +++++++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 888 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 5fb9dca425..ec6a0752cc 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9262,6 +9262,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -11115,6 +11120,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction has been
+          prepared to commit or is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction has been
+          prepared to abort or is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>locker_pid</structfield></entry>
+      <entry><type>int</type></entry>
+      <entry></entry>
+      <entry>
+       Process ID of the process that currently holds this foreign transaction for processing.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f043433e31..3ae8cf6480 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9248,6 +9248,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether a distributed transaction commit ensures that all
+         changes on the involved foreign servers are committed. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used
+         to commit or roll back distributed transactions. When set to
+         <literal>required</literal>, distributed transactions strictly require
+         that all written servers can use the two-phase commit protocol.  That is,
+         the distributed transaction cannot commit if even one server does not
+         support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, a distributed transaction commit
+         waits for all involved foreign transactions to be committed before the
+         command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When <literal>disabled</literal>, database consistency is at risk
+          if one or more foreign servers crash while committing
+          the distributed transactions.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
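+        <para>
+         For example, assuming a workload with at most 50 concurrent local
+         transactions that each write to 3 foreign servers (figures chosen
+         purely for illustration), the value would need to be at least
+         50 * 3 = 150.  One possible way to set it, sketched here with
+         <command>ALTER SYSTEM</command>, is:
+<programlisting>
+-- N = 50 local transactions, K = 3 foreign servers each: N * K = 150
+ALTER SYSTEM SET max_prepared_foreign_transactions = 150;
+-- The new value takes effect only after a server restart.
+</programlisting>
+        </para>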
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning the resolver stays connected to its
+         database until it is stopped manually with <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..bae3ee0f2a
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the others were rolled back. This could leave
+   data in an inconsistent state across the federated database.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all the changes on foreign servers are either committed or rolled back using
+   the transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To commit atomically across all foreign servers,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit sequence of a distributed transaction proceeds
+    through the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers including the local server itself and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the prepare on all foreign servers is
+       successful, the sequence proceeds to the next step.  If there is any failure in the
+       prepare phase, the server rolls back all the transactions on both
+       local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit the local transaction. The server commits the transaction locally.
+       If any failure happens in this step, the server switches to rollback and
+       rolls back all transactions on both local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    The above sequence is executed transparently to the users at transaction commit.
+    The transaction returns acknowledgement of the successful commit of the
+    distributed transaction to the client after step 2.  After that, all
+    prepared transactions are resolved asynchronously by a foreign transaction
+    resolver process.
+   </para>
+
+   <para>
+    When the user executes <command>PREPARE TRANSACTION</command>, the transaction
+    prepares the local transaction as well as all involved transactions on the
+    foreign servers. Likewise, when <command>COMMIT PREPARED</command> or
+    <command>ROLLBACK PREPARED</command> is executed, all prepared transactions are resolved
+    asynchronously after committing or rolling back the local transaction.
+   </para>
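+
+   <para>
+    A minimal sketch of the explicit form follows.  The foreign tables
+    <literal>ft1</literal> and <literal>ft2</literal> and the identifier
+    <literal>'gx1'</literal> are hypothetical, and the local server is assumed
+    to have <varname>max_prepared_transactions</varname> set above zero:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+-- Prepares the local transaction and the transactions on both foreign servers.
+PREPARE TRANSACTION 'gx1';
+-- Commits locally; the prepared foreign transactions are resolved afterwards.
+COMMIT PREPARED 'gx1';
+</programlisting>
+   </para>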
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    A distributed transaction can be in the <firstterm>in-doubt</firstterm> state
+    from the time all involved transactions are prepared until all of them
+    are resolved.  During that period, a transaction reading from the foreign
+    servers might see different results on each of them.  If the local node
+    crashes while preparing transactions, the distributed transaction also becomes
+    in-doubt.  The information about the involved foreign transactions is
+    recovered during crash recovery, and they are resolved in the background.
+   </para>
+
+   <para>
+    The foreign transaction resolver processes automatically resolve the
+    transactions associated with the in-doubt distributed transaction. Alternatively, you can
+    use the <function>pg_resolve_foreign_xact</function> function to resolve them
+    manually.
+   </para>
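+
+   <para>
+    A sketch of manual resolution is shown below.  It assumes that the
+    unresolved foreign transactions are exposed in a view named
+    <structname>pg_foreign_xacts</structname>; the transaction ID and OIDs are
+    made-up values that would in practice be read from that view:
+<programlisting>
+-- List the foreign transactions that are still unresolved.
+SELECT * FROM pg_foreign_xacts;
+-- Resolve one of them: local transaction ID, foreign server OID, user OID.
+SELECT pg_resolve_foreign_xact('1234'::xid, 16394, 10);
+</programlisting>
+   </para>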
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving in-doubt distributed transactions. They commit or
+    roll back prepared transactions on all foreign servers involved with the
+    distributed transaction according to the result of the corresponding local
+    transaction.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolutions
+    on the database to which it is connected. On failure during resolution, it
+    retries at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped without an immediate shutdown. You can call
+     the <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be enabled.  Additionally,
+    <varname>max_worker_processes</varname> may need to be adjusted
+    to accommodate the foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
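+
+   <para>
+    As a sketch, a local node could be configured as follows.  The numeric
+    values are arbitrary examples rather than recommendations:
+<programlisting>
+ALTER SYSTEM SET max_prepared_foreign_transactions = 150;
+ALTER SYSTEM SET max_foreign_transaction_resolvers = 2;
+ALTER SYSTEM SET foreign_twophase_commit = 'required';
+-- At least max_foreign_transaction_resolvers + 1, plus slots for other workers.
+ALTER SYSTEM SET max_worker_processes = 11;
+-- Restart the server so that the start-time parameters take effect.
+</programlisting>
+   </para>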
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..0fbb9c4123 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1427,6 +1427,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare the foreign transaction. If an FDW wishes that its foreign
+     transactions be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one-phase or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flags</literal> has
+    the flag <literal>FDWXACT_FLAG_ONEPHASE</literal>, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     Note that in all cases except a call to the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flags</literal> has the flag
+    <literal>FDWXACT_FLAG_ONEPHASE</literal>, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     a failure occurs while preparing the foreign transaction. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except a call to the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    with its length stored in <varname>*prep_id_len</varname>.
+    This optional function is called during executor startup, once per
+    foreign server. Note that the transaction identifier must be a string literal
+    less than <symbol>NAMEDATALEN</symbol> bytes long and must not be the same
+    as any other concurrent prepared transaction id. If this callback routine
+    is not supported, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Note therefore that
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1906,4 +2017,147 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of the
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name and the OIDs of
+    the server, user and user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-registration">
+    <title>Foreign Transaction Registration and Unregistration</title>
+    <para>
+     A foreign transaction needs to be registered with the
+     <productname>PostgreSQL</productname> global transaction manager.
+     Registration and unregistration are done by calling
+     <function>FdwXactRegisterXact</function> and
+     <function>FdwXactUnregisterXact</function> respectively.
+     The FDW can pass a boolean <literal>modified</literal> along with the
+     OIDs of the server and user to <function>FdwXactRegisterXact</function>,
+     indicating that writes are going to happen on the foreign server.  Such foreign
+     servers are taken into account when deciding whether the two-phase commit
+     protocol is required.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <function>CommitForeignTransaction</function>
+     and <function>RollbackForeignTransaction</function> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <function>CommitForeignTransaction</function> function
+     in the pre-commit phase and calls the
+     <function>RollbackForeignTransaction</function> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback Distributed Transaction</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to atomically commit and roll back among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to have the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    Here ft1 and ft2 are foreign tables on different foreign servers, possibly using different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     When switching to rollback due to a failure, it calls
+     <function>RollbackForeignTransaction</function> with
+     <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions which are not
+     closed yet, and calls <function>RollbackForeignTransaction</function> without
+     that flag for foreign transactions which are already prepared.  For foreign
+     transactions which are being prepared, it does both because it cannot be sure
+     whether the preparation has completed on the foreign server. Therefore,
+     <function>RollbackForeignTransaction</function> needs to tolerate the undefined
+     object error.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, both the <literal>CommitForeignTransaction</literal> function and the
+     <literal>RollbackForeignTransaction</literal> function should commit or
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve, the resolver process exits with an error message. The foreign
+     transaction launcher launches the resolver process again at the
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a5161bb22b 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7b1dc264f6..26736698af 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26173,6 +26173,153 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-data-sanity">
+   <title>Data Sanity Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-data-sanity-table"/>
+    provide ways to check the sanity of data files in the cluster.
+   </para>
+
+   <table id="functions-data-sanity-table">
+    <title>Data Sanity Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_relation_check_pages</primary>
+        </indexterm>
+        <function>pg_relation_check_pages</function> ( <parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type> ] )
+        <returnvalue>setof record</returnvalue>
+        ( <parameter>path</parameter> <type>text</type>,
+        <parameter>failed_block_num</parameter> <type>bigint</type> )
+       </para>
+       <para>
+        Checks the pages of the specified relation to see if they are valid
+        enough to safely be loaded into the server's shared buffers.  If
+        given, <parameter>fork</parameter> specifies that only the pages of
+        the given fork are to be verified.  <parameter>fork</parameter> can
+        be <literal>main</literal> for the main data
+        fork, <literal>fsm</literal> for the free space
+        map, <literal>vm</literal> for the visibility map,
+        or <literal>init</literal> for the initialization fork.  The
+        default of <literal>NULL</literal> means that all forks of the
+        relation should be checked.  The function returns a list of block
+        numbers that appear corrupted along with the path names of their
+        files.  Use of this function is restricted to superusers by
+        default, but access may be granted to others
+        using <command>GRANT</command>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction which is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that this removes the foreign transaction entry without resolution.
+        This function is useful to remove a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before
+    dropping a database to which a foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 98e1995453..d00663dc14 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1066,6 +1066,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1295,6 +1307,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1588,6 +1612,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1905,6 +1934,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information about foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 730d5fdc34..a5c5619072 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -171,6 +171,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3234adb639..83f30c5045 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.27.0

v27-0004-Add-PrepareForeignTransaction-API.patch (application/x-patch)
From 01c27f06132bb1f3c7ca03b9a222292a262cef4e Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 20 Sep 2020 16:49:20 +0900
Subject: [PATCH v27 04/11] Add PrepareForeignTransaction API.

The transactions initiated on the foreign server are prepared at
PREPARE TRANSACTION time.  The information about prepared foreign
transactions involved in the distributed transaction is crash-safe.
However, these transactions are neither committed nor aborted at
COMMIT/ROLLBACK PREPARED time.  To resolve these transactions, this
commit adds the pg_resolve_foreign_xact() SQL function.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +-
 src/backend/access/fdwxact/fdwxact.c          | 1755 ++++++++++++++++-
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   28 +
 src/backend/access/transam/xact.c             |    1 +
 src/backend/access/transam/xlog.c             |   41 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/foreigncmds.c            |   22 +
 src/backend/foreign/foreign.c                 |    6 +
 src/backend/postmaster/pgstat.c               |    9 +
 src/backend/postmaster/postmaster.c           |    1 +
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    3 +
 src/backend/storage/ipc/procarray.c           |   56 +-
 src/backend/storage/lmgr/lwlocknames.txt      |    1 +
 src/backend/utils/misc/guc.c                  |   11 +
 src/backend/utils/misc/postgresql.conf.sample |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |   88 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   18 +
 src/include/foreign/fdwapi.h                  |    2 +
 src/include/pgstat.h                          |    3 +
 src/include/storage/procarray.h               |    2 +
 src/test/regress/expected/rules.out           |    7 +
 35 files changed, 2164 insertions(+), 28 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact_xlog.h
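
For readers following the on-disk layout: the state files this patch keeps
under pg_fdwxact are named from four 8-digit hexadecimal fields (database OID,
xid, foreign server OID, user OID) separated by '_'.  Below is a minimal
standalone sketch of that naming.  It assumes only the format string used by
the FdwXactFilePath macro in the diff below; the helper name
fdwxact_file_path and the sample OIDs are made up for illustration and are
not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same directory name and file name length as defined in the patch. */
#define FDWXACTS_DIR "pg_fdwxact"
#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)

/*
 * Hypothetical helper (not in the patch): build the state file path for one
 * foreign transaction.  Returns the path length as reported by snprintf.
 */
static int
fdwxact_file_path(char *path, size_t pathlen, uint32_t dbid, uint32_t xid,
                  uint32_t serverid, uint32_t userid)
{
    assert(path != NULL);

    /* Each field is zero-padded to 8 upper-case hexadecimal digits. */
    return snprintf(path, pathlen, FDWXACTS_DIR "/%08X_%08X_%08X_%08X",
                    (unsigned int) dbid, (unsigned int) xid,
                    (unsigned int) serverid, (unsigned int) userid);
}
```

With dbid 16384, xid 700, server OID 16390 and user OID 10 this gives
pg_fdwxact/00004000_000002BC_00004006_0000000A.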

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c5badd9c0a..fefb7e6de2 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,7 +8984,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on foreign tables
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 00da860b31..3caf904370 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -9,8 +9,59 @@
  * FDW who implements both commit and rollback APIs can request to register the
  * foreign transaction by FdwXactRegisterXact() to participate it to a
  * group of distributed tranasction.  The registered foreign transactions are
- * identified by OIDs of server and user.  On commit and rollback, the global
- * transaction manager calls corresponding FDW API to end the tranasctions.
+ * identified by OIDs of server and user.  On commit, rollback and prepare, the
+ * global transaction manager calls the corresponding FDW API to end the transactions.
+ *
+ * To commit atomically across all foreign servers, the global transaction
+ * manager supports the two-phase commit protocol, which is a type of atomic
+ * commitment protocol (ACP).  The two-phase commit protocol is crash-safe.  We
+ * WAL-log the foreign transaction information.
+ *
+ * FOREIGN TRANSACTION RESOLUTION
+ *
+ * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by calling
+ * the PrepareForeignTransaction() API, regardless of whether data on the foreign
+ * server has been modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or
+ * roll back only the local transaction and do nothing for the involved foreign
+ * transactions.  To resolve these foreign transactions, the user needs to call the
+ * pg_resolve_foreign_xact() SQL function, which resolves a foreign transaction
+ * according to the result of the corresponding local transaction.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of foreign
+ * transactions follows a locking model based on the following linked concepts:
+ *
+ * * All FdwXact fields except for status are protected by FdwXactLock.  The
+ *	 status is protected by its mutex.
+ * * A process that is going to process a foreign transaction needs to set
+ *	 locking_backend of the FdwXact entry to lock the entry, which prevents the
+ *	 entry from being updated or removed by concurrent processes.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *	 with entries marked with fdwxact->inredo and fdwxact->ondisk.	FdwXact file
+ *	 data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *	 We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *	 have fdwxact->inredo set and are behind the redo_horizon.	We save
+ *	 them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *	 fdwxact->ondisk is true, the corresponding entry from the disk is
+ *	 additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *	 fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
  *
  * Portions Copyright (c) 2020, PostgreSQL Global Development Group
  *
@@ -20,15 +71,53 @@
  */
 #include "postgres.h"
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include "access/fdwxact.h"
+#include "access/twophase.h"
+#include "access/xact.h"
 #include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "foreign/fdwapi.h"
 #include "foreign/foreign.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/procarray.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 
 /* Check the FdwXactParticipant is capable of two-phase commit  */
 #define ServerSupportTransactionCallback(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define ServerSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * OID, xid, foreign server OID and user OID, each printed as 8 hexadecimal
+ * digits and separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
 
 /*
  * Structure to bundle the foreign transaction participant.	 This struct
@@ -37,13 +126,23 @@
  */
 typedef struct FdwXactParticipant
 {
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
 	/* Foreign server and user mapping info, passed to callback routines */
 	ForeignServer *server;
 	UserMapping *usermapping;
 
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
 } FdwXactParticipant;
 
 /*
@@ -52,11 +151,103 @@ typedef struct FdwXactParticipant
  */
 static List *FdwXactParticipants = NIL;
 
+/* Keep track of whether the process exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+/* GUC parameter */
+int			max_prepared_foreign_xacts = 0;
+
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void FdwXactPrepareForeignTransactions(TransactionId xid);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
+static void FdwXactComputeRequiredXmin(void);
+static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
+static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool givewarning);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool fromdisk);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  Oid umid, char *fdwxact_id);
+static void remove_fdwxact(FdwXact fdwxact);
 static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
 													  FdwRoutine *routine);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
 
 /*
  * Register the given foreign transaction identified by the given arguments
@@ -82,6 +273,13 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 		}
 	}
 
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
 	routine = GetFdwRoutineByServerId(serverid);
 
 	/*
@@ -142,14 +340,336 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 
 	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
 
+	fdw_part->fdwxact = NULL;
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
 
 	return fdw_part;
 }
 
+/*
+ * Insert FdwXact entries and prepare foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(TransactionId xid)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants != NIL);
+	Assert(TransactionIdIsValid(xid));
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState state;
+		FdwXact		fdwxact;
+
+		Assert(ServerSupportTwophaseCommit(fdw_part));
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to disk and to WAL (thereby relaying it to standbys).
+		 * Thus, in case we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have a prepared transaction
+		 * on the foreign server and try to resolve it when connectivity is
+		 * restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed in between these
+		 * two steps, we would lose track of the prepared transaction on the
+		 * foreign server and would not be able to resolve it after crash
+		 * recovery.  Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the point where this
+		 * backend receives the acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 */
+		state.server = fdw_part->server;
+		state.usermapping = fdw_part->usermapping;
+		state.fdwxact_id = fdw_part->fdwxact_id;
+		fdw_part->prepare_foreign_xact_fn(&state);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier.  We generate a
+ * unique identifier in the form of
+ * "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * The returned string is used to identify the foreign transaction.  The
+ * identifier must not be the same as that of any other concurrent prepared
+ * transaction.
+ *
+ * To make the foreign transaction identifier unique, we should ideally use
+ * something like a UUID, which gives unique ids with high probability, but that
+ * may be expensive here, and the UUID extension that provides the function to
+ * generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+			 Abs(random()), xid, fdw_part->server->serverid,
+			 fdw_part->usermapping->userid);
+
+	return pstrdup(buf);
+}
+
+/*
+ * This function creates a new foreign transaction entry before an FDW prepares
+ * and commits or rolls back the foreign transaction.  The function writes the
+ * entry to WAL; it is persisted to disk under pg_fdwxact at checkpoint time.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
 /*
  * The routine for committing or rolling back the given transaction participant.
  */
@@ -162,6 +682,7 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 
 	state.server = fdw_part->server;
 	state.usermapping = fdw_part->usermapping;
+	state.fdwxact_id = NULL;
 	state.flags = FDWXACT_FLAG_ONEPHASE;
 
 	if (commit)
@@ -181,14 +702,46 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 }
 
 /*
- * Clear the FdwXactParticipants list.
+ * Unlock foreign transaction participants and clear the FdwXactParticipants
+ * list.  If we leave any foreign transactions unresolved, update the oldest
+ * xmin of unresolved transactions so that the local transaction ids of such
+ * unresolved foreign transactions are not truncated.
  */
 static void
 ForgetAllFdwXactParticipants(void)
 {
+	ListCell   *cell;
+	int			nlefts = 0;
+
 	if (FdwXactParticipants == NIL)
 		return;
 
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if no FdwXact entry has been registered yet */
+		if (!fdwxact)
+			continue;
+
+		/* Unlock the foreign transaction entry */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+		nlefts++;
+	}
+
+	/*
+	 * If we leave any FdwXact entries, update the oldest local transaction id
+	 * of unresolved distributed transactions.
+	 */
+	if (nlefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions", nlefts);
+		FdwXactComputeRequiredXmin();
+	}
+
 	list_free_deep(FdwXactParticipants);
 	FdwXactParticipants = NIL;
 }
@@ -211,23 +764,1203 @@ AtEOXact_FdwXact(bool is_commit)
 	foreach(lc, FdwXactParticipants)
 	{
 		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		if (!fdwxact)
+		{
+			/* Commit or rollback the foreign transaction in one-phase */
+			Assert(ServerSupportTransactionCallback(fdw_part));
+			FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			continue;
+		}
+
+		/*
+		 * This foreign transaction might have been prepared.  In the commit
+		 * case, we don't need to do anything for this participant because all
+		 * foreign transactions should have already been prepared and therefore
+		 * the transaction is already closed.  These will be resolved manually.
+		 * In the abort case, on the other hand, we need to close the
+		 * transaction if preparing might still be in progress, since an error
+		 * might have occurred while preparing a foreign transaction.
+		 */
+		if (!is_commit)
+		{
+			int					   status;
 
-		Assert(ServerSupportTransactionCallback(fdw_part));
-		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			SpinLockAcquire(&(fdwxact->mutex));
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&(fdwxact->mutex));
+
+			if (status == FDWXACT_STATUS_PREPARING)
+				FdwXactParticipantEndTransaction(fdw_part, false);
+		}
 	}
 
 	ForgetAllFdwXactParticipants();
 }
 
 /*
- * Check if the local transaction has any foreign transaction.
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * Note that it's possible that the transaction aborts after we have prepared
+ * some of the participants.  In this case we switch to rollback and roll back
+ * all foreign transactions.
  */
 void
 PrePrepare_FdwXact(void)
 {
-	/* We don't support to prepare foreign transactions */
-	if (FdwXactParticipants != NIL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+	ListCell   *lc;
+	TransactionId xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit as we're going to
+	 * prepare all of them.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
+	}
+
+	/*
+	 * Assign a transaction id if one is not assigned yet, because the local
+	 * transaction id is used to determine the result of the distributed
+	 * transaction. Then prepare all foreign transactions.
+	 */
+	xid = GetTopTransactionId();
+	FdwXactPrepareForeignTransactions(xid);
+
+	/*
+	 * We keep FdwXactParticipants until the transaction ends so that we can
+	 * change the involved foreign transactions to ABORTING in case of failure.
+	 */
+}
+
+/*
+ * Resolve foreign transactions at the given indexes.
+ *
+ * The caller must hold the given foreign transactions in advance to prevent
+ * concurrent update.
+ */
+static void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		FdwXactResolveOneFdwXact(fdwxact);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
+
+/*
+ * Return the index of the first FdwXact entry that matches the given
+ * arguments, or -1 if there is none.  Only arguments holding a valid value
+ * for their datatype are used as search conditions.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	bool		found = false;
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+		found = true;
+		break;
+	}
+
+	return found ? i : -1;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ *
+ * XXX: we can exclude FdwXact entries whose status is already committing
+ * or aborting.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Return whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactGetTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on
+	 * the foreign server. So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted.  Raise an error anyway since we cannot
+	 * determine the fate of this foreign transaction from a local transaction
+	 * whose fate is itself not yet determined.
+	 */
+	else
+		elog(ERROR,
+			 "cannot resolve the foreign transaction associated with in-progress transaction");
+
+	pg_unreachable();
+}
+
+/* Commit or rollback one prepared foreign transaction */
+static void
+FdwXactResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactRslvState state;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	/* The FdwXact entry must be held by me */
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->locking_backend == MyBackendId);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactGetTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare the resolution state to pass to API */
+	state.server = server;
+	state.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	state.fdwxact_id = fdwxact->fdwxact_id;
+	state.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&state);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&state);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file, if any */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The status of
+	 * the transaction is set as prepared, since we do not know the exact
+	 * status right now. The resolver will set it later based on the status of
+	 * the local transaction which prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that
+	 * prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXact file if the foreign transaction was saved by an earlier checkpoint.
+ * We might not find the FdwXact entry when crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+static void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an inserted LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * spots that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Unconditionally flush the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid magic number and CRC), return the palloc'd
+ * contents of the file, issuing an error when finding corrupted data.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents are the expected data */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue causes harm to the
+ * system, and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextXid = ShmemVariableCache->nextXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+RestoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = CStringGetTextDatum(fdwxact->fdwxact_id);
+
+		if (fdwxact->locking_backend != InvalidBackendId)
+		{
+			PGPROC *locker = BackendIdGetProc(fdwxact->locking_backend);
+			values[5] = Int32GetDatum(locker->pid);
+		}
+		else
+			nulls[5] = true;
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist")));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR,
+				 (errmsg("permission denied to resolve prepared foreign transaction"),
+				  errhint("Must be superuser or the user that prepared the transaction")));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being processed by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction identifier \"%s\" is busy",
+						fdwxact->fdwxact_id)));
+	}
+
+	if (TwoPhaseExists(fdwxact->local_xid))
+	{
+		/*
+		 * the entry's local transaction is prepared. Since we cannot know the
+		 * fate of the local transaction, we cannot resolve this foreign
+		 * transaction.
+		 */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction with identifier \"%s\" whose local transaction is in-progress",
+						fdwxact->fdwxact_id),
+				 errhint("Do COMMIT PREPARED or ROLLBACK PREPARED")));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	LWLockRelease(FdwXactLock);
+
+	PG_TRY();
+	{
+		FdwXactResolveFdwXacts(&idx, 1);
+	}
+	PG_CATCH();
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactCtl->fdwxacts[idx]->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. The function gives a way to forget about such a prepared
+ * transaction when the foreign server where it is prepared is no longer
+ * available or the user who prepared the transaction needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction on server %u does not exist",
+						serverid)));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("permission denied to remove prepared foreign transaction"),
+				  errhint("Must be superuser or the user that prepared the transaction"))));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being held by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	PG_TRY();
+	{
+		/* Clean up entry and any files we may have left */
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdwxact(fdwxact);
+	}
+	PG_CATCH();
+	{
+		if (fdwxact->valid)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+			fdwxact->locking_backend = InvalidBackendId;
+		}
+		LWLockRelease(FdwXactLock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
 }
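
For readers following the locking protocol in pg_resolve_foreign_xact() and pg_remove_foreign_xact() above: both functions claim an entry by storing their own backend ID in locking_backend, and report the entry as busy when another backend already holds it. The following is a minimal standalone sketch of that claim/release idea, not the patch's code. The names Entry, entry_try_hold, entry_release, and INVALID_BACKEND_ID are hypothetical, a plain int stands in for BackendId, and the caller is assumed to hold the equivalent of FdwXactLock.

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_BACKEND_ID (-1)		/* hypothetical stand-in for InvalidBackendId */

/* Hypothetical, reduced model of an FdwXact entry. */
typedef struct Entry
{
	int			locking_backend;	/* backend working on the entry, if any */
} Entry;

/*
 * Try to claim the entry for my_backend.  Returns false if some backend
 * already holds it (the patch raises an "is busy" error in that case).
 * The caller is assumed to hold the lock that protects the entry.
 */
static bool
entry_try_hold(Entry *e, int my_backend)
{
	if (e->locking_backend != INVALID_BACKEND_ID)
		return false;
	e->locking_backend = my_backend;
	return true;
}

/* Give up a claim, as the PG_CATCH() blocks above do on error. */
static void
entry_release(Entry *e, int my_backend)
{
	assert(e->locking_backend == my_backend);
	e->locking_backend = INVALID_BACKEND_ID;
}
```

In this model a second caller is refused until the first one releases the entry, which matches the intent of the "is busy" checks in the two SQL-callable functions.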
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
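
As a reading aid for fdwxact_desc() and fdwxact_identify() above: both mask off the low bits of the record's info byte before comparing against the XLOG_FDWXACT_* codes. The sketch below shows that masking in isolation. It is a standalone illustration rather than the patch's code; XLR_INFO_MASK is assumed here to be 0x0F, and sketch_fdwxact_identify is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define XLR_INFO_MASK		0x0F	/* assumed value: low bits are not rmgr-owned */
#define XLOG_FDWXACT_INSERT	0x00
#define XLOG_FDWXACT_REMOVE	0x10

/* Map an info byte to a record name, ignoring the masked-off low bits. */
static const char *
sketch_fdwxact_identify(uint8_t info)
{
	switch (info & ~XLR_INFO_MASK)
	{
		case XLOG_FDWXACT_INSERT:
			return "NEW FOREIGN TRANSACTION";
		case XLOG_FDWXACT_REMOVE:
			return "REMOVE FOREIGN TRANSACTION";
	}
	/* Unknown record type */
	return NULL;
}
```

Under these assumptions, an info byte of 0x15 still identifies as a REMOVE record because only the high four bits take part in the comparison.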
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..4b3e67eb49 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..0a3f4b383f 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 7940060443..a71210772f 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -845,6 +845,34 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+		if (gxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 0a8d1da4bd..e4fadcaf2c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2567,6 +2567,7 @@ PrepareTransaction(void)
 	PostPrepare_Twophase();
 
+	AtEOXact_FdwXact(true);
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
 	AtEOXact_Enum();
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a1078a7cfc..417e7595e8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4602,6 +4603,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6291,6 +6293,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6838,14 +6843,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	RestoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7047,7 +7053,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7559,11 +7568,13 @@ StartupXLOG(void)
 	}
 
 	/*
-	 * Pre-scan prepared transactions to find out the range of XIDs present.
-	 * This information is not quite needed yet, but it is positioned here so
-	 * as potential problems are detected before any on-disk change is done.
+	 * Pre-scan prepared transactions and foreign prepared transactions to find
+	 * out the range of XIDs present.  This information is not quite needed yet,
+	 * but it is positioned here so as potential problems are detected before any
+	 * on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7891,8 +7902,12 @@ StartupXLOG(void)
 	TrimCLOG();
 	TrimMultiXact();
 
-	/* Reload shared-memory state for prepared transactions */
+	/*
+	 * Reload shared-memory state for prepared transactions and foreign
+	 * prepared transactions.
+	 */
 	RecoverPreparedTransactions();
+	RecoverFdwXacts();
 
 	/*
 	 * Shutdown the recovery environment. This must occur after
@@ -9198,6 +9213,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9740,6 +9756,7 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
@@ -9759,6 +9776,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9777,6 +9795,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9984,6 +10003,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10187,6 +10207,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2e4aa1c4b6..42c64beac9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index c002a61794..c290b9ea94 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1076,6 +1077,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * Warn if there is a foreign prepared transaction on this foreign
+	 * server; the drop itself is not blocked.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1396,6 +1409,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * Warn if there is a foreign prepared transaction that uses this user
+	 * mapping; the drop itself is not blocked.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 6532a836e5..d34e26fd26 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -332,6 +332,12 @@ GetFdwRoutine(Oid fdwhandler)
 	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
 		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
 
+	/* FDW supporting prepare API must also support commit and rollback APIs */
+	Assert((routine->PrepareForeignTransaction &&
+			routine->CommitForeignTransaction &&
+			routine->RollbackForeignTransaction) ||
+		   !routine->PrepareForeignTransaction);
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f1dca2f25b..43e1c93dc5 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4137,6 +4137,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_DSM_FILL_ZERO_WRITE:
 			event_name = "DSMFillZeroWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ:
 			event_name = "LockFileAddToDataDirRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 959e3b8873..81e6cb9ca2 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,7 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee99b8..23ae805218 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -167,6 +167,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..2d7191d3cd 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -267,6 +269,7 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 05661e379e..868dd9544b 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -96,6 +96,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allProcs[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -187,11 +189,13 @@ typedef struct ComputeXidHorizonsResult
 	FullTransactionId latest_completed;
 
 	/*
-	 * The same for procArray->replication_slot_xmin and.
-	 * procArray->replication_slot_catalog_xmin.
+	 * The same for procArray->replication_slot_xmin,
+	 * procArray->replication_slot_catalog_xmin, and
+	 * procArray->fdwxact_unresolved_xmin.
 	 */
 	TransactionId slot_xmin;
 	TransactionId slot_catalog_xmin;
+	TransactionId fdwxact_unresolved_xmin;
 
 	/*
 	 * Oldest xid that any backend might still consider running. This needs to
@@ -210,8 +214,9 @@ typedef struct ComputeXidHorizonsResult
 	 * Oldest xid for which deleted tuples need to be retained in shared
 	 * tables.
 	 *
-	 * This includes the effects of replication slots. If that's not desired,
-	 * look at shared_oldest_nonremovable_raw;
+	 * This includes the effects of replication slots and unresolved
+	 * foreign transactions. If that's not desired, look at
+	 * shared_oldest_nonremovable_raw;
 	 */
 	TransactionId shared_oldest_nonremovable;
 
@@ -418,6 +423,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 		ShmemVariableCache->xactCompletionCount = 1;
 	}
 
@@ -1705,6 +1711,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	 */
 	h->slot_xmin = procArray->replication_slot_xmin;
 	h->slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	h->fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	for (int index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1832,6 +1839,12 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	h->data_oldest_nonremovable =
 		TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);
 
+	/*
+	 * Check whether there are unresolved distributed transactions requiring
+	 * an older xmin.
+	 */
+	h->shared_oldest_nonremovable =
+		TransactionIdOlder(h->shared_oldest_nonremovable, h->fdwxact_unresolved_xmin);
 	/*
 	 * The only difference between catalog / data horizons is that the slot's
 	 * catalog xmin is applied to the catalog one (so catalogs can be accessed
@@ -1889,6 +1902,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	Assert(!TransactionIdIsValid(h->slot_catalog_xmin) ||
 		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
 										 h->slot_catalog_xmin));
+	Assert(!TransactionIdIsValid(h->fdwxact_unresolved_xmin) ||
+		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+										 h->fdwxact_unresolved_xmin));
 
 	/* update approximate horizons with the computed horizons */
 	GlobalVisUpdateApply(h);
@@ -3793,6 +3809,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon so that vacuum
+ * does not truncate clog entries of transactions that are still needed to
+ * resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
+
 /*
  * XidCacheRemoveRunningXids
  *
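
The ComputeXidHorizons() change above folds fdwxact_unresolved_xmin into a horizon with TransactionIdOlder(). As an aid to reviewing that hunk, here is a simplified standalone model of "pick the older of two XIDs" under circular XID comparison. It is a sketch and not the server's implementation: xid_precedes and xid_older are hypothetical names, and the constants mirror the usual special XID values as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Does a logically precede b?  Normal XIDs are compared modulo 2^32. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	/* Special (non-normal) XIDs compare as plain unsigned values. */
	if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
		return a < b;
	/* The signed difference handles XID wraparound. */
	return (int32_t) (a - b) < 0;
}

/* Return the older of two XIDs; an invalid XID imposes no limit. */
static TransactionId
xid_older(TransactionId a, TransactionId b)
{
	if (a == InvalidTransactionId)
		return b;
	if (b == InvalidTransactionId)
		return a;
	return xid_precedes(a, b) ? a : b;
}
```

In this model, an invalid fdwxact_unresolved_xmin (no unresolved foreign transactions) leaves the horizon unchanged, while a valid one can only pull the horizon back, never push it forward, provided the first argument is the horizon being updated.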
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..dc29a7ea6f 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+FdwXactLock							48
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a62d64eaa4..951ed0ece2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -30,6 +30,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -2458,6 +2459,16 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..863e8ccc3a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -127,6 +127,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ee3bfa82f4..eae52defba 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -204,6 +204,7 @@ static const char *const subdirs[] = {
 	"pg_wal/archive_status",
 	"pg_commit_ts",
 	"pg_dynshmem",
+	"pg_fdwxact",
 	"pg_notify",
 	"pg_serial",
 	"pg_snapshots",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 3e00ac0f70..53bc3d82d7 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index cb6ef19182..1712b794c3 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 6c8b111ab5..9ba819e9d1 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -10,24 +10,112 @@
 #ifndef FDWXACT_H
 #define FDWXACT_H
 
+#include "access/fdwxact_xlog.h"
 #include "foreign/foreign.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/s_lock.h"
 
 /* Flag passed to FDW transaction management APIs */
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign transaction is being committed */
+	FDWXACT_STATUS_ABORTING		/* foreign transaction is being aborted */
+} FdwXactStatus;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData *FdwXact;
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+
+	/* Information relevant to the foreign transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset of inserting this entry
+									 * start */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been completed and written to
+								 * file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing an FdwXact entry requires FdwXactLock in exclusive mode,
+ * and iterating over fdwxacts requires it in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
 /* State data for foreign transaction resolution, passed to FDW callbacks */
 typedef struct FdwXactRslvState
 {
 	/* Foreign transaction information */
+	char		   *fdwxact_id;
 	ForeignServer *server;
 	UserMapping *usermapping;
 
 	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
 } FdwXactRslvState;
 
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+
 /* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+extern void RestoreFdwXactData(void);
+extern void RecoverFdwXacts(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
 
 #endif /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 6c15df7e70..986bc73566 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..e1b09a70d2 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -236,6 +236,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 06bed90c5e..ed6372d2e6 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c01da4bf01..09c26b5cd8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6030,6 +6030,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,text,int4}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,state,identifier,locker_pid}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 4db7ade9a3..89cec9aa96 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -171,6 +171,7 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
 
@@ -254,6 +255,7 @@ typedef struct FdwRoutine
 	/* Support functions for transaction management */
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
+	PrepareForeignTransaction_function PrepareForeignTransaction;
 } FdwRoutine;
 
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 257e515bfe..a61a08c5d6 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1004,6 +1004,9 @@ typedef enum
 	WAIT_EVENT_DATA_FILE_TRUNCATE,
 	WAIT_EVENT_DATA_FILE_WRITE,
 	WAIT_EVENT_DSM_FILL_ZERO_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index ea8a876ca4..0124c8c687 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -91,5 +91,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 
 #endif							/* PROCARRAY_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 097ff5d111..64da3b40d7 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1352,6 +1352,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.state,
+    f.identifier,
+    f.locker_pid
+   FROM pg_foreign_xacts() f(xid, serverid, userid, state, identifier, locker_pid);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.27.0
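
The FdwRoutine hunk above adds PrepareForeignTransaction next to the existing CommitForeignTransaction and RollbackForeignTransaction callbacks. As a rough illustration of how such a trio of callbacks fits the two-phase commit protocol, here is a minimal, self-contained sketch. Everything in it is hypothetical: ToyState, ToyRoutine and toy_two_phase_commit are made-up names rather than part of the patch, the callbacks return bool instead of reporting failure through ereport(), and both phases run synchronously, whereas the patch set leaves the second phase to a resolver process.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-participant state (not FdwXactRslvState). */
typedef struct ToyState
{
	const char *server;			/* illustrative foreign server name */
	bool		fail_prepare;	/* simulate a failure during phase one */
	bool		prepared;
	bool		committed;
	bool		rolled_back;
} ToyState;

/* Hypothetical callback table, loosely modelled on the FdwRoutine members. */
typedef struct ToyRoutine
{
	bool		(*prepare) (ToyState *st);
	void		(*commit) (ToyState *st);
	void		(*rollback) (ToyState *st);
} ToyRoutine;

static bool
toy_prepare(ToyState *st)
{
	if (st->fail_prepare)
		return false;
	st->prepared = true;
	return true;
}

static void
toy_commit(ToyState *st)
{
	st->prepared = false;
	st->committed = true;
}

static void
toy_rollback(ToyState *st)
{
	st->prepared = false;
	st->rolled_back = true;
}

static const ToyRoutine toy_routine = {toy_prepare, toy_commit, toy_rollback};

/*
 * Phase one prepares every participant in order.  If any prepare fails, the
 * participants prepared so far are rolled back and false is returned.
 * Otherwise phase two commits all of them and true is returned.
 */
static bool
toy_two_phase_commit(const ToyRoutine *routine, ToyState *parts, size_t n)
{
	size_t		nprepared = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (!routine->prepare(&parts[i]))
			break;
		nprepared++;
	}

	if (nprepared < n)
	{
		for (size_t i = 0; i < nprepared; i++)
			routine->rollback(&parts[i]);
		return false;
	}

	for (size_t i = 0; i < n; i++)
		routine->commit(&parts[i]);
	return true;
}
```

The point of the sketch is only the ordering: no participant is committed until every participant has been prepared, which is what makes the outcome atomic across servers.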

v27-0007-Introduce-foreign-transaction-launcher-and-resol.patch (application/x-patch)
From ca8050a6b7490386827c25974c2c81ba8a5575ee Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:09:41 +0900
Subject: [PATCH v27 07/11] Introduce foreign transaction launcher and resolver
 processes.

Foreign transactions prepared by PREPARE TRANSACTION are resolved in
the background by a resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/Makefile           |   5 +-
 src/backend/access/fdwxact/fdwxact.c          |  33 +-
 src/backend/access/fdwxact/launcher.c         | 567 ++++++++++++++++++
 src/backend/access/fdwxact/resolver.c         | 354 +++++++++++
 src/backend/access/transam/twophase.c         |  16 +
 src/backend/postmaster/bgworker.c             |   8 +
 src/backend/postmaster/pgstat.c               |   6 +
 src/backend/postmaster/postmaster.c           |  13 +-
 src/backend/storage/ipc/ipci.c                |   3 +
 src/backend/storage/lmgr/lwlocknames.txt      |   1 +
 src/backend/tcop/postgres.c                   |  14 +
 src/backend/utils/misc/guc.c                  |  37 ++
 src/backend/utils/misc/postgresql.conf.sample |  12 +
 src/include/access/fdwxact.h                  |   6 +
 src/include/access/fdwxact_launcher.h         |  28 +
 src/include/access/fdwxact_resolver.h         |  23 +
 src/include/access/resolver_internal.h        |  63 ++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/pgstat.h                          |   2 +
 src/include/utils/guc_tables.h                |   2 +
 20 files changed, 1185 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
index aacab1d729..151e3ae336 100644
--- a/src/backend/access/fdwxact/Makefile
+++ b/src/backend/access/fdwxact/Makefile
@@ -12,6 +12,9 @@ subdir = src/backend/access/fdwxact
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = fdwxact.o
+OBJS = \
+	fdwxact.o \
+	resolver.o \
+	launcher.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 7b3a2f1fba..b4cab71c3d 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -22,10 +22,10 @@
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  To resolve
- * these foreign transactions the user needs to use pg_resolve_foreign_xact() SQL
- * function that resolve a foreign transaction according to the result of the
- * corresponding local transaction.
+ * local transaction but do nothing for the involved foreign transactions.  The prepared
+ * foreign transactions are resolved by a resolver process asynchronously.  Also, the
+ * user can use pg_resolve_foreign_xact() SQL function to resolve a foreign transaction
+ * manually.
  *
  * LOCKING
  *
@@ -76,7 +76,10 @@
 #include <unistd.h>
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/twophase.h"
+#include "access/resolver_internal.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
@@ -157,6 +160,7 @@ static bool fdwXactExitRegistered = false;
 
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
 static void FdwXactPrepareForeignTransactions(TransactionId xid);
@@ -165,7 +169,6 @@ static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
 static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
 										 FdwXactParticipant *fdw_part);
-static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 static void FdwXactComputeRequiredXmin(void);
 static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
 static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
@@ -772,12 +775,13 @@ ForgetAllFdwXactParticipants(void)
 
 	/*
 	 * If we leave any FdwXact entries, update the oldest local transaction of
-	 * unresolved distributed transaction.
+	 * unresolved distributed transaction and notify the launcher.
 	 */
 	if (nlefts > 0)
 	{
 		elog(DEBUG1, "left %u foreign transactions", nlefts);
 		FdwXactComputeRequiredXmin();
+		FdwXactLaunchOrWakeupResolver();
 	}
 
 	list_free_deep(FdwXactParticipants);
@@ -785,7 +789,9 @@ ForgetAllFdwXactParticipants(void)
 }
 
 /*
- * Commit or rollback all foreign transactions.
+ * Close in-progress involved foreign transactions.  We don't perform the second
+ * phase of two-phase commit protocol here.  All prepared foreign transactions
+ * enter in-doubt state and a resolver process will process them.
  */
 void
 AtEOXact_FdwXact(bool is_commit)
@@ -889,7 +895,7 @@ PrePrepare_FdwXact(void)
  * The caller must hold the given foreign transactions in advance to prevent
  * concurrent update.
  */
-static void
+void
 FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
 {
 	for (int i = 0; i < nfdwxacts; i++)
@@ -924,6 +930,17 @@ FdwXactExists(Oid dbid, Oid serverid, Oid userid)
 
 	return (idx >= 0);
 }
+bool
+FdwXactExistsXid(TransactionId xid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(InvalidOid, xid, InvalidOid, InvalidOid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
 
 /*
  * Return the index of first found FdwXact entry that matched to given arguments.
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..916b9af2f7
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,567 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "pgstat.h"
+#include "funcapi.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+		FdwXactRslvCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit start retries to once per
+		 * foreign_xact_resolution_retry_interval, but always attempt to
+		 * start when requested.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * started. This usually means the resolver crashed, so we should
+			 * retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+
+	/*
+	 * Look for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is still
+		 * starting up and hasn't attached to its slot yet. Since it will soon
+		 * find the FdwXact entry we inserted, we needn't do anything then.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on each database that has
+ * at least one FdwXact entry but no running resolver.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *fdwxact_dbs;
+	HTAB	   *resolver_dbs;
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+
+	/*
+	 * Create a hash map for the databases that have at least one foreign
+	 * transaction to resolve.
+	 */
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * We need to launch resolver process if the foreign transaction
+		 * is not held by anyone and is not a part of the local prepared
+		 * transaction.
+		 */
+		if (fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no foreign transaction to resolve, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	/* Create a hash map for databases on which a resolver is running */
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * Find databases on which no resolver is running and launch new
+	 * resolver process on them.
+	 */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be super user */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..e37931e405
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,354 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database whose resolution a backend process is waiting for.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int	foreign_xact_resolution_retry_interval;
+int	foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+static void hold_indoubt_fdwxacts(void);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts holds the indexes of the FdwXact entries that the resolver
+ * has marked as in-processing. These marks are cleared on process exit.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	/* Release the held foreign transaction entries */
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/* Hold in-doubt foreign transactions to resolve */
+		hold_indoubt_fdwxacts();
+
+		if (nheld > 0)
+		{
+			/* Resolve in-doubt transactions */
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld);
+			CommitTransactionCommand();
+			last_resolution_time = now;
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transactions have been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/* Reached timeout, exit */
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+					get_database_name(MyDatabaseId))));
+	CommitTransactionCommand();
+	fdwxact_resolver_detach();
+	proc_exit(0);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Lock foreign transactions that are not held by anyone.
+ */
+static void
+hold_indoubt_fdwxacts(void)
+{
+	nheld = 0;
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid &&
+			fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+		{
+			Assert(fdwxact->dbid == waiter->databaseId);
+
+			held_fdwxacts[nheld++] = i;
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index a71210772f..1348a283b1 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,8 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -2286,6 +2288,13 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * If the prepared transaction was part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
@@ -2345,6 +2354,13 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * If the prepared transaction was part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 5a9a0e3435..b2384f9ab9 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,8 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 43e1c93dc5..f2216ca60c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3809,6 +3809,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 81e6cb9ca2..e8f579699f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -94,6 +94,7 @@
 #endif
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -925,6 +926,9 @@ PostmasterMain(int argc, char *argv[])
 	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers <= 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
 
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
@@ -990,12 +994,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules have had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2d7191d3cd..271fd35884 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -17,6 +17,7 @@
 #include "access/clog.h"
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -151,6 +152,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +272,7 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index dc29a7ea6f..9327394013 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -54,3 +54,4 @@ XactTruncationLock					44
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
 FdwXactLock							48
+FdwXactResolverLock					49
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 411cfadbff..496e2b3a4a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3054,6 +3056,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 951ed0ece2..a615140d1a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -760,6 +760,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2469,6 +2473,39 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 863e8ccc3a..2ed09cb347 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -733,6 +733,18 @@
 #max_pred_locks_per_page = 2            # min 0
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
 #------------------------------------------------------------------------------
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 9ba819e9d1..a3763e52c0 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -104,13 +104,19 @@ typedef struct FdwXactRslvState
 
 /* GUC parameters */
 extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern bool FdwXactExistsXid(TransactionId xid);
 extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
 extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
 								Oid userid, void *content, int len);
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..688b43b8d0
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..c935471936
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 09c26b5cd8..49efb63e6a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6167,6 +6167,11 @@
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
 
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
+
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
   proargtypes => 'pg_lsn pg_lsn', prosrc => 'pg_wal_lsn_diff' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a61a08c5d6..0967c09f3c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -877,6 +877,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 04431d0eb2..a00ca73355 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
-- 
2.27.0
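
Note on the resolver's sleep computation in the patch above: the following is
a minimal standalone sketch, not code from the patch, of how
FXRslvComputeSleepTime() picks the next wakeup. It assumes timestamps are plain
int64 microseconds, and the DEFAULT_NAPTIME_MS value is only a placeholder,
since the real DEFAULT_NAPTIME_PER_CYCLE definition is not part of this
excerpt. The names ms_until() and compute_sleep_ms() are hypothetical.

```c
#include <stdint.h>

/* Placeholder; the patch's DEFAULT_NAPTIME_PER_CYCLE value is not shown here. */
#define DEFAULT_NAPTIME_MS 180000L

typedef int64_t ts_us;			/* stand-in for TimestampTz (microseconds) */

/* Milliseconds from now until target, clamped to zero if already past. */
static long
ms_until(ts_us now, ts_us target)
{
	if (target <= now)
		return 0;
	return (long) ((target - now) / 1000);
}

/*
 * Sleep for the default nap time, but wake earlier if the idle timeout or
 * the next scheduled resolution comes first.
 */
static long
compute_sleep_ms(ts_us now, ts_us last_resolution, int timeout_ms,
				 ts_us next_resolution)
{
	long		sleep_ms = DEFAULT_NAPTIME_MS;

	if (timeout_ms > 0)
	{
		ts_us		deadline = last_resolution + (ts_us) timeout_ms * 1000;
		long		t = ms_until(now, deadline);

		if (t < sleep_ms)
			sleep_ms = t;
	}

	if (next_resolution > 0)
	{
		long		t = ms_until(now, next_resolution);

		if (t < sleep_ms)
			sleep_ms = t;
	}

	return sleep_ms;
}
```

In FXRslvLoop() as posted, resolutionTs is never changed from -1, so only the
timeout branch can shorten the nap.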

Attachment: v27-0008-Prepare-foreign-transactions-at-commit-time.patch (application/x-patch)
From 810ef993b28b1b461223f53965600842762bf496 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 3 Nov 2020 21:58:48 +0900
Subject: [PATCH v27 08/11] Prepare foreign transactions at commit time

When foreign_twophase_commit is 'required', a transaction involving
multiple foreign servers prepares all transactions initiated on the
foreign servers at the pre-commit phase.  Prepared foreign
transactions are resolved asynchronously by a resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/fdwxact.c          | 191 +++++++++++++++++-
 src/backend/access/transam/xact.c             |   7 +
 src/backend/utils/misc/guc.c                  |  28 +++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/access/fdwxact.h                  |  10 +
 src/include/foreign/fdwapi.h                  |   2 +-
 6 files changed, 229 insertions(+), 11 deletions(-)
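
Note on the decision logic in this patch: below is a minimal standalone
sketch, not code from the patch, of the order of checks performed by
checkForeignTwophaseCommitRequired(). The Participant struct, the Decision
enum and decide_twophase() are hypothetical stand-ins; the real function
raises errors via ereport() instead of returning error codes.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
	TWOPHASE_NOT_NEEDED,		/* one-phase commit is sufficient */
	TWOPHASE_REQUIRED,			/* prepare foreign transactions */
	ERR_NO_PREPARED_SLOTS,		/* max_prepared_foreign_xacts == 0 */
	ERR_NO_RESOLVERS,			/* max_foreign_xact_resolvers == 0 */
	ERR_UNSUPPORTED				/* a written server cannot prepare */
} Decision;

typedef struct
{
	bool		modified;			/* did the transaction write there? */
	bool		supports_prepare;	/* does the FDW provide a prepare callback? */
} Participant;

static Decision
decide_twophase(bool requested, const Participant *parts, size_t nparts,
				bool local_modified, int max_prepared, int max_resolvers)
{
	bool		have_notwophase = false;
	int			nserverswritten = 0;

	if (!requested)
		return TWOPHASE_NOT_NEEDED;

	/* Count only servers that were actually written to. */
	for (size_t i = 0; i < nparts; i++)
	{
		if (!parts[i].modified)
			continue;
		if (!parts[i].supports_prepare)
			have_notwophase = true;
		nserverswritten++;
	}

	/* The local server counts as one more writer. */
	if (local_modified)
		nserverswritten++;

	if (nserverswritten < 2)
		return TWOPHASE_NOT_NEEDED;

	/* Same order as the patch: configuration first, capability last. */
	if (max_prepared == 0)
		return ERR_NO_PREPARED_SLOTS;
	if (max_resolvers == 0)
		return ERR_NO_RESOLVERS;
	if (have_notwophase)
		return ERR_UNSUPPORTED;

	return TWOPHASE_REQUIRED;
}
```

Read-only participants are skipped before the capability check, so a server
without prepare support only blocks the commit if it was written to.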

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index b4cab71c3d..79bd7596a3 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -19,13 +19,27 @@
  *
  * FOREIGN TRANSACTION RESOLUTION
  *
+ * A transaction involving multiple foreign servers uses the two-phase commit
+ * protocol to commit the distributed transaction, if enabled.  The basic strategy
+ * is to prepare all of the remote transactions before committing locally and to
+ * commit them after committing locally.
+ *
+ * At pre-commit of the local transaction, we prepare the transactions on all
+ * foreign servers after logging the foreign transaction information.  The outcome
+ * of the distributed transaction is determined by the outcome of the corresponding
+ * local transaction.  Once the local transaction is successfully committed, all
+ * transactions on foreign servers must be committed.  If an error occurs before
+ * the local transaction commits, all transactions must be aborted.  After
+ * committing or rolling back locally, we leave foreign transactions as in-doubt
+ * transactions and then notify the resolver process.  The resolver process
+ * asynchronously resolves these foreign transactions according to the outcome of
+ * the corresponding local transaction.  Also, the user can use the
+ * pg_resolve_foreign_xact() SQL function to resolve a foreign transaction manually.
+ *
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  The preapred
- * foreign transactions are resolved by a resolver process asynchronously.  Also, the
- * user can use pg_resolve_foreign_xact() SQL function to resolve a foreign transaction
- * manually.
+ * local transaction but do nothing for the involved foreign transactions.
  *
  * LOCKING
  *
@@ -92,8 +106,10 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
+#include "storage/pmsignal.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
+#include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -105,6 +121,10 @@
 #define ServerSupportTwophaseCommit(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
 
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
 /* Directory where the foreign prepared transaction files will reside */
 #define FDWXACTS_DIR "pg_fdwxact"
 
@@ -142,6 +162,9 @@ typedef struct FdwXactParticipant
 	/* Transaction identifier used for PREPARE */
 	char	   *fdwxact_id;
 
+	/* true if the transaction modified data on the server */
+	bool		modified;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
@@ -152,18 +175,24 @@ typedef struct FdwXactParticipant
 /*
  * List of foreign transactions involved in the transaction.  A member of
  * participants must support both commit and rollback APIs.
+ *
+ * ForeignTwophaseCommitIsRequired is true if the current transaction needs to
+ * be committed using two-phase commit protocol.
  */
 static List *FdwXactParticipants = NIL;
+static bool ForeignTwophaseCommitIsRequired = false;
 
 /* Keep track of registering process exit call back. */
 static bool fdwXactExitRegistered = false;
 
+
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
 int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
-static void FdwXactPrepareForeignTransactions(TransactionId xid);
+static void FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
@@ -182,6 +211,7 @@ static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
 static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
 static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  bool giveWarning);
+static bool checkForeignTwophaseCommitRequired(bool local_modified);
 static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  Oid umid, char *fdwxact_id);
 static void remove_fdwxact(FdwXact fdwxact);
@@ -258,7 +288,7 @@ FdwXactShmemInit(void)
  * as a participant of the transaction.
  */
 void
-FdwXactRegisterXact(Oid serverid, Oid userid)
+FdwXactRegisterXact(Oid serverid, Oid userid, bool modified)
 {
 	FdwXactParticipant *fdw_part;
 	MemoryContext old_ctx;
@@ -273,6 +303,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 			fdw_part->usermapping->userid == userid)
 		{
 			/* Already registered */
+			fdw_part->modified |= modified;
 			return;
 		}
 	}
@@ -302,6 +333,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
 
 	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
 
 	/* Add to the participants list */
 	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
@@ -348,6 +380,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
 	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
@@ -356,11 +389,139 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	return fdw_part;
 }
 
+/*
+ * Prepare all foreign transactions if foreign two-phase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit: when it is
+ * 'required', we strictly require the FDWs of all foreign servers to support
+ * the two-phase commit protocol and ask them to prepare foreign transactions;
+ * when it is 'disabled', we use one-phase commit, so these foreign
+ * transactions are committed at transaction end.  If we fail to prepare any
+ * of them, we switch to aborting.
+ */
+void
+PreCommit_FdwXact(void)
+{
+	TransactionId xid;
+	bool		local_modified;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/*
+	 * Check whether the current transaction performed writes.  We need to
+	 * include the local node as a participant in the distributed transaction
+	 * and regard it as modified if the current transaction has performed WAL
+	 * logging and has been assigned an xid.  A transaction can end up not
+	 * writing any WAL, even if it has an xid, if it only wrote to temporary
+	 * and/or unlogged tables.  It can end up having written WAL without an
+	 * xid if it did HOT pruning.
+	 */
+	xid = GetTopTransactionIdIfAny();
+	local_modified = (TransactionIdIsValid(xid) && (XactLastRecEnd != 0));
+
+	/*
+	 * Check if we need to use foreign twophase commit. Note that we don't
+	 * support foreign twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired(local_modified))
+	{
+		/*
+		 * Two-phase commit is required.  Assign a transaction id to the
+		 * current transaction if it does not have one yet, because the local
+		 * transaction is needed to determine the outcome of the distributed
+		 * transaction.  Then we prepare foreign transactions on foreign
+		 * servers that support two-phase commit.  Note that we keep
+		 * FdwXactParticipants until the end of the transaction.
+		 */
+		if (!TransactionIdIsValid(xid))
+			xid = GetTopTransactionId();
+		FdwXactPrepareForeignTransactions(xid, false);
+		ForeignTwophaseCommitIsRequired = true;
+	}
+}
+
+/* Return true if the current transaction needs to use two-phase commit */
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
 /*
- * Insert FdwXact entries and prepare foreign transactions.
+ * Return true if the current transaction modifies data on two or more servers,
+ * counting the servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+{
+	ListCell   *lc;
+	bool		have_notwophase = false;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			have_notwophase = true;
+
+		nserverswritten++;
+	}
+
+	/* Did we modify the local non-temporary data? */
+	if (local_modified)
+		nserverswritten++;
+
+	/*
+	 * Two-phase commit is not required if fewer than two servers performed
+	 * writes.
+	 */
+	if (nserverswritten < 2)
+		return false;
+
+	Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED);
+
+	/* Two-phase commit is required. Check parameters */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	if (have_notwophase)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+				 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+	return true;
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions.  If prepare_all is
+ * true, we prepare all foreign transactions regardless of whether writes
+ * happened on the server.
+ *
+ * We can still switch to rollback here on failure. If any error occurs, we
+ * roll back the foreign transactions that have not been prepared.
  */
 static void
-FdwXactPrepareForeignTransactions(TransactionId xid)
+FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all)
 {
 	ListCell   *lc;
 
@@ -378,6 +539,9 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 
 		CHECK_FOR_INTERRUPTS();
 
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
 		/* Get prepared transaction identifier */
 		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
 		Assert(fdw_part->fdwxact_id);
@@ -755,7 +919,10 @@ ForgetAllFdwXactParticipants(void)
 	int			nlefts = 0;
 
 	if (FdwXactParticipants == NIL)
+	{
+		Assert(!ForeignTwophaseCommitIsRequired);
 		return;
+	}
 
 	foreach(cell, FdwXactParticipants)
 	{
@@ -812,7 +979,10 @@ AtEOXact_FdwXact(bool is_commit)
 
 		if (!fdwxact)
 		{
-			/* Commit or rollback the foreign transaction in one-phase */
+			/*
+			 * If this participant doesn't have an FdwXact entry, it's not
+			 * prepared yet. Therefore we can commit or roll it back in one phase.
+			 */
 			Assert(ServerSupportTransactionCallback(fdw_part));
 			FdwXactParticipantEndTransaction(fdw_part, is_commit);
 			continue;
@@ -842,6 +1012,7 @@ AtEOXact_FdwXact(bool is_commit)
 	}
 
 	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
 }
 
 /*
@@ -881,7 +1052,7 @@ PrePrepare_FdwXact(void)
 	 * prepare all foreign transactions.
 	 */
 	xid = GetTopTransactionId();
-	FdwXactPrepareForeignTransactions(xid);
+	FdwXactPrepareForeignTransactions(xid, true);
 
 	/*
 	 * We keep FdwXactParticipants until the transaction end so that we change
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index e4fadcaf2c..5cf2dea70f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -22,6 +22,7 @@
 
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1456,6 +1457,9 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	if (FdwXactIsForeignTwophaseCommitRequired())
+		FdwXactLaunchOrWakeupResolver();
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2115,6 +2119,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXact();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a615140d1a..bb9bb1056f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -499,6 +499,24 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -4657,6 +4675,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2ed09cb347..5a73443be1 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -744,6 +744,8 @@
 							# retrying to resolve
 							# foreign transactions
 							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
 
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index a3763e52c0..6bf4f5dd7d 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -20,6 +20,14 @@
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
 /* Enum to track the status of foreign transaction */
 typedef enum
 {
@@ -107,10 +115,12 @@ extern int	max_prepared_foreign_xacts;
 extern int	max_foreign_xact_resolvers;
 extern int	foreign_xact_resolution_retry_interval;
 extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
+extern void PreCommit_FdwXact(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
 extern bool FdwXactIsForeignTwophaseCommitRequired(void);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 91db4f5bfc..7a444d0590 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -273,7 +273,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
 /* Functions in fdwxact/fdwxact.c */
-extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactRegisterXact(Oid serverid, Oid userid, bool modified);
 extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
 
 #endif							/* FDWAPI_H */
-- 
2.27.0
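
For reference, the write-counting rule in checkForeignTwophaseCommitRequired() from the patch above can be modelled outside the backend. This is only a simplified stand-alone sketch, not the patch code: Participant, TwophaseDecision and decide_twophase are hypothetical names, the GUC values are passed in as plain arguments, and the ereport(ERROR) paths are represented as return codes so the checks can be exercised in order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for FdwXactParticipant: only the fields the check reads. */
typedef struct Participant
{
	bool		modified;			/* did the transaction write on this server? */
	bool		supports_twophase;	/* can this server prepare transactions? */
} Participant;

/* Hypothetical result codes; the patch raises ERROR instead of returning these. */
typedef enum TwophaseDecision
{
	TWOPHASE_NOT_REQUIRED,
	TWOPHASE_REQUIRED,
	TWOPHASE_ERROR_NO_PREPARED_SLOTS,
	TWOPHASE_ERROR_NO_RESOLVERS,
	TWOPHASE_ERROR_UNSUPPORTED_SERVER
} TwophaseDecision;

static TwophaseDecision
decide_twophase(const Participant *parts, size_t nparts, bool local_modified,
				bool requested, int max_prepared, int max_resolvers)
{
	bool		have_notwophase = false;
	int			nserverswritten = 0;

	assert(parts != NULL || nparts == 0);

	/* foreign_twophase_commit = disabled: never use two-phase commit */
	if (!requested)
		return TWOPHASE_NOT_REQUIRED;

	/* Count foreign servers that were written to; read-only ones are skipped */
	for (size_t i = 0; i < nparts; i++)
	{
		if (!parts[i].modified)
			continue;
		if (!parts[i].supports_twophase)
			have_notwophase = true;
		nserverswritten++;
	}

	/* The local server counts as one more writer */
	if (local_modified)
		nserverswritten++;

	/* A single writer can be committed atomically in one phase */
	if (nserverswritten < 2)
		return TWOPHASE_NOT_REQUIRED;

	/* Two-phase commit is required; validate the configuration in patch order */
	if (max_prepared == 0)
		return TWOPHASE_ERROR_NO_PREPARED_SLOTS;
	if (max_resolvers == 0)
		return TWOPHASE_ERROR_NO_RESOLVERS;
	if (have_notwophase)
		return TWOPHASE_ERROR_UNSUPPORTED_SERVER;

	return TWOPHASE_REQUIRED;
}
```

Note that in this model, as in the patch, a server that cannot do two-phase commit only causes an error when it was actually written to and at least two servers performed writes; read-only participants never force an error.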

v27-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch (application/x-patch)
From 26c9ce7b1ad62dc206646b5091eec0bdddbc3bc1 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sat, 29 Aug 2020 00:14:36 +0900
Subject: [PATCH v27 02/11] postgres_fdw supports commit and rollback APIs.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 471 +++++++++---------
 .../postgres_fdw/expected/postgres_fdw.out    |   2 +-
 contrib/postgres_fdw/postgres_fdw.c           |   4 +
 contrib/postgres_fdw/postgres_fdw.h           |   3 +
 4 files changed, 241 insertions(+), 239 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 2f411cf2f7..c7da528dfb 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -16,6 +16,7 @@
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -80,8 +81,7 @@ static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, UserMapping *user);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -94,6 +94,8 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -108,56 +110,11 @@ static bool UserMappingPasswordRequired(UserMapping *user);
 PGconn *
 GetConnection(UserMapping *user, bool will_prep_stmt)
 {
-	bool		found;
 	bool		retry = false;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
 	MemoryContext ccxt = CurrentMemoryContext;
 
-	/* First time through, initialize connection cache hashtable */
-	if (ConnectionHash == NULL)
-	{
-		HASHCTL		ctl;
-
-		MemSet(&ctl, 0, sizeof(ctl));
-		ctl.keysize = sizeof(ConnCacheKey);
-		ctl.entrysize = sizeof(ConnCacheEntry);
-		/* allocate ConnectionHash in the cache context */
-		ctl.hcxt = CacheMemoryContext;
-		ConnectionHash = hash_create("postgres_fdw connections", 8,
-									 &ctl,
-									 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
-
-		/*
-		 * Register some callback functions that manage connection cleanup.
-		 * This should be done just once in each backend.
-		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
-		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
-		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
-									  pgfdw_inval_callback, (Datum) 0);
-		CacheRegisterSyscacheCallback(USERMAPPINGOID,
-									  pgfdw_inval_callback, (Datum) 0);
-	}
-
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
-	/*
-	 * Find or create cached entry for requested connection.
-	 */
-	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
-	if (!found)
-	{
-		/*
-		 * We need only clear "conn" here; remaining fields will be filled
-		 * later when "conn" is set.
-		 */
-		entry->conn = NULL;
-	}
+	entry = GetConnectionCacheEntry(user->umid);
 
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
@@ -189,7 +146,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	PG_TRY();
 	{
 		/* Start a new transaction or subtransaction if needed. */
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 	PG_CATCH();
 	{
@@ -250,7 +207,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		if (entry->conn == NULL)
 			make_new_connection(entry, user);
 
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 
 	/* Remember if caller will prepare statements */
@@ -259,6 +216,60 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	return entry->conn;
 }
 
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	bool		found;
+	ConnCacheEntry *entry;
+	ConnCacheKey key;
+
+	/* First time through, initialize connection cache hashtable */
+	if (ConnectionHash == NULL)
+	{
+		HASHCTL		ctl;
+
+		MemSet(&ctl, 0, sizeof(ctl));
+		ctl.keysize = sizeof(ConnCacheKey);
+		ctl.entrysize = sizeof(ConnCacheEntry);
+		/* allocate ConnectionHash in the cache context */
+		ctl.hcxt = CacheMemoryContext;
+		ConnectionHash = hash_create("postgres_fdw connections", 8,
+									 &ctl,
+									 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+		/*
+		 * Register some callback functions that manage connection cleanup.
+		 * This should be done just once in each backend.
+		 */
+		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
+		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
+									  pgfdw_inval_callback, (Datum) 0);
+		CacheRegisterSyscacheCallback(USERMAPPINGOID,
+									  pgfdw_inval_callback, (Datum) 0);
+	}
+
+	/* Set flag that we did GetConnection during the current transaction */
+	xact_got_connection = true;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
+
+	/*
+	 * Find or create cached entry for requested connection.
+	 */
+	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+	if (!found)
+	{
+		/*
+		 * We need only clear "conn" here; remaining fields will be filled
+		 * later when "conn" is set.
+		 */
+		entry->conn = NULL;
+	}
+
+	return entry;
+}
+
 /*
  * Reset all transient state fields in the cached connection entry and
  * establish new connection to the remote server.
@@ -548,7 +559,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -560,6 +571,9 @@ begin_remote_xact(ConnCacheEntry *entry)
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
+		/* Register the foreign server to the transaction */
+		FdwXactRegisterXact(user->serverid, user->userid);
+
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
 		else
@@ -775,193 +789,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -1326,3 +1153,171 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+void
+postgresCommitForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry;
+	PGresult   *res;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	Assert(entry->conn);
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   frstate->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before a
+	 * connection was established.
+	 */
+	if (!entry->conn)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is before starting transaction or is already unsalvageable,
+	 * do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06358..c5badd9c0a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,7 +8984,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
+ERROR:  cannot PREPARE a transaction that has operated on foreign tables
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaacc51..473f94c929 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -559,6 +559,10 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..e3b2897495 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -137,6 +138,8 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
-- 
2.27.0
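
The abort path in postgresRollbackForeignTransaction() above is a chain of remote steps (cancel a running query, ABORT TRANSACTION, DEALLOCATE ALL) in which the first failure marks the connection as unusable. The following is a rough stand-alone model of that chain, assuming the outcome of each remote step is supplied by the caller so that no libpq connection is needed. ConnState, RemoteResults and rollback_cleanup are hypothetical names, and the PQstatus()/PQtransactionStatus() checks done by pgfdw_cleanup_after_transaction() are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-connection flags kept in ConnCacheEntry. */
typedef struct ConnState
{
	bool		query_active;		/* a remote command is still running */
	bool		have_prep_stmt;		/* prepared statements exist remotely */
	bool		have_error;			/* we may have lost track of them */
	bool		changing_xact_state;	/* a state change is in progress or failed */
} ConnState;

/* Outcome of each remote step, chosen by the caller for this sketch. */
typedef struct RemoteResults
{
	bool		cancel_ok;
	bool		abort_ok;
	bool		deallocate_ok;
} RemoteResults;

/* Returns true if the connection can be kept, false if it must be discarded. */
static bool
rollback_cleanup(ConnState *conn, const RemoteResults *r)
{
	bool		discard;

	assert(conn != NULL && r != NULL);

	if (conn->changing_xact_state)
	{
		/* An earlier state change never completed: do not touch it further */
		discard = true;
	}
	else
	{
		bool		failed = false;

		conn->changing_xact_state = true;
		conn->have_error = true;	/* assume prepared statements may be stale */

		if (conn->query_active && !r->cancel_ok)
			failed = true;			/* unable to cancel the running query */
		else if (!r->abort_ok)
			failed = true;			/* unable to abort the remote transaction */
		else if (conn->have_prep_stmt && !r->deallocate_ok)
			failed = true;			/* trouble clearing prepared statements */

		discard = failed;
	}

	/* Reset per-transaction state, as the cleanup function does at the end */
	conn->query_active = false;
	conn->have_prep_stmt = false;
	conn->have_error = false;
	conn->changing_xact_state = false;

	return !discard;
}
```

A failed step does not raise an error in this model; it only decides whether the cached connection is reused or dropped, which is the same trade-off the patch makes during abort.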

v27-0003-Recreate-RemoveForeignServerById.patch (application/x-patch)
From b919060f56dbdb36e639613f24111faa99e548f6 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 12 Jun 2020 11:49:02 +0900
Subject: [PATCH v27 03/11] Recreate RemoveForeignServerById()

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/catalog/dependency.c   |  5 ++++-
 src/backend/commands/foreigncmds.c | 22 ++++++++++++++++++++++
 src/include/commands/defrem.h      |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index b0d037600e..5748e4277c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1555,6 +1555,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_FOREIGN_SERVER:
+			RemoveForeignServerById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1569,7 +1573,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSDICT:
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
-		case OCLASS_FOREIGN_SERVER:
 		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index de31ddd1f3..c002a61794 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -1060,6 +1060,28 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop foreign server by OID
+ */
+void
+RemoveForeignServerById(Oid srvId)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(ForeignServerRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(srvId));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Common routine to check permission for user-mapping-related DDL
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 7a079ef07f..737a14a22a 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -128,6 +128,7 @@ extern ObjectAddress CreateForeignDataWrapper(CreateFdwStmt *stmt);
 extern ObjectAddress AlterForeignDataWrapper(AlterFdwStmt *stmt);
 extern ObjectAddress CreateForeignServer(CreateForeignServerStmt *stmt);
 extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
+extern void RemoveForeignServerById(Oid srvId);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
-- 
2.27.0

v27-0001-Introduce-transaction-manager-for-foreign-transa.patch (application/x-patch)
From d83b4ebac56b88303e2a856db60549b069c61772 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 28 Aug 2020 22:25:38 +0900
Subject: [PATCH v27 01/11] Introduce transaction manager for foreign
 transactions.

The global transaction manager manages the transactions initiated on
the foreign server. This commit adds both CommitForeignTransaction and
RollbackForeignTransaction FDW APIs. An FDW that implements these APIs
can be managed by the global transaction manager.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/Makefile          |   4 +-
 src/backend/access/fdwxact/Makefile  |  17 ++
 src/backend/access/fdwxact/fdwxact.c | 233 +++++++++++++++++++++++++++
 src/backend/access/transam/xact.c    |  10 ++
 src/backend/foreign/foreign.c        |   4 +
 src/include/access/fdwxact.h         |  33 ++++
 src/include/foreign/fdwapi.h         |  12 ++
 7 files changed, 311 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/include/access/fdwxact.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..2372a1a690 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,7 +8,7 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+SUBDIRS	    = brin common fdwxact gin gist hash heap index nbtree rmgrdesc \
+			  spgist table tablesample transam
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..aacab1d729
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..00da860b31
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,233 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module contains the code for managing transactions started on foreign
+ * servers.
+ *
+ * An FDW that implements both the commit and rollback APIs can register a
+ * foreign transaction with FdwXactRegisterXact() so that it participates in
+ * the distributed transaction.  The registered foreign transactions are
+ * identified by the OIDs of the server and user.  On commit and rollback, the
+ * global transaction manager calls the corresponding FDW API to end them.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xlog.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "utils/memutils.h"
+
+/* Check whether the FdwXactParticipant provides transaction callbacks */
+#define ServerSupportTransactionCallback(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.	 This struct
+ * needs to live until the end of transaction where we cannot look at
+ * syscaches. Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Callbacks for foreign transaction */
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A member of
+ * participants must support both commit and rollback APIs.
+ */
+static List *FdwXactParticipants = NIL;
+
+static void ForgetAllFdwXactParticipants(void);
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
+											 bool commit);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+/*
+ * Register the given foreign transaction identified by the given arguments
+ * as a participant of the transaction.
+ */
+void
+FdwXactRegisterXact(Oid serverid, Oid userid)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Already registered */
+			return;
+		}
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Foreign server managed by the transaction manager must implement
+	 * transaction callbacks.
+	 */
+	if (!routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("cannot register foreign server not supporting transaction callback")));
+
+	/*
+	 * Participant's information is also used at the end of a transaction,
+	 * where system caches are not available. Save it in TopTransactionContext
+	 * so that it can live until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Revert back the context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Remove the given foreign server from FdwXactParticipants */
+void
+FdwXactUnregisterXact(Oid serverid, Oid userid)
+{
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Remove the entry */
+			FdwXactParticipants =
+				foreach_delete_current(FdwXactParticipants, lc);
+			break;
+		}
+	}
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+
+	return fdw_part;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
+{
+	FdwXactRslvState state;
+
+	Assert(ServerSupportTransactionCallback(fdw_part));
+
+	state.server = fdw_part->server;
+	state.usermapping = fdw_part->usermapping;
+	state.flags = FDWXACT_FLAG_ONEPHASE;
+
+	if (commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Clear the FdwXactParticipants list.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	list_free_deep(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Commit or rollback all foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool is_commit)
+{
+	ListCell   *lc;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/* Commit or rollback foreign transactions in the participant list */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(ServerSupportTransactionCallback(fdw_part));
+		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Check if the local transaction has any foreign transaction.
+ */
+void
+PrePrepare_FdwXact(void)
+{
+	/* Preparing foreign transactions is not supported yet */
+	if (FdwXactParticipants != NIL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..0a8d1da4bd 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -2229,6 +2230,9 @@ CommitTransaction(void)
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT
 					  : XACT_EVENT_COMMIT);
 
+	/* Commit foreign transaction if any */
+	AtEOXact_FdwXact(true);
+
 	ResourceOwnerRelease(TopTransactionResourceOwner,
 						 RESOURCE_RELEASE_BEFORE_LOCKS,
 						 true, true);
@@ -2368,6 +2372,9 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Error out if any foreign transactions are involved */
+	PrePrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2755,6 +2762,9 @@ AbortTransaction(void)
 		else
 			CallXactCallbacks(XACT_EVENT_ABORT);
 
+		/* Rollback foreign transactions if any */
+		AtEOXact_FdwXact(false);
+
 		ResourceOwnerRelease(TopTransactionResourceOwner,
 							 RESOURCE_RELEASE_BEFORE_LOCKS,
 							 false, true);
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 61e48ca3f8..6532a836e5 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -328,6 +328,10 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* The FDW must support either both APIs or neither */
+	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
+		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
+
 	return routine;
 }
 
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..6c8b111ab5
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,33 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "foreign/foreign.h"
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	/* Foreign transaction information */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* Function declarations */
+extern void AtEOXact_FdwXact(bool is_commit);
+extern void PrePrepare_FdwXact(void);
+
+#endif /* FDWXACT_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..4db7ade9a3 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "access/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
 
@@ -170,6 +171,9 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
  * function.  It provides pointers to the callback functions needed by the
@@ -246,6 +250,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for path reparameterization. */
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
+
+	/* Support functions for transaction management */
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
 } FdwRoutine;
 
 
@@ -259,4 +267,8 @@ extern bool IsImportableForeignTable(const char *tablename,
 									 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in fdwxact/fdwxact.c */
+extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
+
 #endif							/* FDWAPI_H */
-- 
2.27.0

#203Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#202)
11 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, Nov 5, 2020 at 12:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Oct 22, 2020 at 10:39 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Wed, 21 Oct 2020 at 18:33, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

So what's your opinion?

My opinion is simple and has not changed. Let's clarify and refine the design first in the following areas (others may have pointed out something else too, but I don't remember), before going deeper into the code review.

* FDW interface
New functions should be designed so that other FDWs can really implement them. Currently, XA seems to be the only model we can rely on to validate the FDW interface.
What FDW function would call what XA function(s)? What should be the arguments for the FDW functions?

Since the FDW interfaces may be affected by the architecture of the
feature, I guess we can discuss them later.

* Performance
Parallel prepare and commits on the client backend. The current implementation is intolerable and is not first-release quality. I proposed the idea.
(If you insist you don't want to do anything about this, I have to think you're just rushing for the patch commit. I want to keep Postgres's reputation.)

What do you have in mind regarding the implementation of parallel
prepare and commit? Given that some FDW plugins don't support
asynchronous execution, I guess we need to use parallel workers or
something. That is, the backend process launches parallel workers to
prepare/commit/rollback foreign transactions in parallel. I don't deny
this approach, but it will definitely make the feature complex and
need more code.

My point is to start small and keep the first version simple. Even
if we need one or more years for this feature, I think that
introducing simple, minimal functionality to the core as the first
version still has benefits. We will have the opportunity to get real
feedback from users and to fix bugs in the main infrastructure before
making it complex. In this sense, a patch in which the backend
returns without waiting for resolution after the local commit would
be a good start as the first version (i.e., up to applying the
v26-0006 patch). Anyway, the architecture should be extensible enough
for future improvements.

For the performance improvements, we will be able to support
asynchronous and/or parallel prepare/commit/rollback. Moreover, having
multiple resolver processes on one database would also help get better
throughput. A user who needs much better throughput can also choose
not to wait for resolution after the local commit, like
synchronous_commit = 'local' in replication.

As part of this, I'd like to see the 2PC's message flow and disk writes (via email and/or on the following wiki.) That helps evaluate the 2PC performance, because it's hard to figure it out in the code of a large patch set. I'm simply imagining what is typically written in database textbooks and research papers. I'm asking this because I saw some discussion in this thread that some new WAL records are added. I was worried that transactions have to write WAL records other than prepare and commit unlike textbook implementations.

Atomic Commit of Distributed Transactions
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

Understood. I'll add an explanation about the message flow and disk
writes to the wiki page.

Done.

We need to consider the point of error handling during resolving
foreign transactions too.

I don't think we need to stipulate the query cancellation. Anyway, I
think neither the fact that we don't stipulate anything about query
cancellation now, nor the fact that postgres_fdw might not be
cancellable in some situations now, is a reason for not supporting
query cancellation. If it's a desirable behavior and users want it, we
need to make an effort to support it as much as possible, like we've
done in postgres_fdw. Some FDWs unfortunately might not be able to
support it with their own functionality alone, but it would be good if
we can achieve that by a combination of PostgreSQL and FDW plugins.

Let me comment on this a bit; this is a somewhat dangerous idea, I'm afraid. We need to pay attention to the FDW interface and its documentation so that FDW developers can implement what we consider important -- query cancellation in your discussion. "postgres_fdw is OK, so the interface is good" can create interfaces that other FDW developers can't use. That's what Tomas Vondra pointed out several years ago.

I suspect the story is somewhat different. libpq fortunately supports
asynchronous execution, but when it comes to canceling the foreign
transaction resolution I think basically all FDW plugins are in the
same situation at this time. We can choose whether to make it
cancellable or not. According to the discussion so far, it completely
depends on the architecture of this feature. So my point is whether
this functionality is worth having for users and whether users want
it, not whether postgres_fdw is ok.

I've thought again about the idea that once the backend fails to
resolve a foreign transaction, it leaves it to a resolver process.
With this idea, the backend process performs the 2nd phase of 2PC only
once. If an error happens during resolution, it leaves the transaction
to a resolver process and returns an error to the client. We used this
idea in the previous patches and it has been discussed from time to
time.

First of all, this idea doesn't resolve the problem of error handling:
the transaction could return an error to the client even though the
local transaction has been committed. There is an argument that
this behavior could also happen even in a single server environment
but I guess the situation is slightly different. Basically what the
transaction does after the commit is cleanup. An error could happen
during cleanup, but if it happens it's likely due to a bug or
something wrong inside PostgreSQL or the OS. On the other hand, during
and after resolution the transaction does major work such as
connecting to a foreign server, sending SQL, getting the result, and
writing a WAL record to remove the entry. These are more likely to
raise an error.

Also, with this idea, the client needs to check whether the error
returned from the server is real, because the local transaction might
have been committed. Although this could happen even in a single
server environment, how many users check that in practice? If a server
crashes, subsequent transactions end up failing due to a network
connection error, but it seems hard to distinguish between such a real
error and the fake error.

Moreover, it's questionable in terms of extensibility. We would not
be able to support waiting for distributed transactions to complete
even if an error happens, as synchronous replication does. The user
might want to wait in cases where the failure is temporary, such as a
temporary network disconnection. Trying resolution only once seems to
have the cons of both asynchronous and synchronous resolution.

So I'm thinking that with this idea users will need to change their
applications to check whether the error they got is real, which is
cumbersome for users. Also, it seems to me we need to carefully
discuss whether this idea could weaken extensibility.

Anyway, according to the discussion, it seems to me that we have
reached a consensus so far that the backend process prepares all
foreign transactions and that a resolver process is necessary to
resolve in-doubt transactions in the background. So I've changed the
patch set as follows. With all of these patches applied, we can
support asynchronous foreign transaction resolution. That is, at
transaction commit the backend process prepares all foreign
transactions and then commits the local transaction. After that, it
returns OK for the commit to the client while leaving the prepared
foreign transactions to a resolver process. A resolver process fetches
the foreign transactions to resolve and resolves them in the
background. Since the 2nd phase of 2PC is performed asynchronously, a
transaction that wants to see the result of a previous transaction
needs to check its status.
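
To make the flow above concrete, here is a minimal standalone sketch in
plain C. It is not code from the patch set: the ForeignXact struct, the
FxState values, the prepare_will_fail test hook, and the function names
are all hypothetical and exist only for illustration. The backend side
prepares every foreign transaction, commits locally, and queues the
prepared transactions; a separate resolver step finishes them later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states of one foreign transaction (illustration only). */
typedef enum
{
	FX_ACTIVE,
	FX_PREPARED,
	FX_COMMITTED,
	FX_ABORTED
} FxState;

typedef struct ForeignXact
{
	FxState		state;
	bool		prepare_will_fail;	/* test hook: simulate a failed PREPARE */
} ForeignXact;

#define QUEUE_SIZE 16

/* Prepared foreign transactions waiting for the resolver. */
static ForeignXact *resolver_queue[QUEUE_SIZE];
static int	queue_len = 0;

/*
 * Backend side: prepare all foreign transactions, then commit locally and
 * hand the prepared transactions over to the resolver.  Returns true if the
 * commit is reported as OK to the client.
 */
bool
commit_distributed(ForeignXact *fx, int n, bool *local_committed)
{
	int			i;

	assert(fx != NULL && local_committed != NULL);
	*local_committed = false;

	if (n > QUEUE_SIZE - queue_len)
		return false;			/* no room to queue; nothing was prepared */

	/* Phase 1: prepare every participant. */
	for (i = 0; i < n; i++)
	{
		if (fx[i].prepare_will_fail)
		{
			int			j;

			/* Abort everything, prepared or still active. */
			for (j = 0; j < n; j++)
				fx[j].state = FX_ABORTED;
			return false;
		}
		fx[i].state = FX_PREPARED;
	}

	/* Local commit; after this point the outcome is decided. */
	*local_committed = true;

	/* Leave phase 2 to the resolver and return immediately. */
	for (i = 0; i < n; i++)
		resolver_queue[queue_len++] = &fx[i];

	return true;
}

/* Resolver side: commit every queued prepared transaction. */
int
resolver_run_once(void)
{
	int			resolved = 0;
	int			i;

	for (i = 0; i < queue_len; i++)
	{
		resolver_queue[i]->state = FX_COMMITTED;
		resolved++;
	}
	queue_len = 0;
	return resolved;
}
```

Between the return of commit_distributed() and the next call of
resolver_run_once(), the foreign transactions are still in the
FX_PREPARED state, which is the window in which a later transaction
may not yet see the previous transaction's result.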

Here is a brief explanation of each patch:

v27-0001-Introduce-transaction-manager-for-foreign-transa.patch

This commit adds the basic foreign transaction manager and the
CommitForeignTransaction and RollbackForeignTransaction APIs. These
APIs support only one-phase commit. With this change, an FDW is able
to control its transactions using the foreign transaction manager
instead of XactCallback.
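
As a rough illustration of what the one-phase manager does, the
following standalone C sketch mimics the register / end-of-transaction
pattern without any PostgreSQL headers. The names (Participant,
EndXactCallback, register_participant, end_all_participants, and the
demo callbacks) are made up for this example and are not the patch's
API; in particular the real FdwXactRegisterXact() raises an error when
the callbacks are missing, whereas this sketch just returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARTICIPANTS 8

/* Hypothetical stand-in for the commit/rollback callback types. */
typedef void (*EndXactCallback) (unsigned serverid, unsigned userid);

typedef struct Participant
{
	unsigned	serverid;
	unsigned	userid;
	EndXactCallback commit_fn;
	EndXactCallback rollback_fn;
} Participant;

static Participant participants[MAX_PARTICIPANTS];
int			nparticipants = 0;

/* Counters so that a caller can observe which callbacks ran. */
int			ncommitted = 0;
int			nrolledback = 0;

void
demo_commit(unsigned serverid, unsigned userid)
{
	(void) serverid;
	(void) userid;
	ncommitted++;
}

void
demo_rollback(unsigned serverid, unsigned userid)
{
	(void) serverid;
	(void) userid;
	nrolledback++;
}

/*
 * Register a (server, user) pair.  Both callbacks are required; registering
 * the same pair twice is a no-op.
 */
bool
register_participant(unsigned serverid, unsigned userid,
					 EndXactCallback commit_fn, EndXactCallback rollback_fn)
{
	int			i;

	if (commit_fn == NULL || rollback_fn == NULL)
		return false;

	for (i = 0; i < nparticipants; i++)
	{
		if (participants[i].serverid == serverid &&
			participants[i].userid == userid)
			return true;		/* already registered */
	}

	if (nparticipants == MAX_PARTICIPANTS)
		return false;

	participants[nparticipants].serverid = serverid;
	participants[nparticipants].userid = userid;
	participants[nparticipants].commit_fn = commit_fn;
	participants[nparticipants].rollback_fn = rollback_fn;
	nparticipants++;
	return true;
}

/* End every registered foreign transaction in one phase, then forget them. */
void
end_all_participants(bool is_commit)
{
	int			i;

	for (i = 0; i < nparticipants; i++)
	{
		Participant *p = &participants[i];

		assert(p->commit_fn != NULL && p->rollback_fn != NULL);
		if (is_commit)
			p->commit_fn(p->serverid, p->userid);
		else
			p->rollback_fn(p->serverid, p->userid);
	}
	nparticipants = 0;
}
```

The point of the design is that the FDW only registers itself once per
(server, user) pair; the decision of when to commit or roll back is
made centrally at the end of the local transaction.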

v27-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch

This commit implements both the CommitForeignTransaction and
RollbackForeignTransaction APIs in postgres_fdw. Note that since
PREPARE TRANSACTION is still not supported, there is nothing new the
user is able to do.

v27-0003-Recreate-RemoveForeignServerById.patch

This commit recreates RemoveForeignServerById that was removed by
b1d32d3e3. This is necessary because we need to check if there is a
foreign transaction involved with the foreign server that is about to
be removed.

v27-0004-Add-PrepareForeignTransaction-API.patch

This commit adds prepared foreign transaction support, including WAL
logging and recovery, and the PrepareForeignTransaction API. With this
change, the user is able to run the 'PREPARE TRANSACTION' and
'COMMIT/ROLLBACK PREPARED' commands on a transaction that involves
foreign servers. But note that COMMIT/ROLLBACK PREPARED ends only the
local transaction. It doesn't do anything for foreign transactions.
Therefore, the user needs to resolve foreign transactions manually by
executing the pg_resolve_foreign_xacts() SQL function, which is also
introduced by this commit.

v27-0005-postgres_fdw-supports-prepare-API.patch

This commit implements the PrepareForeignTransaction API and makes
CommitForeignTransaction and RollbackForeignTransaction support
two-phase commit.

v27-0006-Add-GetPrepareId-API.patch

This commit adds GetPrepareID API.

v27-0007-Introduce-foreign-transaction-launcher-and-resol.patch

This commit introduces the foreign transaction resolver and launcher
processes. With this change, the user doesn't need to manually execute
the pg_resolve_foreign_xacts() function to resolve foreign
transactions prepared by PREPARE TRANSACTION and left by
COMMIT/ROLLBACK PREPARED. Instead, a resolver process automatically
resolves them in the background.

v27-0008-Prepare-foreign-transactions-at-commit-time.patch

With this commit, the transaction prepares foreign transactions marked
as modified at transaction commit if foreign_twophase_commit is
'required'. Previously the user needed to run PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED to use 2PC, but this commit enables 2PC to be
used transparently to the user. Note that the transaction returns OK
for the commit to the client after committing the local transaction
and notifying the resolver process, without waiting. Foreign
transactions are resolved asynchronously by the resolver process.
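
The commit-time decision can be sketched as follows. This is only my
reading of the cases exercised by the regression tests in 0011 (a
single writer commits in one phase, two or more writers including the
local node need 2PC, and an error is raised if a modified server
cannot prepare); it is not the patch's actual code, and the Server
struct, CommitPlan enum, and plan_commit() are hypothetical names.
Servers without any transaction API are assumed not to be registered
at all, so they are simply left out of the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of the commit-time check (illustration only). */
typedef enum
{
	COMMIT_ONE_PHASE,			/* plain commit on each participant */
	COMMIT_TWO_PHASE,			/* prepare everywhere, then commit */
	COMMIT_ERROR				/* 2PC required but not possible */
} CommitPlan;

typedef struct Server
{
	bool		modified;		/* did the transaction write to it? */
	bool		can_prepare;	/* does its FDW offer a prepare callback? */
} Server;

/*
 * Decide how to commit.  Only servers that were modified count as writers;
 * the local node counts as one more writer if it was modified.
 */
CommitPlan
plan_commit(const Server *servers, int n, bool local_modified,
			bool twophase_required)
{
	int			writers = local_modified ? 1 : 0;
	bool		all_capable = true;
	int			i;

	assert(servers != NULL || n == 0);

	for (i = 0; i < n; i++)
	{
		if (!servers[i].modified)
			continue;			/* read-only participants are ignored */
		writers++;
		if (!servers[i].can_prepare)
			all_capable = false;
	}

	/* A single writer can always be committed atomically in one phase. */
	if (!twophase_required || writers < 2)
		return COMMIT_ONE_PHASE;

	return all_capable ? COMMIT_TWO_PHASE : COMMIT_ERROR;
}
```

For example, a write to the local table plus a write to a server that
cannot prepare yields COMMIT_ERROR under 'required', while the same
transaction under 'disabled' is committed in one phase.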

v27-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch

With this commit, the transactions started via postgres_fdw are marked
as modified, which is necessary to use 2PC.

v27-0010-Documentation-update.patch
v27-0011-Add-regression-tests-for-foreign-twophase-commit.patch

Documentation update and regression tests.

The missing piece compared with the previous version of the patch is
synchronous transaction resolution. In the previous patch, foreign
transactions were resolved synchronously by a resolver process. But
since it's under discussion whether this is a good approach, and I'm
considering optimizing the logic, it's not included in the current
patch set.

Cfbot reported an error. I've attached an updated version of the patch
set to make cfbot happy.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

Attachments:

v28-0011-Add-regression-tests-for-foreign-twophase-commit.patch (application/octet-stream)
From e20f3f869e1e52c277823fbab05d32ae01b05e3f Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v28 11/11] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 +
 .../test_fdwxact/expected/test_fdwxact.out    | 200 +++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 185 +++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 110 ++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 524 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 ++++++
 src/test/regress/pg_regress.c                 |  13 +-
 src/tools/msvc/Mkvcbuild.pm                   |   3 +-
 14 files changed, 1294 insertions(+), 6 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index a6d2ffbf9e..106f3b2ff2 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..ca8a90f3e5
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,200 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..40b774e5d0
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,185 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the transaction and modified need to support
+-- two-phase commit. Otherwise the transaction will roll back.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..52e4971aed
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,110 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql, $wait_until) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+	$wait_until = 0 unless defined $wait_until;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	$node->poll_query_until('postgres',
+							"SELECT count(*) FROM pg_foreign_xacts",
+							$wait_until);
+
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the failure case of PREPARE TRANSACTION. We prepare the distributed
+# transaction with the same identifier.  The second attempt will fail when preparing
+# the local transaction, which is performed after preparing the foreign transaction
+# on srv_2pc_1. Therefore the transaction should rollback the prepared foreign
+# transaction.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into the prepare phase on srv_2pc_1. The transaction fails while
+# preparing the foreign transaction on srv_2pc_1. Then, we try to both 'rollback' and
+# 'rollback prepared' the foreign transaction, and rollback another foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..8e2a57b052
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,524 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only
+ * commit and rollback API and the third supports all transaction API including
+ * prepare.
+ *
+ * Also, this module can inject an error in the prepare callback or the
+ * commit callback via the test_inject_error() SQL function. The injected
+ * error information is stored in shared memory so that backend processes and
+ * resolver processes can see it.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/reloptions.h"
+#include "access/xact.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static void testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo,
+												   List *fdw_private,
+												   int subplan_index,
+												   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactRslvState *state);
+static void testCommitForeignTransaction(FdwXactRslvState *state);
+static void testRollbackForeignTransaction(FdwXactRslvState *state);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+/* Register the foreign transaction */
+static void
+testRegisterFdwXact(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					bool modified)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	RangeTblEntry	*rte;
+	ForeignTable *table;
+	Oid		userid;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
+						mtstate->ps.state);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+	table = GetForeignTable(RelationGetRelid(rel));
+	FdwXactRegisterXact(table->serverid, userid, modified);
+}
+
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static void
+testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo,
+									   List *fdw_private,
+									   int subplan_index,
+									   int eflags)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo,
+						(eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0);
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo, true);
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 state->fdwxact_id,
+							 state->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (check_event(state->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+/*
+ * Check if an event is set at the phase on the server. If there is, set
+ * elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Currently support only error and panic; anything else is ERROR */
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+	else
+		*elevel = ERROR;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check that we can commit and roll back the foreign
+# transactions after normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check that we can commit and roll back the foreign
+# transactions after crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up its
+# shared memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 23d7d0beb2..d49a292cca 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2352,9 +2352,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2369,7 +2372,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 90594bd41b..e46d3344e7 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -50,7 +50,8 @@ my @contrib_excludes = (
 	'pgcrypto',         'sepgsql',
 	'brin',             'test_extensions',
 	'test_misc',        'test_pg_dump',
-	'snapshot_too_old', 'unsafe_tests');
+	'snapshot_too_old', 'unsafe_tests',
+	'test_fdwxact');
 
 # Set of variables for frontend modules
 my $frontend_defines = { 'initdb' => 'FRONTEND' };
-- 
2.27.0
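
The recovery test above prepares transactions on every participating foreign server, restarts the coordinating node, and then commits or rolls back the prepared transactions. The following is a minimal stand-alone C model of that prepare-then-resolve pattern. It is a sketch for illustration only, not code from the patch: every name in it (ToyParticipant, toy_prepare_all, toy_resolve_all, the TOY_* states) is hypothetical, and it models no networking, WAL, or crash recovery.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states of one foreign transaction in this toy model. */
typedef enum
{
	TOY_ACTIVE,
	TOY_PREPARED,
	TOY_COMMITTED,
	TOY_ABORTED
} ToyXactState;

/* Hypothetical participant: one foreign server touched by the transaction. */
typedef struct
{
	const char *name;
	bool		can_prepare;	/* false models a failing PREPARE TRANSACTION */
	ToyXactState state;
} ToyParticipant;

/*
 * Phase 1: prepare on every participant.  On the first failure, mark all
 * participants aborted and report failure.
 */
static bool
toy_prepare_all(ToyParticipant *p, size_t n)
{
	for (size_t i = 0; i < n; i++)
	{
		if (!p[i].can_prepare)
		{
			for (size_t j = 0; j < n; j++)
				p[j].state = TOY_ABORTED;
			return false;
		}
		p[i].state = TOY_PREPARED;
	}
	return true;
}

/*
 * Phase 2: resolve every prepared participant according to the outcome of
 * the local transaction (commit = true models COMMIT PREPARED).
 */
static void
toy_resolve_all(ToyParticipant *p, size_t n, bool commit)
{
	for (size_t i = 0; i < n; i++)
	{
		if (p[i].state == TOY_PREPARED)
			p[i].state = commit ? TOY_COMMITTED : TOY_ABORTED;
	}
}
```

In this model the prepared state is simply kept in the array between the two calls. In the TAP test, that durability across a restart is exactly what is being checked.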

v28-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch (application/octet-stream)
From be1ecda1207d277966ee406017d9061bb2d139dd Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 2 Nov 2020 14:32:10 +0900
Subject: [PATCH v28 09/11] postgres_fdw marks foreign transaction as modified
 on modification.

This commit enables postgres_fdw to execute two-phase commit protocol
on transaction commit (without explicitly executing PREPARE TRANSACTION).

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c   | 19 ++++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c |  2 ++
 contrib/postgres_fdw/postgres_fdw.h |  1 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 747be681b8..eff2f2da3e 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -58,6 +58,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		modified;		/* true if data on the foreign server is modified */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -289,6 +290,7 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 	entry->have_error = false;
 	entry->changing_xact_state = false;
 	entry->invalidated = false;
+	entry->modified = false;
 	entry->server_hashvalue =
 		GetSysCacheHashValue1(FOREIGNSERVEROID,
 							  ObjectIdGetDatum(server->serverid));
@@ -303,6 +305,20 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 		 entry->conn, server->servername, user->umid, user->userid);
 }
 
+void
+MarkConnectionModified(UserMapping *user)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
+	if (entry && !entry->modified)
+	{
+		FdwXactRegisterXact(user->serverid, user->userid, true);
+		entry->modified = true;
+	}
+}
+
 /*
  * Connect to remote server using specified server and user mapping properties.
  */
@@ -574,7 +590,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 			 entry->conn);
 
 		/* Register the foreign server to the transaction */
-		FdwXactRegisterXact(user->serverid, user->userid);
+		FdwXactRegisterXact(user->serverid, user->userid, false);
 
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
@@ -583,6 +599,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 		entry->changing_xact_state = true;
 		do_sql_command(entry->conn, sql);
 		entry->xact_depth = 1;
+		entry->modified = false;
 		entry->changing_xact_state = false;
 	}
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index e3fccc6050..1a8b6fa673 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2379,6 +2379,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * establish new connection if necessary.
 	 */
 	dmstate->conn = GetConnection(user, false);
+	MarkConnectionModified(user);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -3573,6 +3574,7 @@ create_foreign_modify(EState *estate,
 
 	/* Open connection; report that we'll create a prepared statement. */
 	fmstate->conn = GetConnection(user, true);
+	MarkConnectionModified(user);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 659222b97a..12cd55258f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -132,6 +132,7 @@ extern void reset_transmission_modes(int nestlevel);
 /* in connection.c */
 extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
 extern void ReleaseConnection(PGconn *conn);
+extern void MarkConnectionModified(UserMapping *user);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
-- 
2.27.0
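
The patch above registers a connection as "modified" at most once per remote transaction and resets the flag when a new remote transaction begins. The following stand-alone C sketch models only that bookkeeping. It is not the postgres_fdw code: ToyConnEntry, toy_register_xact, toy_begin_remote_xact and toy_mark_modified are hypothetical names, and the registration call is replaced by a counter so that the behavior can be observed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a connection cache entry. */
typedef struct
{
	int			xact_depth;		/* 0 = no remote transaction open */
	bool		modified;		/* true once a write has been registered */
} ToyConnEntry;

/* Counts how many times a connection was registered as modified. */
static int	toy_modified_registrations = 0;

/* Stand-in for the registration call; only counts "modified" registrations. */
static void
toy_register_xact(bool modified)
{
	if (modified)
		toy_modified_registrations++;
}

/* Start a remote transaction if none is open and reset the modified flag. */
static void
toy_begin_remote_xact(ToyConnEntry *entry)
{
	if (entry->xact_depth <= 0)
	{
		toy_register_xact(false);
		entry->xact_depth = 1;
		entry->modified = false;
	}
}

/* Register the connection as modified, but only on the first write. */
static void
toy_mark_modified(ToyConnEntry *entry)
{
	if (entry != NULL && !entry->modified)
	{
		toy_register_xact(true);
		entry->modified = true;
	}
}
```

Calling toy_mark_modified several times within one remote transaction bumps the counter once. It is bumped again only after the transaction has ended (xact_depth back to 0) and a new one has begun.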

v28-0010-Documentation-update.patch (application/octet-stream)
From 8867783ba6e92b61f7ebc105ca510b629c882515 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v28 10/11] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 ++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 158 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 254 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    | 147 +++++++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 888 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 5fb9dca425..ec6a0752cc 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9262,6 +9262,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -11115,6 +11120,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction has been
+          prepared to commit or is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction has been
+          prepared to abort or is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>locker_pid</structfield></entry>
+      <entry><type>int</type></entry>
+      <entry></entry>
+      <entry>
+       Process ID of the process currently locking (processing) this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f043433e31..3ae8cf6480 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9248,6 +9248,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Setting</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether a distributed transaction commit ensures that all
+         changes made on the involved foreign servers are committed. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used
+         to commit or roll back distributed transactions. When set to
+         <literal>required</literal>, a distributed transaction strictly requires
+         that all modified servers can use the two-phase commit protocol.  That is,
+         the distributed transaction cannot commit if even one server does not
+         support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, the distributed transaction commit
+         waits for all involved foreign transactions to be committed before the
+         command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When set to <literal>disabled</literal>, there is a risk of database
+          inconsistency if one or more foreign servers crash while the
+          distributed transaction is being committed.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>int</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign
+         transaction resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning the resolver stays connected
+         to its database until it is stopped manually with <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..bae3ee0f2a
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the others were rolled back. This could leave the
+   data in an inconsistent state from the viewpoint of the federated database.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all the changes on foreign servers are either committed or rolled back using
+   the transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve atomic commit across all foreign servers,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit sequence of a distributed transaction proceeds
+    in the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers, including the local server itself, and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the prepare succeeds on all foreign
+       servers, the sequence proceeds to the next step.  If there is any failure
+       in the prepare phase, the server rolls back all the transactions on both
+       local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit the local transaction. The server commits the transaction locally.
+       If any failure happens in this step, the server switches to rollback and
+       rolls back all transactions on both local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    The above sequence is executed transparently to users at transaction commit.
+    The server returns an acknowledgement of the successful commit of the
+    distributed transaction to the client after step 2.  After that, all
+    prepared transactions are resolved asynchronously by a foreign transaction
+    resolver process.
+   </para>
+
+   <para>
+    When the user executes <command>PREPARE TRANSACTION</command>, the server
+    prepares the local transaction as well as all involved transactions on the
+    foreign servers. Likewise, when <command>COMMIT PREPARED</command> or
+    <command>ROLLBACK PREPARED</command> is executed, all prepared transactions are
+    resolved asynchronously after committing or rolling back the local transaction.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    A distributed transaction is in the <firstterm>in-doubt</firstterm> state
+    from the time all involved transactions have been prepared until all of
+    them are resolved.  During that period, a transaction reading from the
+    foreign servers might see different results on each.  If the local node
+    crashes while preparing transactions, the distributed transaction also becomes
+    in-doubt.  The information about the involved foreign transactions is
+    recovered during crash recovery, and they are resolved in the background.
+   </para>
+
+   <para>
+    The foreign transaction resolver processes automatically resolve the
+    transactions associated with an in-doubt distributed transaction. Alternatively, you can
+    use the <function>pg_resolve_foreign_xact</function> function to resolve them
+    manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving in-doubt distributed transactions. They commit or
+    rollback prepared transactions on all foreign servers involved with the
+    distributed transaction according to the result of the corresponding local
+    transaction.
+   </para>
+
+   <para>
+    Each foreign transaction resolver is responsible for transaction resolution
+    on the database to which it is connected. If resolution fails, the resolver
+    retries at intervals of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped without an immediate shutdown. You can call the
+     <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be enabled.  Additionally,
+    <varname>max_worker_processes</varname> may need to be adjusted
+    to accommodate the foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..0fbb9c4123 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1427,6 +1427,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare the foreign transaction. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well, and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction, if the transaction
+    can be committed in one phase, or at the post-commit phase, if
+    two-phase commit is required. If <literal>frstate-&gt;flags</literal> has
+    the flag <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     Note that in all cases except when invoked via the <function>pg_resolve_foreign_xact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally. Foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from recursing
+    infinitely. If <literal>frstate-&gt;flags</literal> has the flag
+    <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     a failure occurs while preparing the foreign transaction. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except when invoked via the <function>pg_resolve_foreign_xact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    storing its length in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string
+    less than <symbol>NAMEDATALEN</symbol> bytes long and must not be the same
+    as any other concurrent prepared transaction identifier. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;xid&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      The functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Consequently, they
+      cannot use functions that access the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1906,4 +2017,147 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage the transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the workings of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name and
+    the OIDs of the server, user, and user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-registration">
+    <title>Foreign Transaction Registration and Unregistration</title>
+    <para>
+     Foreign transactions need to be registered with the
+     <productname>PostgreSQL</productname> global transaction manager.
+     Registration and unregistration are done by calling
+     <function>FdwXactRegisterXact</function> and
+     <function>FdwXactUnregisterXact</function> respectively.
+     The FDW can pass a boolean <literal>modified</literal> along with the
+     OIDs of the server and user to <function>FdwXactRegisterXact</function>
+     to indicate that writes are going to happen on the foreign server.  Such foreign
+     servers are taken into account when deciding whether the two-phase commit
+     protocol is required.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback of a Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <function>CommitForeignTransaction</function>
+     and <function>RollbackForeignTransaction</function> are used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls <function>CommitForeignTransaction</function>
+     in the pre-commit phase; during rollback, it calls
+     <function>RollbackForeignTransaction</function> in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback of a Distributed Transaction</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit or roll back atomically across all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    Here ft1 and ft2 are foreign tables on different foreign servers, which may be using different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     When switching to rollback due to a failure, it calls
+     <function>RollbackForeignTransaction</function> with
+     <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions which are not
+     closed yet, and calls <function>RollbackForeignTransaction</function> without
+     that flag for foreign transactions which are already prepared.  For foreign
+     transactions which are being prepared, it does both because it cannot be sure whether
+     the preparation has been completed on the foreign server. Therefore,
+     <function>RollbackForeignTransaction</function> needs to tolerate the undefined
+     object error.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, the <literal>CommitForeignTransaction</literal> and
+     <literal>RollbackForeignTransaction</literal> functions should commit or
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server was not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve the transaction, the resolver process exits with an error message. The foreign
+     transaction launcher will launch the resolver process again after the interval set by
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/>.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a5161bb22b 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7b1dc264f6..26736698af 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26173,6 +26173,153 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-data-sanity">
+   <title>Data Sanity Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-data-sanity-table"/>
+    provide ways to check the sanity of data files in the cluster.
+   </para>
+
+   <table id="functions-data-sanity-table">
+    <title>Data Sanity Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_relation_check_pages</primary>
+        </indexterm>
+        <function>pg_relation_check_pages</function> ( <parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type> ] )
+        <returnvalue>setof record</returnvalue>
+        ( <parameter>path</parameter> <type>text</type>,
+        <parameter>failed_block_num</parameter> <type>bigint</type> )
+       </para>
+       <para>
+        Checks the pages of the specified relation to see if they are valid
+        enough to safely be loaded into the server's shared buffers.  If
+        given, <parameter>fork</parameter> specifies that only the pages of
+        the given fork are to be verified.  <parameter>fork</parameter> can
+        be <literal>main</literal> for the main data
+        fork, <literal>fsm</literal> for the free space
+        map, <literal>vm</literal> for the visibility map,
+        or <literal>init</literal> for the initialization fork.  The
+        default of <literal>NULL</literal> means that all forks of the
+        relation should be checked.  The function returns a list of block
+        numbers that appear corrupted along with the path names of their
+        files.  Use of this function is restricted to superusers by
+        default, but access may be granted to others
+        using <command>GRANT</command>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction which is currently being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that this removes the foreign transaction entry without resolution.
+        This function is useful to remove a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before
+    dropping a database to which a foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 98e1995453..d00663dc14 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1066,6 +1066,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1295,6 +1307,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1588,6 +1612,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1905,6 +1934,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 730d5fdc34..a5c5619072 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -171,6 +171,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3234adb639..83f30c5045 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.27.0
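
To make the one-phase versus prepared-transaction distinction in the fdwhandler.sgml text above concrete, here is a minimal standalone C sketch of how an FDW talking to a PostgreSQL-like server might pick the SQL command to send. It is not taken from the patch: `DemoRslvState`, `DEMO_FLAG_ONEPHASE` and `build_resolve_command` are hypothetical stand-ins for `FdwXactRslvState`, `FDWXACT_FLAG_ONEPHASE` and whatever a real FDW would do, and the sketch assumes the identifier contains no single quotes.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for FDWXACT_FLAG_ONEPHASE; the real value is not shown in this thread. */
#define DEMO_FLAG_ONEPHASE 0x01

/* Hypothetical, reduced stand-in for FdwXactRslvState. */
typedef struct DemoRslvState
{
    int         flags;      /* bit flags describing the foreign transaction */
    const char *fdwxact_id; /* prepared transaction identifier; unused in one-phase mode */
} DemoRslvState;

/*
 * Build the SQL command that resolves a foreign transaction.
 * Returns the number of characters written, or -1 if buf is too small
 * or a prepared identifier is required but missing.
 */
static int
build_resolve_command(const DemoRslvState *st, bool commit,
                      char *buf, size_t buflen)
{
    int n;

    if (st->flags & DEMO_FLAG_ONEPHASE)
    {
        /* One-phase: end the still-open remote transaction directly. */
        n = snprintf(buf, buflen, "%s",
                     commit ? "COMMIT TRANSACTION" : "ABORT TRANSACTION");
    }
    else if (st->fdwxact_id == NULL)
    {
        /* Two-phase resolution needs the prepared transaction identifier. */
        return -1;
    }
    else
    {
        /* Two-phase: resolve the transaction that was prepared earlier. */
        n = snprintf(buf, buflen, "%s PREPARED '%s'",
                     commit ? "COMMIT" : "ROLLBACK", st->fdwxact_id);
    }

    return (n < 0 || (size_t) n >= buflen) ? -1 : n;
}
```

A real rollback callback would additionally have to treat an "undefined object" error from `ROLLBACK PREPARED` as success, as the documentation above requires; that error handling is outside the scope of this sketch.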

v28-0006-Add-GetPrepareId-API.patch (application/octet-stream)
From 3083dc26003953613c28b6fdaa640c83d7af73f7 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 4 Nov 2020 14:41:53 +0900
Subject: [PATCH v28 06/11] Add GetPrepareId API

---
 src/backend/access/fdwxact/fdwxact.c | 54 +++++++++++++++++++++++-----
 src/include/foreign/fdwapi.h         |  3 ++
 2 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 3caf904370..7b3a2f1fba 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -143,6 +143,7 @@ typedef struct FdwXactParticipant
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
 	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
 } FdwXactParticipant;
 
 /*
@@ -347,6 +348,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
 
 	return fdw_part;
 }
@@ -414,9 +416,10 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 }
 
 /*
- * Return a null-terminated foreign transaction identifier.  We generate an
- * unique identifier with in the form of
- * "fx_<random number>_<xid>_<serverid>_<userid> whose length is
+ * Return a null-terminated foreign transaction identifier.  If the given
+ * foreign server's FDW provides the GetPrepareId callback we return the identifier
+ * returned from it. Otherwise we generate a unique identifier in the
+ * form of "fx_<random number>_<xid>_<serverid>_<userid>" whose length is
  * less than FDWXACT_ID_MAX_LEN.
  *
  * Returned string value is used to identify foreign transaction. The
@@ -431,13 +434,48 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 static char *
 get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
 {
-	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+	char *id;
+	int	id_len;
 
-	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
-			 Abs(random()), xid, fdw_part->server->serverid,
-			 fdw_part->usermapping->userid);
+	/*
+	 * If the FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
 
-	return pstrdup(buf);
+	id[id_len] = '\0';
+	return pstrdup(id);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 89cec9aa96..91db4f5bfc 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -174,6 +174,8 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -256,6 +258,7 @@ typedef struct FdwRoutine
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
 	PrepareForeignTransaction_function PrepareForeignTransaction;
+	GetPrepareId_function GetPrepareId;
 } FdwRoutine;
 
 
-- 
2.27.0
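
The identifier handling in the patch above can be tried outside the server with a small standalone C sketch. This is only an illustration under stated assumptions: `DEMO_ID_MAX_LEN`, `demo_default_id` and `demo_id_is_valid` are hypothetical names, the limit of 200 is arbitrary because the value of `FDWXACT_ID_MAX_LEN` does not appear in this part of the thread, and the random component is passed in as a parameter so the result is reproducible.

```c
#include <stdio.h>

/* Hypothetical limit; the patch uses FDWXACT_ID_MAX_LEN, whose value is not shown here. */
#define DEMO_ID_MAX_LEN 200

/*
 * Write a default-style identifier "fx_<random>_<xid>_<serverid>_<userid>"
 * into buf.  Returns 0 on success, or -1 if the identifier would not fit
 * in buflen bytes including the terminating NUL.
 */
static int
demo_default_id(char *buf, size_t buflen, long rnd,
                unsigned int xid, unsigned int serverid, unsigned int userid)
{
    int n = snprintf(buf, buflen, "fx_%ld_%u_%u_%u",
                     rnd, xid, serverid, userid);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/*
 * Validate an identifier returned by a GetPrepareId-style callback:
 * it must be non-NULL and strictly shorter than DEMO_ID_MAX_LEN, so that
 * the identifier plus its terminator fits in DEMO_ID_MAX_LEN bytes.
 * Returns 1 if acceptable, 0 otherwise.
 */
static int
demo_id_is_valid(const char *id, int id_len)
{
    if (id == NULL || id_len < 0)
        return 0;
    if (id_len >= DEMO_ID_MAX_LEN)
        return 0;
    return 1;
}
```

The sketch formats the OIDs and the transaction ID with `%u` because they are unsigned 32-bit values in PostgreSQL, and it rejects a length equal to the limit so that the check agrees with the "must be less than" wording of the error detail.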

v28-0008-Prepare-foreign-transactions-at-commit-time.patch (application/octet-stream)
From 544c321386af2253fdf3f24b23f50a6bb0fc8b1c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 3 Nov 2020 21:58:48 +0900
Subject: [PATCH v28 08/11] Prepare foreign transactions at commit time

When foreign_twophase_commit is 'required', a transaction involving
multiple foreign servers prepares all transactions initiated on the
foreign servers at the pre-commit phase.  Prepared foreign
transactions are resolved asynchronously by a resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/fdwxact.c          | 191 +++++++++++++++++-
 src/backend/access/transam/xact.c             |   7 +
 src/backend/utils/misc/guc.c                  |  28 +++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/access/fdwxact.h                  |  10 +
 src/include/foreign/fdwapi.h                  |   2 +-
 6 files changed, 229 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index b4cab71c3d..79bd7596a3 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -19,13 +19,27 @@
  *
  * FOREIGN TRANSACTION RESOLUTION
  *
+ * A transaction involving multiple foreign transactions uses the two-phase commit
+ * protocol to commit the distributed transaction if enabled.  The basic strategy
+ * is that we prepare all of the remote transactions before committing locally and
+ * commit them after committing locally.
+ *
+ * At pre-commit of the local transaction, we prepare the transactions on all foreign
+ * servers after logging the information of the foreign transactions.  The result of
+ * the distributed transaction is determined by the result of the corresponding local
+ * transaction.  Once the local transaction is successfully committed, all
+ * transactions on foreign servers must be committed.  In case an error occurs
+ * before the local transaction commits, all transactions must be aborted.  After
+ * committing or rolling back locally, we leave foreign transactions as in-doubt
+ * transactions and then notify the resolver process. The resolver process asynchronously
+ * resolves these foreign transactions according to the result of the corresponding local
+ * transaction.  Also, the user can use the pg_resolve_foreign_xact() SQL function to
+ * resolve a foreign transaction manually.
+ *
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  The preapred
- * foreign transactions are resolved by a resolver process asynchronously.  Also, the
- * user can use pg_resolve_foreign_xact() SQL function to resolve a foreign transaction
- * manually.
+ * local transaction but do nothing for the involved foreign transactions.
  *
  * LOCKING
  *
@@ -92,8 +106,10 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
+#include "storage/pmsignal.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
+#include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -105,6 +121,10 @@
 #define ServerSupportTwophaseCommit(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
 
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
 /* Directory where the foreign prepared transaction files will reside */
 #define FDWXACTS_DIR "pg_fdwxact"
 
@@ -142,6 +162,9 @@ typedef struct FdwXactParticipant
 	/* Transaction identifier used for PREPARE */
 	char	   *fdwxact_id;
 
+	/* true if the transaction modified data on the server */
+	bool		modified;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
@@ -152,18 +175,24 @@ typedef struct FdwXactParticipant
 /*
  * List of foreign transactions involved in the transaction.  A member of
  * participants must support both commit and rollback APIs.
+ *
+ * ForeignTwophaseCommitIsRequired is true if the current transaction needs to
+ * be committed using two-phase commit protocol.
  */
 static List *FdwXactParticipants = NIL;
+static bool ForeignTwophaseCommitIsRequired = false;
 
 /* Keep track of registering process exit call back. */
 static bool fdwXactExitRegistered = false;
 
+
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
 int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
-static void FdwXactPrepareForeignTransactions(TransactionId xid);
+static void FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
@@ -182,6 +211,7 @@ static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
 static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
 static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  bool giveWarning);
+static bool checkForeignTwophaseCommitRequired(bool local_modified);
 static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  Oid umid, char *fdwxact_id);
 static void remove_fdwxact(FdwXact fdwxact);
@@ -258,7 +288,7 @@ FdwXactShmemInit(void)
  * as a participant of the transaction.
  */
 void
-FdwXactRegisterXact(Oid serverid, Oid userid)
+FdwXactRegisterXact(Oid serverid, Oid userid, bool modified)
 {
 	FdwXactParticipant *fdw_part;
 	MemoryContext old_ctx;
@@ -273,6 +303,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 			fdw_part->usermapping->userid == userid)
 		{
 			/* Already registered */
+			fdw_part->modified |= modified;
 			return;
 		}
 	}
@@ -302,6 +333,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
 
 	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
 
 	/* Add to the participants list */
 	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
@@ -348,6 +380,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
 	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
@@ -356,11 +389,139 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	return fdw_part;
 }
 
+/*
+ * Prepare foreign transactions if foreign two-phase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit: when it is
+ * 'required', every foreign server that was modified must be managed by an
+ * FDW that supports the two-phase commit protocol, and we ask each of them
+ * to prepare its foreign transaction; when it is 'disabled', we use
+ * one-phase commit and these foreign transactions are committed at the
+ * transaction end.  If we fail to prepare any of them, we abort.
+ */
+void
+PreCommit_FdwXact(void)
+{
+	TransactionId xid;
+	bool		local_modified;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/*
+	 * Check if the current transaction did writes.  We need to count the
+	 * local node as a participant of the distributed transaction and regard
+	 * it as modified if the current transaction has performed WAL logging
+	 * and has an assigned xid.  The transaction can end up not writing any
+	 * WAL, even if it has an xid, if it only wrote to temporary and/or
+	 * unlogged tables.  It can end up having written WAL without an xid if
+	 * it did HOT pruning.
+	 */
+	xid = GetTopTransactionIdIfAny();
+	local_modified = (TransactionIdIsValid(xid) && (XactLastRecEnd != 0));
+
+	/*
+	 * Check if we need to use foreign twophase commit. Note that we don't
+	 * support foreign twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired(local_modified))
+	{
+		/*
+		 * Two-phase commit is required.  Assign a transaction id to the
+		 * current transaction if it does not have one yet, because the
+		 * outcome of the local transaction determines the result of the
+		 * distributed transaction.  Then we prepare foreign transactions on
+		 * foreign servers that support two-phase commit.  Note that we keep
+		 * FdwXactParticipants until the end of the transaction.
+		 */
+		if (!TransactionIdIsValid(xid))
+			xid = GetTopTransactionId();
+		FdwXactPrepareForeignTransactions(xid, false);
+		ForeignTwophaseCommitIsRequired = true;
+	}
+}
+
+/* Return true if the current transaction needs to use two-phase commit */
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
 /*
- * Insert FdwXact entries and prepare foreign transactions.
+ * Return true if the current transaction modified data on two or more servers,
+ * counting the servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+{
+	ListCell   *lc;
+	bool		have_notwophase = false;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			have_notwophase = true;
+
+		nserverswritten++;
+	}
+
+	/* Did we modify the local non-temporary data? */
+	if (local_modified)
+		nserverswritten++;
+
+	/*
+	 * Two-phase commit is not required if fewer than two servers performed
+	 * writes.
+	 */
+	if (nserverswritten < 2)
+		return false;
+
+	Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED);
+
+	/* Two-phase commit is required. Check parameters */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	if (have_notwophase)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+				 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+	return true;
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions.  If prepare_all is
+ * true, we prepare all foreign transactions regardless of whether writes
+ * happened on the server.
+ *
+ * We can still switch to rollback here on failure.  If any error occurs, we
+ * roll back the non-prepared foreign transactions.
  */
 static void
-FdwXactPrepareForeignTransactions(TransactionId xid)
+FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all)
 {
 	ListCell   *lc;
 
@@ -378,6 +539,9 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 
 		CHECK_FOR_INTERRUPTS();
 
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
 		/* Get prepared transaction identifier */
 		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
 		Assert(fdw_part->fdwxact_id);
@@ -755,7 +919,10 @@ ForgetAllFdwXactParticipants(void)
 	int			nlefts = 0;
 
 	if (FdwXactParticipants == NIL)
+	{
+		Assert(!ForeignTwophaseCommitIsRequired);
 		return;
+	}
 
 	foreach(cell, FdwXactParticipants)
 	{
@@ -812,7 +979,10 @@ AtEOXact_FdwXact(bool is_commit)
 
 		if (!fdwxact)
 		{
-			/* Commit or rollback the foreign transaction in one-phase */
+			/*
+			 * If this participant doesn't have an FdwXact entry, it has not
+			 * been prepared, so we can commit or roll it back in one phase.
+			 */
 			Assert(ServerSupportTransactionCallback(fdw_part));
 			FdwXactParticipantEndTransaction(fdw_part, is_commit);
 			continue;
@@ -842,6 +1012,7 @@ AtEOXact_FdwXact(bool is_commit)
 	}
 
 	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
 }
 
 /*
@@ -881,7 +1052,7 @@ PrePrepare_FdwXact(void)
 	 * prepare all foreign transactions.
 	 */
 	xid = GetTopTransactionId();
-	FdwXactPrepareForeignTransactions(xid);
+	FdwXactPrepareForeignTransactions(xid, true);
 
 	/*
 	 * We keep FdwXactParticipants until the transaction end so that we change
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index e4fadcaf2c..5cf2dea70f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -22,6 +22,7 @@
 
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1456,6 +1457,9 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	if (FdwXactIsForeignTwophaseCommitRequired())
+		FdwXactLaunchOrWakeupResolver();
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2115,6 +2119,9 @@ CommitTransaction(void)
 			break;
 	}
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXact();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 942f6b6a43..771733862a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -499,6 +499,24 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -4657,6 +4675,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2ed09cb347..5a73443be1 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -744,6 +744,8 @@
 							# retrying to resolve
 							# foreign transactions
 							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
 
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index a3763e52c0..6bf4f5dd7d 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -20,6 +20,14 @@
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
 /* Enum to track the status of foreign transaction */
 typedef enum
 {
@@ -107,10 +115,12 @@ extern int	max_prepared_foreign_xacts;
 extern int	max_foreign_xact_resolvers;
 extern int	foreign_xact_resolution_retry_interval;
 extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
+extern void PreCommit_FdwXact(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
 extern bool FdwXactIsForeignTwophaseCommitRequired(void);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 91db4f5bfc..7a444d0590 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -273,7 +273,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
 /* Functions in fdwxact/fdwxact.c */
-extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactRegisterXact(Oid serverid, Oid userid, bool modified);
 extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
 
 #endif							/* FDWAPI_H */
-- 
2.27.0
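
For readers skimming the patch above, the decision made by
checkForeignTwophaseCommitRequired() can be sketched as a small standalone
function.  This is only an illustration under stated assumptions, not code
from the patch: Participant, TwophaseDecision and decide_twophase() are
hypothetical names, the max_prepared_foreign_xacts and
max_foreign_xact_resolvers checks are left out, and the ereport(ERROR) case
is modelled as a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for FdwXactParticipant: only what the check reads. */
typedef struct Participant
{
	bool		modified;		/* did the transaction write on this server? */
	bool		can_prepare;	/* does the FDW provide a prepare callback? */
} Participant;

/* Hypothetical result type; the patch raises an error for the last case. */
typedef enum
{
	TWOPHASE_NOT_NEEDED,		/* commit in one phase */
	TWOPHASE_NEEDED,			/* prepare the modified foreign servers */
	TWOPHASE_UNSUPPORTED		/* a modified server cannot prepare */
} TwophaseDecision;

TwophaseDecision
decide_twophase(bool requested, const Participant *parts, size_t nparts,
				bool local_modified)
{
	size_t		nserverswritten = 0;
	bool		have_notwophase = false;

	assert(parts != NULL || nparts == 0);

	/* foreign_twophase_commit = disabled: never use two-phase commit */
	if (!requested)
		return TWOPHASE_NOT_NEEDED;

	for (size_t i = 0; i < nparts; i++)
	{
		/* Read-only participants do not count as written servers */
		if (!parts[i].modified)
			continue;

		if (!parts[i].can_prepare)
			have_notwophase = true;

		nserverswritten++;
	}

	/* The local server counts as one more written server */
	if (local_modified)
		nserverswritten++;

	/* A single writer can always be committed atomically in one phase */
	if (nserverswritten < 2)
		return TWOPHASE_NOT_NEEDED;

	return have_notwophase ? TWOPHASE_UNSUPPORTED : TWOPHASE_NEEDED;
}
```

Note that a read-only participant never forces two-phase commit in this
sketch, even if its FDW has no prepare callback, which matches the
"modified" test in the loop of the patch.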

v28-0007-Introduce-foreign-transaction-launcher-and-resol.patch (application/octet-stream)
From 5812ebda472ce8b4ce8e8e116e2e761c8a2842cf Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:09:41 +0900
Subject: [PATCH v28 07/11] Introduce foreign transaction launcher and resolver
 processes.

Foreign transactions prepared by PREPARE TRANSACTION are resolved in the
background by a resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/Makefile           |   5 +-
 src/backend/access/fdwxact/fdwxact.c          |  33 +-
 src/backend/access/fdwxact/launcher.c         | 567 ++++++++++++++++++
 src/backend/access/fdwxact/resolver.c         | 352 +++++++++++
 src/backend/access/transam/twophase.c         |  16 +
 src/backend/postmaster/bgworker.c             |   8 +
 src/backend/postmaster/pgstat.c               |   6 +
 src/backend/postmaster/postmaster.c           |  13 +-
 src/backend/storage/ipc/ipci.c                |   3 +
 src/backend/storage/lmgr/lwlocknames.txt      |   1 +
 src/backend/tcop/postgres.c                   |  14 +
 src/backend/utils/misc/guc.c                  |  37 ++
 src/backend/utils/misc/postgresql.conf.sample |  12 +
 src/include/access/fdwxact.h                  |   6 +
 src/include/access/fdwxact_launcher.h         |  28 +
 src/include/access/fdwxact_resolver.h         |  23 +
 src/include/access/resolver_internal.h        |  63 ++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/pgstat.h                          |   2 +
 src/include/utils/guc_tables.h                |   2 +
 20 files changed, 1183 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
index aacab1d729..151e3ae336 100644
--- a/src/backend/access/fdwxact/Makefile
+++ b/src/backend/access/fdwxact/Makefile
@@ -12,6 +12,9 @@ subdir = src/backend/access/fdwxact
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = fdwxact.o
+OBJS = \
+	fdwxact.o \
+	resolver.o \
+	launcher.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 7b3a2f1fba..b4cab71c3d 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -22,10 +22,10 @@
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  To resolve
- * these foreign transactions the user needs to use pg_resolve_foreign_xact() SQL
- * function that resolve a foreign transaction according to the result of the
- * corresponding local transaction.
+ * local transaction but do nothing for the involved foreign transactions.  The prepared
+ * foreign transactions are resolved asynchronously by a resolver process.  The user can
+ * also call the pg_resolve_foreign_xact() SQL function to resolve a foreign transaction
+ * manually.
  *
  * LOCKING
  *
@@ -76,7 +76,10 @@
 #include <unistd.h>
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/twophase.h"
+#include "access/resolver_internal.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
@@ -157,6 +160,7 @@ static bool fdwXactExitRegistered = false;
 
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
 static void FdwXactPrepareForeignTransactions(TransactionId xid);
@@ -165,7 +169,6 @@ static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
 static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
 										 FdwXactParticipant *fdw_part);
-static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 static void FdwXactComputeRequiredXmin(void);
 static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
 static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
@@ -772,12 +775,13 @@ ForgetAllFdwXactParticipants(void)
 
 	/*
 	 * If we leave any FdwXact entries, update the oldest local transaction of
-	 * unresolved distributed transaction.
+	 * unresolved distributed transaction and notify the launcher.
 	 */
 	if (nlefts > 0)
 	{
 		elog(DEBUG1, "left %u foreign transactions", nlefts);
 		FdwXactComputeRequiredXmin();
+		FdwXactLaunchOrWakeupResolver();
 	}
 
 	list_free_deep(FdwXactParticipants);
@@ -785,7 +789,9 @@ ForgetAllFdwXactParticipants(void)
 }
 
 /*
- * Commit or rollback all foreign transactions.
+ * Close in-progress involved foreign transactions.  We don't perform the second
+ * phase of two-phase commit protocol here.  All prepared foreign transactions
+ * enter in-doubt state and a resolver process will process them.
  */
 void
 AtEOXact_FdwXact(bool is_commit)
@@ -889,7 +895,7 @@ PrePrepare_FdwXact(void)
  * The caller must hold the given foreign transactions in advance to prevent
  * concurrent update.
  */
-static void
+void
 FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
 {
 	for (int i = 0; i < nfdwxacts; i++)
@@ -924,6 +930,17 @@ FdwXactExists(Oid dbid, Oid serverid, Oid userid)
 
 	return (idx >= 0);
 }
+bool
+FdwXactExistsXid(TransactionId xid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(InvalidOid, xid, InvalidOid, InvalidOid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
 
 /*
  * Return the index of first found FdwXact entry that matched to given arguments.
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..916b9af2f7
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,567 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes.  The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+		FdwXactRslvCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit launch retries to one per
+		 * foreign_xact_resolution_retry_interval, but always attempt to
+		 * launch when explicitly requested.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * was started.  This usually means the resolver crashed, so we
+			 * retry after foreign_xact_resolution_retry_interval.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+
+	/*
+	 * Look for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver.  It's possible that the resolver is starting
+		 * up and hasn't attached to its slot yet.  In that case we don't need
+		 * to do anything, since it will soon find the FdwXact entry we inserted.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+		resolver->in_use = false;
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on each database that has
+ * at least one FdwXact entry but no running resolver.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *fdwxact_dbs;
+	HTAB	   *resolver_dbs;
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+
+	/*
+	 * Create a hash map for the databases that have at least one foreign
+	 * transaction to resolve.
+	 */
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * We need to launch resolver process if the foreign transaction
+		 * is not held by anyone and is not a part of the local prepared
+		 * transaction.
+		 */
+		if (fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* No foreign transaction to resolve, so no need to launch a resolver */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	/* Create a hash map for databases on which a resolver is running */
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * Find databases on which no resolver is running and launch new
+	 * resolver process on them.
+	 */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..c9d41428fc
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,352 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. The foreign
+ * transaction launcher starts one resolver process per database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its database
+ * that backend processes are waiting to have resolved.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if no foreign transactions on the database are waiting to be
+ * resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int	foreign_xact_resolution_retry_interval;
+int	foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+static void hold_indoubt_fdwxacts(void);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts holds the indexes of the FdwXact entries that this resolver
+ * has marked as being processed. These marks are cleared on process exit.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and clean up the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	/* Release the held foreign transaction entries */
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/* Hold in-doubt foreign transaction to resolve */
+		hold_indoubt_fdwxacts();
+
+		if (nheld > 0)
+		{
+			/* Resolve in-doubt transactions */
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld);
+			CommitTransactionCommand();
+			last_resolution_time = now;
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/* Reached timeout, exit */
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" will stop because of timeout",
+					get_database_name(MyDatabaseId))));
+	CommitTransactionCommand();
+	fdwxact_resolver_detach();
+	proc_exit(0);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until the
+ * timeout or the next resolution time given by nextResolutionTs, whichever is first.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Lock foreign transactions that are not held by anyone.
+ */
+static void
+hold_indoubt_fdwxacts(void)
+{
+	nheld = 0;
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid &&
+			fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+		{
+			held_fdwxacts[nheld++] = i;
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index a71210772f..1348a283b1 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,8 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -2286,6 +2288,13 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * If the prepared transaction was part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
@@ -2345,6 +2354,13 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * If the prepared transaction was part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 5a9a0e3435..b2384f9ab9 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,8 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 4b05d7d2ff..13830cc51e 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3809,6 +3809,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 81e6cb9ca2..e8f579699f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -94,6 +94,7 @@
 #endif
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -925,6 +926,9 @@ PostmasterMain(int argc, char *argv[])
 	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers <= 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
 
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
@@ -990,12 +994,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and the foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules have had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2d7191d3cd..271fd35884 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -17,6 +17,7 @@
 #include "access/clog.h"
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -151,6 +152,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +272,7 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index dc29a7ea6f..9327394013 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -54,3 +54,4 @@ XactTruncationLock					44
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
 FdwXactLock							48
+FdwXactResolverLock					49
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 411cfadbff..496e2b3a4a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3054,6 +3056,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ec6cef8ad7..942f6b6a43 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -760,6 +760,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2469,6 +2473,39 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve a foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 863e8ccc3a..2ed09cb347 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -733,6 +733,18 @@
 #max_pred_locks_per_page = 2            # min 0
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
 #------------------------------------------------------------------------------
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 9ba819e9d1..a3763e52c0 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -104,13 +104,19 @@ typedef struct FdwXactRslvState
 
 /* GUC parameters */
 extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern bool FdwXactExistsXid(TransactionId xid);
 extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
 extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
 								Oid userid, void *content, int len);
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..688b43b8d0
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..c935471936
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 09c26b5cd8..49efb63e6a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6167,6 +6167,11 @@
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
 
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
+
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
   proargtypes => 'pg_lsn pg_lsn', prosrc => 'pg_wal_lsn_diff' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a61a08c5d6..0967c09f3c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -877,6 +877,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 04431d0eb2..a00ca73355 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
-- 
2.27.0
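
A side note on FXRslvComputeSleepTime() in the resolver patch above: the sleep time is the smallest of the default nap time, the time left until the resolver timeout, and the time left until the next scheduled resolution. The following is only a standalone sketch of that arithmetic, not code from the patch. It assumes plain int64_t microsecond timestamps in place of TimestampTz, and the names ms_until() and compute_sleep_ms() are hypothetical. Like TimestampDifference(), the helper clamps at zero when the target time has already passed.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as DEFAULT_NAPTIME_PER_CYCLE in the patch: 3 minutes, in ms. */
#define NAPTIME_MS 180000L

/*
 * Milliseconds from now_us until target_us (both in microseconds),
 * clamped at zero if the target is not in the future.
 */
static long
ms_until(int64_t now_us, int64_t target_us)
{
	if (target_us <= now_us)
		return 0;
	return (long) ((target_us - now_us) / 1000);
}

/*
 * Hypothetical stand-in for FXRslvComputeSleepTime().
 * timeout_ms == 0 disables the timeout; next_resolution_us <= 0 means
 * that no next resolution time is scheduled.
 */
long
compute_sleep_ms(int64_t now_us, int64_t last_resolution_us,
				 int timeout_ms, int64_t next_resolution_us)
{
	long		sleep_ms = NAPTIME_MS;

	assert(timeout_ms >= 0);

	if (timeout_ms > 0)
	{
		int64_t		deadline_us = last_resolution_us + (int64_t) timeout_ms * 1000;
		long		t = ms_until(now_us, deadline_us);

		if (t < sleep_ms)
			sleep_ms = t;
	}

	if (next_resolution_us > 0)
	{
		long		t = ms_until(now_us, next_resolution_us);

		if (t < sleep_ms)
			sleep_ms = t;
	}

	return sleep_ms;
}
```

With the timeout disabled and nothing scheduled, the sketch falls back to the full nap time; once the timeout deadline has passed it returns zero, so the loop would wake up immediately.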

Attachment: v28-0003-Recreate-RemoveForeignServerById.patch (application/octet-stream)
From 14bc69126fe0945558c100288fc8e1ce5cfb090b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 12 Jun 2020 11:49:02 +0900
Subject: [PATCH v28 03/11] Recreate RemoveForeignServerById()

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/catalog/dependency.c   |  5 ++++-
 src/backend/commands/foreigncmds.c | 22 ++++++++++++++++++++++
 src/include/commands/defrem.h      |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index b0d037600e..5748e4277c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1555,6 +1555,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_FOREIGN_SERVER:
+			RemoveForeignServerById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1569,7 +1573,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSDICT:
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
-		case OCLASS_FOREIGN_SERVER:
 		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index de31ddd1f3..c002a61794 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -1060,6 +1060,28 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop foreign server by OID
+ */
+void
+RemoveForeignServerById(Oid srvId)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(ForeignServerRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(srvId));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Common routine to check permission for user-mapping-related DDL
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 7a079ef07f..737a14a22a 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -128,6 +128,7 @@ extern ObjectAddress CreateForeignDataWrapper(CreateFdwStmt *stmt);
 extern ObjectAddress AlterForeignDataWrapper(AlterFdwStmt *stmt);
 extern ObjectAddress CreateForeignServer(CreateForeignServerStmt *stmt);
 extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
+extern void RemoveForeignServerById(Oid srvId);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
-- 
2.27.0
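
A side note on the dependency.c hunk in the patch above: it moves OCLASS_FOREIGN_SERVER out of the group of classes that share one code path and gives it its own case that calls RemoveForeignServerById(). The following is a rough standalone illustration of that dispatch shape only. The enum, the handler, and the counter are hypothetical stand-ins, not PostgreSQL's real types or functions.

```c
#include <assert.h>

/* Hypothetical object classes, loosely modelled on the OCLASS_* values. */
typedef enum
{
	OC_PUBLICATION_REL,
	OC_FOREIGN_SERVER,
	OC_FDW,
	OC_USER_MAPPING
} ObjClass;

/* Counts calls to the class-specific handler, for demonstration only. */
static int	servers_removed = 0;

/* Stand-in for a class-specific removal routine. */
static void
remove_foreign_server_by_id(unsigned int id)
{
	assert(id != 0);			/* loosely mirrors an OidIsValid() check */
	servers_removed++;
}

/*
 * Dispatch on the object class.  Returns 1 if a class-specific handler
 * ran, 0 if the object went through the shared path.
 */
int
do_deletion(ObjClass cls, unsigned int id)
{
	switch (cls)
	{
		case OC_FOREIGN_SERVER:
			/* This class now has its own removal routine. */
			remove_foreign_server_by_id(id);
			return 1;

		case OC_PUBLICATION_REL:
		case OC_FDW:
		case OC_USER_MAPPING:
			/* These classes stay on the shared path in this sketch. */
			return 0;
	}

	return 0;					/* not reached for valid classes */
}
```

The point of the shape is that adding class-specific behaviour only means moving one case label out of the shared group, which is what the hunk does.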

Attachment: v28-0005-postgres_fdw-supports-prepare-API.patch (application/octet-stream)
From 3fcb584b02655e56dba3a75d8db8ea5bfbe8b450 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:00:21 +0900
Subject: [PATCH v28 05/11] postgres_fdw supports prepare API.

This commit also enables postgres_fdw to commit and roll back foreign transactions.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 137 +++++++++++++++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  13 --
 contrib/postgres_fdw/postgres_fdw.c           |   1 +
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   7 -
 5 files changed, 135 insertions(+), 24 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index c7da528dfb..747be681b8 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -96,6 +96,8 @@ static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 static bool UserMappingPasswordRequired(UserMapping *user);
 static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
 static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+									char *fdwxact_id, bool is_commit);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -1158,12 +1160,19 @@ void
 postgresCommitForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry;
+	bool		is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	PGresult   *res;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
 
+	if (!is_onephase)
+	{
+		/* COMMIT PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->usermapping,
+								frstate->fdwxact_id, true);
+		return;
+	}
+
 	Assert(entry->conn);
 
 	/*
@@ -1209,16 +1218,24 @@ void
 postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry = NULL;
+	bool is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	bool abort_cleanup_failure = false;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	/*
 	 * In simple rollback case, we must have a connection to the foreign server
 	 * because the foreign transaction is not closed yet. We get the connection
 	 * entry from the cache.
 	 */
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	if (!is_onephase)
+	{
+		/* ROLLBACK PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->usermapping,
+								frstate->fdwxact_id, false);
+		return;
+	}
+
 	Assert(entry);
 
 	/*
@@ -1295,6 +1312,46 @@ postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 	return;
 }
 
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", frstate->fdwxact_id);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   frstate->server->servername, frstate->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 frstate->server->servername, frstate->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
 /* Cleanup at main-transaction end */
 static void
 pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
@@ -1321,3 +1378,75 @@ pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
 	/* Also reset cursor numbering for next transaction */
 	cursor_number = 0;
 }
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+						char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	/*
+	 * Check the connection status for the case the previous attempt
+	 * failed.
+	 */
+	if (entry->conn && PQstatus(entry->conn) != CONNECTION_OK)
+		disconnect_pg_server(entry);
+
+	/*
+	 * In two-phase commit case, since the transaction is about to be
+	 * resolved by a different process than the process who prepared it,
+	 * we might not have a connection yet.
+	 */
+	if (!entry->conn)
+		make_new_connection(entry, usermapping);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	/*
+	 * Once the transaction is prepared, further transaction callback is not
+	 * called even when an error occurred during resolving it.  Therefore, we
+	 * don't need to set changing_xact_state here.  On failure the new connection
+	 * will be established either when the new transaction is started or when
+	 * checking the connection status above.
+	 */
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index fefb7e6de2..a750ace025 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8974,19 +8974,6 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
- count 
--------
-   822
-(1 row)
-
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
-ROLLBACK;
-WARNING:  there is no transaction in progress
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 473f94c929..e3fccc6050 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -562,6 +562,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for foreign transactions */
 	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
 	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
 
 	PG_RETURN_POINTER(routine);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index e3b2897495..659222b97a 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -140,6 +140,7 @@ extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
 extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
 extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7581c5417b..ece57de1b1 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2647,13 +2647,6 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ROLLBACK;
-
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
-- 
2.27.0
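
As a reading aid for pgfdw_end_prepared_xact() in the patch above, here is a minimal standalone sketch of its two decisions: which command to send, and which failure to tolerate. It is not part of the patch. The sketch_* names are hypothetical, and the sketch assumes the identifier contains no single quotes (the patch also interpolates fdwxact_id without escaping). SQLSTATE 42704 is PostgreSQL's undefined_object code, which the patch accepts because the prepared transaction may already be gone on the foreign server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* SQLSTATE for undefined_object in PostgreSQL's error code table. */
#define SKETCH_UNDEFINED_OBJECT "42704"

/*
 * Build "COMMIT PREPARED '<id>'" or "ROLLBACK PREPARED '<id>'" into buf.
 * Returns false if the command does not fit into buflen bytes.
 * Assumption: fdwxact_id contains no single quotes.
 */
bool
sketch_build_resolve_command(char *buf, size_t buflen,
							 const char *fdwxact_id, bool is_commit)
{
	int			n;

	assert(buf != NULL && fdwxact_id != NULL);

	n = snprintf(buf, buflen, "%s PREPARED '%s'",
				 is_commit ? "COMMIT" : "ROLLBACK", fdwxact_id);

	return n >= 0 && (size_t) n < buflen;
}

/*
 * Decide whether a failed resolution must be raised as an error.
 * A NULL sqlstate (no diagnostic field available) is treated like a
 * connection failure, so it is an error.  Only undefined_object is
 * tolerated, since the foreign transaction may already be resolved.
 */
bool
sketch_resolution_failure_is_error(const char *sqlstate)
{
	if (sqlstate == NULL)
		return true;

	return strcmp(sqlstate, SKETCH_UNDEFINED_OBJECT) != 0;
}
```

A caller would build the command, send it over the connection, and on a non-OK result pass the reported SQLSTATE to sketch_resolution_failure_is_error() to decide between raising an error and treating the transaction as already resolved.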

v28-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch (application/octet-stream)
From 170751b815a988ed097b644765ba34519b8a68cc Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sat, 29 Aug 2020 00:14:36 +0900
Subject: [PATCH v28 02/11] postgres_fdw supports commit and rollback APIs.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 471 +++++++++---------
 .../postgres_fdw/expected/postgres_fdw.out    |   2 +-
 contrib/postgres_fdw/postgres_fdw.c           |   4 +
 contrib/postgres_fdw/postgres_fdw.h           |   3 +
 4 files changed, 241 insertions(+), 239 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 2f411cf2f7..c7da528dfb 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -16,6 +16,7 @@
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -80,8 +81,7 @@ static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, UserMapping *user);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -94,6 +94,8 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -108,56 +110,11 @@ static bool UserMappingPasswordRequired(UserMapping *user);
 PGconn *
 GetConnection(UserMapping *user, bool will_prep_stmt)
 {
-	bool		found;
 	bool		retry = false;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
 	MemoryContext ccxt = CurrentMemoryContext;
 
-	/* First time through, initialize connection cache hashtable */
-	if (ConnectionHash == NULL)
-	{
-		HASHCTL		ctl;
-
-		MemSet(&ctl, 0, sizeof(ctl));
-		ctl.keysize = sizeof(ConnCacheKey);
-		ctl.entrysize = sizeof(ConnCacheEntry);
-		/* allocate ConnectionHash in the cache context */
-		ctl.hcxt = CacheMemoryContext;
-		ConnectionHash = hash_create("postgres_fdw connections", 8,
-									 &ctl,
-									 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
-
-		/*
-		 * Register some callback functions that manage connection cleanup.
-		 * This should be done just once in each backend.
-		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
-		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
-		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
-									  pgfdw_inval_callback, (Datum) 0);
-		CacheRegisterSyscacheCallback(USERMAPPINGOID,
-									  pgfdw_inval_callback, (Datum) 0);
-	}
-
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
-	/*
-	 * Find or create cached entry for requested connection.
-	 */
-	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
-	if (!found)
-	{
-		/*
-		 * We need only clear "conn" here; remaining fields will be filled
-		 * later when "conn" is set.
-		 */
-		entry->conn = NULL;
-	}
+	entry = GetConnectionCacheEntry(user->umid);
 
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
@@ -189,7 +146,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	PG_TRY();
 	{
 		/* Start a new transaction or subtransaction if needed. */
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 	PG_CATCH();
 	{
@@ -250,7 +207,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		if (entry->conn == NULL)
 			make_new_connection(entry, user);
 
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 
 	/* Remember if caller will prepare statements */
@@ -259,6 +216,60 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	return entry->conn;
 }
 
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	bool		found;
+	ConnCacheEntry *entry;
+	ConnCacheKey key;
+
+	/* First time through, initialize connection cache hashtable */
+	if (ConnectionHash == NULL)
+	{
+		HASHCTL		ctl;
+
+		MemSet(&ctl, 0, sizeof(ctl));
+		ctl.keysize = sizeof(ConnCacheKey);
+		ctl.entrysize = sizeof(ConnCacheEntry);
+		/* allocate ConnectionHash in the cache context */
+		ctl.hcxt = CacheMemoryContext;
+		ConnectionHash = hash_create("postgres_fdw connections", 8,
+									 &ctl,
+									 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+		/*
+		 * Register some callback functions that manage connection cleanup.
+		 * This should be done just once in each backend.
+		 */
+		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
+		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
+									  pgfdw_inval_callback, (Datum) 0);
+		CacheRegisterSyscacheCallback(USERMAPPINGOID,
+									  pgfdw_inval_callback, (Datum) 0);
+	}
+
+	/* Set flag that we did GetConnection during the current transaction */
+	xact_got_connection = true;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
+
+	/*
+	 * Find or create cached entry for requested connection.
+	 */
+	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+	if (!found)
+	{
+		/*
+		 * We need only clear "conn" here; remaining fields will be filled
+		 * later when "conn" is set.
+		 */
+		entry->conn = NULL;
+	}
+
+	return entry;
+}
+
 /*
  * Reset all transient state fields in the cached connection entry and
  * establish new connection to the remote server.
@@ -548,7 +559,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -560,6 +571,9 @@ begin_remote_xact(ConnCacheEntry *entry)
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
+		/* Register the foreign server to the transaction */
+		FdwXactRegisterXact(user->serverid, user->userid);
+
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
 		else
@@ -775,193 +789,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -1326,3 +1153,171 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+void
+postgresCommitForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry;
+	PGresult   *res;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	Assert(entry->conn);
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   frstate->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Cleanup connection entry transaction if transaction fails before
+	 * establishing a connection.
+	 */
+	if (!entry->conn)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is before starting transaction or is already unsalvageable,
+	 * do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06358..c5badd9c0a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,7 +8984,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
+ERROR:  cannot PREPARE a transaction that has operated on foreign tables
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaacc51..473f94c929 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -559,6 +559,10 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..e3b2897495 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -137,6 +138,8 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
-- 
2.27.0
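
The main refactoring in the patch above is that the cache lookup moves out of GetConnection() into GetConnectionCacheEntry(), so the commit and rollback callbacks can find an entry from the user mapping OID alone. A minimal standalone sketch of that get-or-create pattern follows. It is not the patch's code: the real function uses the backend's dynahash (hash_search with HASH_ENTER), whereas this sketch assumes a small fixed-size table, and every Sketch*/sketch_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_ENTRIES 8

typedef struct SketchConnEntry
{
	unsigned int umid;		/* lookup key; stands in for the user mapping OID */
	bool		in_use;		/* slot holds a live entry */
	void	   *conn;		/* stands in for PGconn *; NULL means no connection */
	int			xact_depth;	/* 0 means no remote transaction is open */
} SketchConnEntry;

/* Zero-initialized, so every slot starts out unused. */
static SketchConnEntry sketch_cache[SKETCH_MAX_ENTRIES];

/*
 * Find the entry for umid, creating it if it does not exist yet.
 * A newly created entry has conn == NULL, so the caller knows it still
 * has to establish a connection.  Returns NULL only if the table is full
 * (a limitation of this sketch; a real hash table would grow instead).
 */
SketchConnEntry *
sketch_get_cache_entry(unsigned int umid)
{
	SketchConnEntry *free_slot = NULL;

	for (size_t i = 0; i < SKETCH_MAX_ENTRIES; i++)
	{
		if (sketch_cache[i].in_use && sketch_cache[i].umid == umid)
			return &sketch_cache[i];
		if (!sketch_cache[i].in_use && free_slot == NULL)
			free_slot = &sketch_cache[i];
	}

	if (free_slot == NULL)
		return NULL;

	free_slot->in_use = true;
	free_slot->umid = umid;
	free_slot->conn = NULL;
	free_slot->xact_depth = 0;

	return free_slot;
}
```

Because the lookup never opens a connection itself, a callback that runs in a process other than the one that started the transaction can get an entry with conn == NULL and decide for itself whether to connect.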

v28-0004-Add-PrepareForeignTransaction-API.patch (application/octet-stream)
From 0b1b4b17caa173a8625bb4b689292540763eade0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 20 Sep 2020 16:49:20 +0900
Subject: [PATCH v28 04/11] Add PrepareForeignTransaction API.

The transactions initiated on the foreign server are prepared at
PREPARE TRANSACTION time.  The information about prepared foreign
transactions involved in the distributed transaction is crash-safe.
However, these transactions are neither committed nor aborted at
COMMIT/ROLLBACK PREPARED time.  To resolve these transactions, this
commit adds the pg_resolve_foreign_xact() SQL function.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +-
 src/backend/access/fdwxact/fdwxact.c          | 1755 ++++++++++++++++-
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   28 +
 src/backend/access/transam/xact.c             |    1 +
 src/backend/access/transam/xlog.c             |   41 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/foreigncmds.c            |   22 +
 src/backend/foreign/foreign.c                 |    6 +
 src/backend/postmaster/pgstat.c               |    9 +
 src/backend/postmaster/postmaster.c           |    1 +
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    3 +
 src/backend/storage/ipc/procarray.c           |   56 +-
 src/backend/storage/lmgr/lwlocknames.txt      |    1 +
 src/backend/utils/misc/guc.c                  |   11 +
 src/backend/utils/misc/postgresql.conf.sample |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |   88 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   18 +
 src/include/foreign/fdwapi.h                  |    2 +
 src/include/pgstat.h                          |    3 +
 src/include/storage/procarray.h               |    2 +
 src/test/regress/expected/rules.out           |    7 +
 35 files changed, 2164 insertions(+), 28 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact_xlog.h

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c5badd9c0a..fefb7e6de2 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,7 +8984,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on foreign tables
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 00da860b31..3caf904370 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -9,8 +9,59 @@
  * FDW who implements both commit and rollback APIs can request to register the
  * foreign transaction by FdwXactRegisterXact() to participate it to a
  * group of distributed tranasction.  The registered foreign transactions are
- * identified by OIDs of server and user.  On commit and rollback, the global
- * transaction manager calls corresponding FDW API to end the tranasctions.
+ * identified by OIDs of server and user.  On commit, rollback and prepare, the
+ * global transaction manager calls corresponding FDW API to end the transactions.
+ *
+ * To achieve commit among all foreign servers atomically, the global transaction
+ * manager supports two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). Two-phase commit protocol is crash-safe.  We WAL-log the foreign
+ * transaction information.
+ *
+ * FOREIGN TRANSACTION RESOLUTION
+ *
+ * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
+ * PrepareForeignTransaction() API regardless of data on the foreign server having been
+ * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
+ * local transaction but do nothing for the involved foreign transactions.  To resolve
+ * these foreign transactions the user needs to use the pg_resolve_foreign_xact() SQL
+ * function, which resolves a foreign transaction according to the result of the
+ * corresponding local transaction.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable amount of time, the in-memory data of a
+ * foreign transaction follows a locking model based on two linked concepts:
+ *
+ * * All FdwXact fields except for status are protected by FdwXactLock. The
+ *	 status is protected by its mutex.
+ * * A process that is going to process a foreign transaction needs to set
+ *	 locking_backend of the FdwXact entry to lock the entry, which prevents the
+ *	 entry from being updated and removed by concurrent processes.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *	 with entries marked with fdwxact->inredo and fdwxact->ondisk.  FdwXact file
+ *	 data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *	 We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *	 have fdwxact->inredo set and are behind the redo_horizon.	We save
+ *	 them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *	 fdwxact->ondisk is true, the corresponding entry from the disk is
+ *	 additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *	 fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
  *
  * Portions Copyright (c) 2020, PostgreSQL Global Development Group
  *
@@ -20,15 +71,53 @@
  */
 #include "postgres.h"
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include "access/fdwxact.h"
+#include "access/twophase.h"
+#include "access/xact.h"
 #include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "foreign/fdwapi.h"
 #include "foreign/foreign.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/procarray.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 
 /* Check the FdwXactParticipant is capable of two-phase commit  */
 #define ServerSupportTransactionCallback(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define ServerSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * OID, xid, foreign server OID and user OID, each printed as 8 hexadecimal
+ * digits and separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction, and the xid of an unresolved distributed
+ * transaction is never reused, the name is enough to ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
 
 /*
  * Structure to bundle the foreign transaction participant.	 This struct
@@ -37,13 +126,23 @@
  */
 typedef struct FdwXactParticipant
 {
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
 	/* Foreign server and user mapping info, passed to callback routines */
 	ForeignServer *server;
 	UserMapping *usermapping;
 
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
 } FdwXactParticipant;
 
 /*
@@ -52,11 +151,103 @@ typedef struct FdwXactParticipant
  */
 static List *FdwXactParticipants = NIL;
 
+/* Track whether the process-exit callback has been registered */
+static bool fdwXactExitRegistered = false;
+
+/* GUC parameter */
+int			max_prepared_foreign_xacts = 0;
+
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void FdwXactPrepareForeignTransactions(TransactionId xid);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
+static void FdwXactComputeRequiredXmin(void);
+static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
+static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool givewarning);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool fromdisk);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  Oid umid, char *fdwxact_id);
+static void remove_fdwxact(FdwXact fdwxact);
 static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
 													  FdwRoutine *routine);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
 
 /*
  * Register the given foreign transaction identified by the given arguments
@@ -82,6 +273,13 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 		}
 	}
 
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
 	routine = GetFdwRoutineByServerId(serverid);
 
 	/*
@@ -142,14 +340,336 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 
 	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
 
+	fdw_part->fdwxact = NULL;
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
 
 	return fdw_part;
 }
 
+/*
+ * Insert FdwXact entries and prepare foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(TransactionId xid)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants != NIL);
+	Assert(TransactionIdIsValid(xid));
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState state;
+		FdwXact		fdwxact;
+
+		Assert(ServerSupportTwophaseCommit(fdw_part));
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to disk and WAL-logs it (thereby relaying it to
+		 * standbys).  Thus, if we lose connectivity to the foreign server or
+		 * crash ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed between these two
+		 * steps, we would lose track of the prepared transaction on the
+		 * foreign server and could not resolve it after crash recovery.
+		 * Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the moment this
+		 * backend receives the acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 */
+		state.server = fdw_part->server;
+		state.usermapping = fdw_part->usermapping;
+		state.fdwxact_id = fdw_part->fdwxact_id;
+		fdw_part->prepare_foreign_xact_fn(&state);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier.  We generate a
+ * unique identifier of the form
+ * "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * The returned string is used to identify the foreign transaction. The
+ * identifier must not be the same as that of any other concurrent prepared
+ * transaction.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like a UUID, which gives unique ids with high probability, but that may be
+ * expensive here, and the UUID extension that provides the function to
+ * generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+			 Abs(random()), xid, fdw_part->server->serverid,
+			 fdw_part->usermapping->userid);
+
+	return pstrdup(buf);
+}
+
+/*
+ * Create a new foreign transaction entry before an FDW prepares, commits or
+ * rolls back the transaction. The function WAL-logs the entry, and it is
+ * persisted to disk under the pg_fdwxact directory at the next checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* The entry is now fully set up and WAL-logged; mark it valid */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset the entry's state */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry.  The
+		 * caller is responsible for removing the on-disk file, if any.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
 /*
  * The routine for committing or rolling back the given transaction participant.
  */
@@ -162,6 +682,7 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 
 	state.server = fdw_part->server;
 	state.usermapping = fdw_part->usermapping;
+	state.fdwxact_id = NULL;
 	state.flags = FDWXACT_FLAG_ONEPHASE;
 
 	if (commit)
@@ -181,14 +702,46 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 }
 
 /*
- * Clear the FdwXactParticipants list.
+ * Unlock foreign transaction participants and clear the FdwXactParticipants
+ * list.  If we leave any foreign transactions unresolved, update the oldest
+ * xmin of unresolved transactions so that the local transaction id of such
+ * an unresolved foreign transaction is not truncated.
  */
 static void
 ForgetAllFdwXactParticipants(void)
 {
+	ListCell   *cell;
+	int			nlefts = 0;
+
 	if (FdwXactParticipants == NIL)
 		return;
 
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if no FdwXact entry has been registered yet */
+		if (!fdwxact)
+			continue;
+
+		/* Unlock the foreign transaction entry */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+		nlefts++;
+	}
+
+	/*
+	 * If we leave any FdwXact entries, update the oldest local transaction id
+	 * of unresolved distributed transactions.
+	 */
+	if (nlefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions", nlefts);
+		FdwXactComputeRequiredXmin();
+	}
+
 	list_free_deep(FdwXactParticipants);
 	FdwXactParticipants = NIL;
 }
@@ -211,23 +764,1203 @@ AtEOXact_FdwXact(bool is_commit)
 	foreach(lc, FdwXactParticipants)
 	{
 		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		if (!fdwxact)
+		{
+			/* Commit or rollback the foreign transaction in one-phase */
+			Assert(ServerSupportTransactionCallback(fdw_part));
+			FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			continue;
+		}
+
+		/*
+		 * This foreign transaction might have been prepared.  In the commit
+		 * case, we don't need to do anything for this participant because all
+		 * foreign transactions should already have been prepared, and the
+		 * transactions are therefore already closed. These will be resolved
+		 * manually.  In the abort case, on the other hand, we need to close
+		 * the transaction if preparing might still be in progress, since an
+		 * error might have occurred while preparing a foreign transaction.
+		 */
+		if (!is_commit)
+		{
+			int					   status;
 
-		Assert(ServerSupportTransactionCallback(fdw_part));
-		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			SpinLockAcquire(&(fdwxact->mutex));
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&(fdwxact->mutex));
+
+			if (status == FDWXACT_STATUS_PREPARING)
+				FdwXactParticipantEndTransaction(fdw_part, false);
+		}
 	}
 
 	ForgetAllFdwXactParticipants();
 }
 
 /*
- * Check if the local transaction has any foreign transaction.
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * Note that the transaction may abort after we have prepared some of the
+ * participants. In that case we switch to rollback and roll back all foreign
+ * transactions.
  */
 void
 PrePrepare_FdwXact(void)
 {
-	/* We don't support to prepare foreign transactions */
-	if (FdwXactParticipants != NIL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+	ListCell   *lc;
+	TransactionId xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit as we're going to
+	 * prepare all of them.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
+	}
+
+	/*
+	 * Assign a transaction id if one has not been assigned yet, because the
+	 * local transaction id is used to determine the result of the distributed
+	 * transaction.  Then prepare all foreign transactions.
+	 */
+	xid = GetTopTransactionId();
+	FdwXactPrepareForeignTransactions(xid);
+
+	/*
+	 * We keep FdwXactParticipants until the end of the transaction so that we
+	 * can change the involved foreign transactions to ABORTING on failure.
+	 */
+}
+
+/*
+ * Resolve foreign transactions at the given indexes.
+ *
+ * The caller must have locked the given foreign transactions in advance to
+ * prevent concurrent updates.
+ */
+static void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		FdwXactResolveOneFdwXact(fdwxact);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
+
+/*
+ * Return the index of the first FdwXact entry matching the given arguments,
+ * or -1 if there is none.  Only arguments that have a valid value for their
+ * respective data type are used as search conditions.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	bool		found = false;
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+		found = true;
+		break;
+	}
+
+	return found ? i : -1;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ *
+ * XXX: we can exclude FdwXact entries whose status is already committing
+ * or aborting.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Return whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactGetTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is no longer in progress, yet it is recorded as
+	 * neither committed nor aborted.  This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on
+	 * the foreign server.  So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is still in progress, so its fate is not yet
+	 * determined.  Raise an error, since we cannot decide whether to commit
+	 * or abort this foreign transaction until the outcome of the local
+	 * transaction is known.
+	 */
+	else
+		elog(ERROR,
+			 "cannot resolve the foreign transaction associated with in-process transaction");
+
+	pg_unreachable();
+}
+
+/* Commit or rollback one prepared foreign transaction */
+static void
+FdwXactResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactRslvState state;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	/* The FdwXact entry must be held by me */
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->locking_backend == MyBackendId);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactGetTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare the resolution state to pass to API */
+	state.server = server;
+	state.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	state.fdwxact_id = fdwxact->fdwxact_id;
+	state.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&state);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&state);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *record = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file if exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(record->dbid, record->xid, record->serverid,
+						  record->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions.  insert_fdwxact()
+	 * initializes the status to PREPARING; we overwrite it below, since we do
+	 * not know the exact status right now.  The resolver will set it later based
+	 * on the status of the local transaction that prepared this entry.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that
+	 * prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXact file if the foreign transaction was saved by an earlier checkpoint.
+ * We may not find the FdwXact entry if crash recovery starts from a point
+ * after the record that added the entry but before the record that removed it.
+ */
+static void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint that is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.	 FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction state file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a sane size, a valid CRC, and the expected identifiers),
+ * return the palloc'd contents of the file; otherwise raise an error.
+ * Corrupted data can be encountered when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the requested foreign transaction */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted fdwxact state files, fail immediately.  Keeping broken
+ * entries around and letting replay continue would harm the system, and a
+ * new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the foreign transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextXid = ShmemVariableCache->nextXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextXid);
+	TransactionId result = TransactionIdIsValid(oldestActiveXid) ? oldestActiveXid : origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+RestoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and mark them valid.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built-in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = CStringGetTextDatum(fdwxact->fdwxact_id);
+
+		if (fdwxact->locking_backend != InvalidBackendId)
+		{
+			PGPROC *locker = BackendIdGetProc(fdwxact->locking_backend);
+			values[5] = Int32GetDatum(locker->pid);
+		}
+		else
+			nulls[5] = true;
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist")));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to resolve prepared foreign transaction"),
+				 errhint("Must be superuser or the user that prepared the transaction.")));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being processed by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction identifier \"%s\" is busy",
+						fdwxact->fdwxact_id)));
+	}
+
+	if (TwoPhaseExists(fdwxact->local_xid))
+	{
+		/*
+		 * the entry's local transaction is prepared. Since we cannot know the
+		 * fate of the local transaction, we cannot resolve this foreign
+		 * transaction.
+		 */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction with identifier \"%s\" whose local transaction is in-progress",
+						fdwxact->fdwxact_id),
+				 errhint("Do COMMIT PREPARED or ROLLBACK PREPARED")));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	LWLockRelease(FdwXactLock);
+
+	PG_TRY();
+	{
+		FdwXactResolveFdwXacts(&idx, 1);
+	}
+	PG_CATCH();
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactCtl->fdwxacts[idx]->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. The function gives a way to forget about such a prepared
+ * transaction when the foreign server where it was prepared is no longer
+ * available, or when the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist on server %u",
+						serverid)));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("permission denied to remove prepared foreign transaction"),
+				  errhint("Must be superuser or the user that prepared the transaction"))));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being held by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	PG_TRY();
+	{
+		/* Clean up entry and any files we may have left */
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdwxact(fdwxact);
+	}
+	PG_CATCH();
+	{
+		if (fdwxact->valid)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+			fdwxact->locking_backend = InvalidBackendId;
+		}
+		LWLockRelease(FdwXactLock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
 }
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..4b3e67eb49 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..0a3f4b383f 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 7940060443..a71210772f 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -845,6 +845,34 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+		if (gxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 0a8d1da4bd..e4fadcaf2c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2567,6 +2567,7 @@ PrepareTransaction(void)
 	PostPrepare_Twophase();
 
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
+	AtEOXact_FdwXact(true);
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
 	AtEOXact_Enum();
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a1078a7cfc..417e7595e8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4602,6 +4603,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6291,6 +6293,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6838,14 +6843,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	RestoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7047,7 +7053,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7559,11 +7568,13 @@ StartupXLOG(void)
 	}
 
 	/*
-	 * Pre-scan prepared transactions to find out the range of XIDs present.
-	 * This information is not quite needed yet, but it is positioned here so
-	 * as potential problems are detected before any on-disk change is done.
+	 * Pre-scan prepared transactions and foreign prepared transactions to find
+	 * out the range of XIDs present.  This information is not quite needed yet,
+	 * but it is positioned here so as potential problems are detected before any
+	 * on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7891,8 +7902,12 @@ StartupXLOG(void)
 	TrimCLOG();
 	TrimMultiXact();
 
-	/* Reload shared-memory state for prepared transactions */
+	/*
+	 * Reload shared-memory state for prepared transactions and foreign
+	 * prepared transactions.
+	 */
 	RecoverPreparedTransactions();
+	RecoverFdwXacts();
 
 	/*
 	 * Shutdown the recovery environment. This must occur after
@@ -9198,6 +9213,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9740,6 +9756,7 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
@@ -9759,6 +9776,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9777,6 +9795,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9984,6 +10003,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10187,6 +10207,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2e4aa1c4b6..42c64beac9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index c002a61794..c290b9ea94 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1076,6 +1077,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * We cannot drop the foreign server if there is a foreign prepared
+	 * transaction with this foreign server.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1396,6 +1409,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * We cannot drop the user mapping if there is a foreign prepared
+	 * transaction with this user mapping.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 6532a836e5..d34e26fd26 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -332,6 +332,12 @@ GetFdwRoutine(Oid fdwhandler)
 	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
 		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
 
+	/* FDW supporting prepare API must also support commit and rollback APIs */
+	Assert((routine->PrepareForeignTransaction &&
+			routine->CommitForeignTransaction &&
+			routine->RollbackForeignTransaction) ||
+		   !routine->PrepareForeignTransaction);
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e76e627c6b..4b05d7d2ff 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4137,6 +4137,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_DSM_FILL_ZERO_WRITE:
 			event_name = "DSMFillZeroWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ:
 			event_name = "LockFileAddToDataDirRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 959e3b8873..81e6cb9ca2 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,7 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee99b8..23ae805218 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -167,6 +167,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..2d7191d3cd 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -267,6 +269,7 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 05661e379e..868dd9544b 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -96,6 +96,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allProcs[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -187,11 +189,13 @@ typedef struct ComputeXidHorizonsResult
 	FullTransactionId latest_completed;
 
 	/*
-	 * The same for procArray->replication_slot_xmin and.
-	 * procArray->replication_slot_catalog_xmin.
+	 * The same for procArray->replication_slot_xmin,
+	 * procArray->replication_slot_catalog_xmin, and
+	 * procArray->fdwxact_unresolved_xmin.
 	 */
 	TransactionId slot_xmin;
 	TransactionId slot_catalog_xmin;
+	TransactionId fdwxact_unresolved_xmin;
 
 	/*
 	 * Oldest xid that any backend might still consider running. This needs to
@@ -210,8 +214,9 @@ typedef struct ComputeXidHorizonsResult
 	 * Oldest xid for which deleted tuples need to be retained in shared
 	 * tables.
 	 *
-	 * This includes the effects of replication slots. If that's not desired,
-	 * look at shared_oldest_nonremovable_raw;
+	 * This includes the effects of replication slots and unresolved
+	 * foreign transactions. If that's not desired, look at
+	 * shared_oldest_nonremovable_raw;
 	 */
 	TransactionId shared_oldest_nonremovable;
 
@@ -418,6 +423,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 		ShmemVariableCache->xactCompletionCount = 1;
 	}
 
@@ -1705,6 +1711,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	 */
 	h->slot_xmin = procArray->replication_slot_xmin;
 	h->slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	h->fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	for (int index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1832,6 +1839,12 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	h->data_oldest_nonremovable =
 		TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);
 
+	/*
+	 * Check whether there are unresolved distributed transactions requiring
+	 * an older xmin.
+	 */
+	h->shared_oldest_nonremovable =
+		TransactionIdOlder(h->data_oldest_nonremovable, h->fdwxact_unresolved_xmin);
 	/*
 	 * The only difference between catalog / data horizons is that the slot's
 	 * catalog xmin is applied to the catalog one (so catalogs can be accessed
@@ -1889,6 +1902,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	Assert(!TransactionIdIsValid(h->slot_catalog_xmin) ||
 		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
 										 h->slot_catalog_xmin));
+	Assert(!TransactionIdIsValid(h->fdwxact_unresolved_xmin) ||
+		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+										 h->fdwxact_unresolved_xmin));
 
 	/* update approximate horizons with the computed horizons */
 	GlobalVisUpdateApply(h);
@@ -3793,6 +3809,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon so that vacuum
+ * does not truncate clog entries of transactions that are still needed to
+ * resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limits.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
+
 /*
  * XidCacheRemoveRunningXids
  *
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..dc29a7ea6f 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+FdwXactLock							48
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bb34630e8e..ec6cef8ad7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -30,6 +30,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -2458,6 +2459,16 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..863e8ccc3a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -127,6 +127,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ee3bfa82f4..eae52defba 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -204,6 +204,7 @@ static const char *const subdirs[] = {
 	"pg_wal/archive_status",
 	"pg_commit_ts",
 	"pg_dynshmem",
+	"pg_fdwxact",
 	"pg_notify",
 	"pg_serial",
 	"pg_snapshots",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 3e00ac0f70..53bc3d82d7 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index cb6ef19182..1712b794c3 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 6c8b111ab5..9ba819e9d1 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -10,24 +10,112 @@
 #ifndef FDWXACT_H
 #define FDWXACT_H
 
+#include "access/fdwxact_xlog.h"
 #include "foreign/foreign.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/s_lock.h"
 
 /* Flag passed to FDW transaction management APIs */
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being aborted */
+} FdwXactStatus;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData *FdwXact;
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+
+	/* Information about the foreign transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset of inserting this entry
+									 * start */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been completed and written to
+								 * file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing FdwXact entry needs to hold FdwXactLock in exclusive mode,
+ * and iterating fdwXacts needs that in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
 /* State data for foreign transaction resolution, passed to FDW callbacks */
 typedef struct FdwXactRslvState
 {
 	/* Foreign transaction information */
+	char		   *fdwxact_id;
 	ForeignServer *server;
 	UserMapping *usermapping;
 
 	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
 } FdwXactRslvState;
 
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+
 /* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+extern void RestoreFdwXactData(void);
+extern void RecoverFdwXacts(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
 
 #endif /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 6c15df7e70..986bc73566 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..e1b09a70d2 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -236,6 +236,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 06bed90c5e..ed6372d2e6 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c01da4bf01..09c26b5cd8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6030,6 +6030,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,text,int4}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,state,identifier,locker_pid}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 4db7ade9a3..89cec9aa96 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -171,6 +171,7 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
 
@@ -254,6 +255,7 @@ typedef struct FdwRoutine
 	/* Support functions for transaction management */
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
+	PrepareForeignTransaction_function PrepareForeignTransaction;
 } FdwRoutine;
 
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 257e515bfe..a61a08c5d6 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1004,6 +1004,9 @@ typedef enum
 	WAIT_EVENT_DATA_FILE_TRUNCATE,
 	WAIT_EVENT_DATA_FILE_WRITE,
 	WAIT_EVENT_DSM_FILL_ZERO_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index ea8a876ca4..0124c8c687 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -91,5 +91,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 
 #endif							/* PROCARRAY_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 097ff5d111..64da3b40d7 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1352,6 +1352,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.state,
+    f.identifier,
+    f.locker_pid
+   FROM pg_foreign_xacts() f(xid, serverid, userid, state, identifier, locker_pid);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.27.0

v28-0001-Introduce-transaction-manager-for-foreign-transa.patch (application/octet-stream)
From 7e0af8e0e5be41971f4d79b4838e0d0206a60528 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 28 Aug 2020 22:25:38 +0900
Subject: [PATCH v28 01/11] Introduce transaction manager for foreign
 transactions.

The global transaction manager manages the transactions initiated on
the foreign server. This commit adds both CommitForeignTransaction and
RollbackForeignTransaction FDW APIs. An FDW that implements these APIs
can be managed by the global transaction manager.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/Makefile          |   4 +-
 src/backend/access/fdwxact/Makefile  |  17 ++
 src/backend/access/fdwxact/fdwxact.c | 233 +++++++++++++++++++++++++++
 src/backend/access/transam/xact.c    |  10 ++
 src/backend/foreign/foreign.c        |   4 +
 src/include/access/fdwxact.h         |  33 ++++
 src/include/foreign/fdwapi.h         |  12 ++
 7 files changed, 311 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/include/access/fdwxact.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..2372a1a690 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,7 +8,7 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+SUBDIRS	    = brin common fdwxact gin gist hash heap index nbtree rmgrdesc \
+			  spgist table tablesample transam
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..aacab1d729
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..00da860b31
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,233 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module contains the code for managing transactions started on foreign
+ * servers.
+ *
+ * An FDW that implements both the commit and rollback APIs can register its
+ * foreign transaction with FdwXactRegisterXact() so that it participates in a
+ * group of distributed transactions.  The registered foreign transactions are
+ * identified by OIDs of server and user.  On commit and rollback, the global
+ * transaction manager calls the corresponding FDW API to end the transactions.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xlog.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "utils/memutils.h"
+
+/* Check whether the FdwXactParticipant provides the transaction callbacks */
+#define ServerSupportTransactionCallback(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.	 This struct
+ * needs to live until the end of transaction where we cannot look at
+ * syscaches. Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Callbacks for foreign transaction */
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A member of
+ * participants must support both commit and rollback APIs.
+ */
+static List *FdwXactParticipants = NIL;
+
+static void ForgetAllFdwXactParticipants(void);
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
+											 bool commit);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+/*
+ * Register the given foreign transaction identified by the given arguments
+ * as a participant of the transaction.
+ */
+void
+FdwXactRegisterXact(Oid serverid, Oid userid)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Already registered */
+			return;
+		}
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Foreign server managed by the transaction manager must implement
+	 * transaction callbacks.
+	 */
+	if (!routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("cannot register foreign server not supporting transaction callback")));
+
+	/*
+	 * Participant's information is also used at the end of a transaction,
+	 * where the system caches are not available. Save it in
+	 * TopTransactionContext so that it lives until the end of transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Revert back the context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Remove the given foreign server from FdwXactParticipants */
+void
+FdwXactUnregisterXact(Oid serverid, Oid userid)
+{
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Remove the entry */
+			FdwXactParticipants =
+				foreach_delete_current(FdwXactParticipants, lc);
+			break;
+		}
+	}
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+
+	return fdw_part;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
+{
+	FdwXactRslvState state;
+
+	Assert(ServerSupportTransactionCallback(fdw_part));
+
+	state.server = fdw_part->server;
+	state.usermapping = fdw_part->usermapping;
+	state.flags = FDWXACT_FLAG_ONEPHASE;
+
+	if (commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Clear the FdwXactParticipants list.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	list_free_deep(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Commit or rollback all foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool is_commit)
+{
+	ListCell   *lc;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/* Commit or rollback foreign transactions in the participant list */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(ServerSupportTransactionCallback(fdw_part));
+		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Check if the local transaction has any foreign transaction.
+ */
+void
+PrePrepare_FdwXact(void)
+{
+	/* We don't support preparing foreign transactions */
+	if (FdwXactParticipants != NIL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..0a8d1da4bd 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -2229,6 +2230,9 @@ CommitTransaction(void)
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT
 					  : XACT_EVENT_COMMIT);
 
+	/* Commit foreign transaction if any */
+	AtEOXact_FdwXact(true);
+
 	ResourceOwnerRelease(TopTransactionResourceOwner,
 						 RESOURCE_RELEASE_BEFORE_LOCKS,
 						 true, true);
@@ -2368,6 +2372,9 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Prepare foreign transactions */
+	PrePrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2755,6 +2762,9 @@ AbortTransaction(void)
 		else
 			CallXactCallbacks(XACT_EVENT_ABORT);
 
+		/* Rollback foreign transactions if any */
+		AtEOXact_FdwXact(false);
+
 		ResourceOwnerRelease(TopTransactionResourceOwner,
 							 RESOURCE_RELEASE_BEFORE_LOCKS,
 							 false, true);
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 61e48ca3f8..6532a836e5 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -328,6 +328,10 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* The FDW must support either both APIs or neither */
+	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
+		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
+
 	return routine;
 }
 
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..6c8b111ab5
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,33 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "foreign/foreign.h"
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	/* Foreign transaction information */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* Function declarations */
+extern void AtEOXact_FdwXact(bool is_commit);
+extern void PrePrepare_FdwXact(void);
+
+#endif /* FDWXACT_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..4db7ade9a3 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "access/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
 
@@ -170,6 +171,9 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
  * function.  It provides pointers to the callback functions needed by the
@@ -246,6 +250,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for path reparameterization. */
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
+
+	/* Support functions for transaction management */
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
 } FdwRoutine;
 
 
@@ -259,4 +267,8 @@ extern bool IsImportableForeignTable(const char *tablename,
 									 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in fdwxact/fdwxact.c */
+extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
+
 #endif							/* FDWAPI_H */
-- 
2.27.0

#204Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#203)
11 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Sun, Nov 8, 2020 at 2:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 5, 2020 at 12:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Oct 22, 2020 at 10:39 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Wed, 21 Oct 2020 at 18:33, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>

So what's your opinion?

My opinion is simple and has not changed. Let's clarify and refine the design first in the following areas (others may have pointed out something else too, but I don't remember), before going deeper into the code review.

* FDW interface
New functions that other FDWs can really implement. Currently, XA seems to be the only model we can rely on to validate the FDW interface.
What FDW function would call what XA function(s)? What should be the arguments for the FDW functions?

Since the FDW interfaces may be affected by the feature's
architecture, I guess we can discuss them later.

* Performance
Parallel prepare and commits on the client backend. The current implementation is intolerable and is not of first-release quality. I proposed the idea.
(If you insist you don't want to do anything about this, I have to think you're just rushing for the patch commit. I want to keep Postgres's reputation.)

What is in your mind regarding the implementation of parallel prepare
and commit? Given that some FDW plugins don't support asynchronous
execution I guess we need to use parallel workers or something. That
is, the backend process launches parallel workers to
prepare/commit/rollback foreign transactions in parallel. I don't deny
this approach, but it'll definitely make the feature complex and
require more code.

My point is to start small and keep the first version simple. Even
if we need one or more years for this feature, I think that
introducing simple, minimal functionality into the core as the first
version still has benefits. We will have the opportunity to get real
feedback from users and to fix bugs in the main infrastructure before
making it complex. In this sense, the patch having the backend return
without waiting for resolution after the local commit would be a good
start as the first version (i.e., up to applying the v26-0006 patch).
Anyway, the architecture should be extensible enough for future
improvements.

For the performance improvements, we will be able to support
asynchronous and/or parallel prepare/commit/rollback. Moreover, having
multiple resolver processes on one database would also help get better
throughput. A user who needs much better throughput can also choose
not to wait for resolution after the local commit, like
synchronous_commit = ‘local’ in replication.

As part of this, I'd like to see the 2PC's message flow and disk writes (via email and/or on the following wiki.) That helps evaluate the 2PC performance, because it's hard to figure it out in the code of a large patch set. I'm simply imagining what is typically written in database textbooks and research papers. I'm asking this because I saw some discussion in this thread that some new WAL records are added. I was worried that transactions have to write WAL records other than prepare and commit unlike textbook implementations.

Atomic Commit of Distributed Transactions
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

Understood. I'll add an explanation about the message flow and disk
writes to the wiki page.

Done.

We need to consider the point of error handling during resolving
foreign transactions too.

I don’t think we need to stipulate the query cancellation. Anyway, I
guess that neither the fact that we don’t stipulate anything about
query cancellation now nor the fact that postgres_fdw might not be
cancellable in some situations now is a reason for not supporting
query cancellation. If it's a desirable behavior and users want it, we
need to make an effort to support it as much as possible, like we’ve
done in postgres_fdw. Some FDWs unfortunately might not be able to
support it with their own functionality alone, but it would be good if
we can achieve that by a combination of PostgreSQL and FDW plugins.

Let me comment on this a bit; this is a somewhat dangerous idea, I'm afraid. We need to pay attention to the FDW interface and its documentation so that FDW developers can implement what we consider important -- query cancellation in your discussion. "postgres_fdw is OK, so the interface is good" can create interfaces that other FDW developers can't use. That's what Tomas Vondra pointed out several years ago.

I suspect the story is somewhat different. libpq fortunately supports
asynchronous execution, but when it comes to canceling the foreign
transaction resolution I think basically all FDW plugins are in the
same situation at this time. We can choose whether to make it
cancellable or not. According to the discussion so far, it completely
depends on the architecture of this feature. So my point is whether
it's worth having this functionality for users and whether users want
it, not whether postgres_fdw is ok.

I've thought again about the idea that once the backend fails to
resolve a foreign transaction it leaves it to a resolver process. With
this idea, the backend process performs the 2nd phase of 2PC only
once. If an error happens during resolution, it leaves the transaction
to a resolver process and returns an error to the client. We used this
idea in the previous patches and it has been discussed several times.

First of all, this idea doesn’t resolve the problem of error handling:
the transaction could return an error to the client even though the
local transaction has been committed. There is an argument that
this behavior could also happen even in a single server environment
but I guess the situation is slightly different. Basically what the
transaction does after the commit is cleanup. An error could happen
during cleanup, but if it happens it’s likely due to a bug or
something wrong inside PostgreSQL or the OS. On the other hand, during
and after resolution the transaction does major work such as
connecting to a foreign server, sending SQL, getting the result, and
writing WAL to remove the entry. These are more likely to cause an
error.

Also, with this idea, the client needs to check whether the error
received from the server is genuine, because the local transaction
might have been committed. Although this could happen even in a single
server environment, how many users check that in practice? If a server
crashes, subsequent transactions end up failing due to a network
connection error, but it seems hard to distinguish between such a real
error and the fake error.

Moreover, it’s questionable in terms of extensibility. We would not
be able to support continuing to wait for distributed transactions to
complete even if an error happens, like synchronous replication. The
user might want to wait in cases where the failure is temporary, such
as a temporary network disconnection. Trying resolution only once
seems to have the cons of both asynchronous and synchronous
resolution.

So I’m thinking that with this idea users will need to change their
applications to check whether the error they got is genuine, which is
cumbersome for users. Also, it seems to me we need to discuss carefully
whether this idea could weaken extensibility.

Anyway, according to the discussion, it seems to me that we have
reached a consensus so far that the backend process prepares all
foreign transactions and that a resolver process is necessary to
resolve in-doubt transactions in the background. So I’ve changed the
patch set as follows. With all of these patches applied, we can
support asynchronous foreign transaction resolution. That is, at
transaction commit the backend process prepares all foreign
transactions and then commits the local transaction. After that, it
returns a commit OK to the client while leaving the prepared foreign
transactions to a resolver process. A resolver process fetches the
foreign transactions to resolve and resolves them in the background.
Since the 2nd phase of 2PC is performed asynchronously, a transaction
that wants to see the result of a previous transaction needs to check
its status.
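
The asynchronous flow described above can be pictured with a small
standalone model. This is only a sketch under stated assumptions, not
code from the patch set: FxEntry, backend_commit(), resolver_run() and
pending_count() are hypothetical names, and an in-memory array stands
in for whatever shared state the real implementation uses. It shows
that the client can be told OK while prepared foreign transactions are
still waiting for the resolver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 8			/* hypothetical capacity of the model */

typedef enum { FX_PREPARED, FX_RESOLVED } FxStatus;

/* Hypothetical record of one prepared foreign transaction */
typedef struct FxEntry
{
	unsigned	serverid;
	FxStatus	status;
} FxEntry;

static FxEntry entries[MAX_ENTRIES];
static size_t nentries = 0;

/* Number of foreign transactions prepared but not yet resolved */
static size_t
pending_count(void)
{
	size_t		n = 0;

	for (size_t i = 0; i < nentries; i++)
		if (entries[i].status == FX_PREPARED)
			n++;
	return n;
}

/*
 * Backend side: prepare every foreign transaction (phase one).  In the
 * described flow the local commit and the OK to the client follow
 * immediately; phase two is left to the resolver.  Returns false, and
 * prepares nothing, if the model has no room left.
 */
static bool
backend_commit(const unsigned *serverids, size_t n)
{
	if (n > MAX_ENTRIES - nentries)
		return false;

	for (size_t i = 0; i < n; i++)
	{
		entries[nentries].serverid = serverids[i];
		entries[nentries].status = FX_PREPARED;
		nentries++;
	}
	return true;
}

/* Resolver side: resolve up to max_batch prepared entries (phase two) */
static size_t
resolver_run(size_t max_batch)
{
	size_t		done = 0;

	assert(nentries <= MAX_ENTRIES);
	for (size_t i = 0; i < nentries && done < max_batch; i++)
	{
		if (entries[i].status == FX_PREPARED)
		{
			entries[i].status = FX_RESOLVED;
			done++;
		}
	}
	return done;
}
```

In this model, a later transaction that depends on the earlier result
would call pending_count() first, which mirrors the need to check the
status of the previous transaction.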

Here is a brief explanation of each patch:

v27-0001-Introduce-transaction-manager-for-foreign-transa.patch

This commit adds the basic foreign transaction manager and the
CommitForeignTransaction and RollbackForeignTransaction APIs. These
APIs support only one-phase commit and rollback. With this change, an
FDW is able to control its transaction using the foreign transaction
manager, not using XactCallback.
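
As a rough illustration of the registration model, here is a
standalone sketch. It assumes nothing beyond what the 0001 patch
shows: participants are identified by server and user, a participant
must supply both callbacks, and every callback is invoked with a
one-phase flag at end of transaction. All names (ModelRoutine,
model_register(), model_at_eoxact(), and so on) are hypothetical, and
the real code raises an ERROR where this model returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FLAG_ONEPHASE		0x01	/* stands in for FDWXACT_FLAG_ONEPHASE */
#define MODEL_MAX_PARTICIPANTS	8		/* hypothetical fixed capacity */

typedef struct ModelState
{
	unsigned	serverid;
	unsigned	userid;
	int			flags;
} ModelState;

typedef void (*model_end_fn) (ModelState *state);

/* Hypothetical counterpart of the two new FdwRoutine members */
typedef struct ModelRoutine
{
	model_end_fn commit;
	model_end_fn rollback;
} ModelRoutine;

typedef struct ModelParticipant
{
	unsigned	serverid;
	unsigned	userid;
	const ModelRoutine *routine;
} ModelParticipant;

static ModelParticipant participants[MODEL_MAX_PARTICIPANTS];
static size_t nparticipants = 0;
static int	commits_seen = 0;
static int	rollbacks_seen = 0;

/* Example callbacks that only count how often they were invoked */
static void
count_commit(ModelState *state)
{
	assert(state->flags & MODEL_FLAG_ONEPHASE);
	commits_seen++;
}

static void
count_rollback(ModelState *state)
{
	assert(state->flags & MODEL_FLAG_ONEPHASE);
	rollbacks_seen++;
}

/* Register (serverid, userid) once; reject routines lacking a callback */
static bool
model_register(unsigned serverid, unsigned userid, const ModelRoutine *routine)
{
	for (size_t i = 0; i < nparticipants; i++)
		if (participants[i].serverid == serverid &&
			participants[i].userid == userid)
			return true;		/* already registered */

	if (routine == NULL || routine->commit == NULL || routine->rollback == NULL)
		return false;
	if (nparticipants == MODEL_MAX_PARTICIPANTS)
		return false;

	participants[nparticipants].serverid = serverid;
	participants[nparticipants].userid = userid;
	participants[nparticipants].routine = routine;
	nparticipants++;
	return true;
}

/* End every registered foreign transaction in one phase, then forget them */
static size_t
model_at_eoxact(bool is_commit)
{
	size_t		ended = nparticipants;

	for (size_t i = 0; i < nparticipants; i++)
	{
		ModelState	state = {participants[i].serverid,
							 participants[i].userid,
							 MODEL_FLAG_ONEPHASE};

		if (is_commit)
			participants[i].routine->commit(&state);
		else
			participants[i].routine->rollback(&state);
	}
	nparticipants = 0;
	return ended;
}
```

Registering the same server and user twice leaves a single entry, so
each foreign transaction is ended exactly once.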

v27-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch

This commit implements both CommitForeignTransaction and
RollbackForeignTransaction APIs in postgres_fdw. Note that since
PREPARE TRANSACTION is still not supported, this commit adds no new
user-visible capability.

v27-0003-Recreate-RemoveForeignServerById.patch

This commit recreates RemoveForeignServerById that was removed by
b1d32d3e3. This is necessary because we need to check if there is a
foreign transaction involved with the foreign server that is about to
be removed.

v27-0004-Add-PrepareForeignTransaction-API.patch

This commit adds prepared foreign transaction support, including WAL
logging and recovery, and the PrepareForeignTransaction API. With this
change, the user is able to run 'PREPARE TRANSACTION' and
'COMMIT/ROLLBACK PREPARED' commands on a transaction that involves
foreign servers. But note that COMMIT/ROLLBACK PREPARED ends only the
local transaction. It doesn't do anything for foreign transactions.
Therefore, the user needs to resolve foreign transactions manually by
executing the pg_resolve_foreign_xacts() SQL function, which is also
introduced by this commit.

v27-0005-postgres_fdw-supports-prepare-API.patch

This commit implements the PrepareForeignTransaction API and makes
CommitForeignTransaction and RollbackForeignTransaction support
two-phase commit.

v27-0006-Add-GetPrepareId-API.patch

This commit adds the GetPrepareId API.

v27-0007-Introduce-foreign-transaction-launcher-and-resol.patch

This commit introduces the foreign transaction resolver and launcher
processes. With this change, the user doesn’t need to manually execute
the pg_resolve_foreign_xacts() function to resolve foreign
transactions prepared by PREPARE TRANSACTION and left by
COMMIT/ROLLBACK PREPARED. Instead, a resolver process automatically
resolves them in the background.

v27-0008-Prepare-foreign-transactions-at-commit-time.patch

With this commit, the transaction prepares foreign transactions marked
as modified at transaction commit if foreign_twophase_commit is
‘required’. Previously the user needed to run PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED to use 2PC, but this commit makes 2PC
transparent to the user. Note that the transaction returns a commit OK
to the client after committing the local transaction and notifying the
resolver process, without waiting. Foreign transactions are
asynchronously resolved by the resolver process.
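
The commit-time decision can be sketched as follows. This is inferred
from the expected output of the test_fdwxact regression tests attached
to this thread, not taken from the implementation, so please treat the
rule as an assumption: servers are counted only if they were modified
and their FDW provides the commit/rollback APIs, a local write counts
as one more writer, and two-phase commit is needed only with two or
more writers. ServerInfo, CommitPlan and plan_commit() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
	COMMIT_ONE_PHASE,			/* plain commit is enough */
	COMMIT_TWO_PHASE,			/* prepare everywhere, then commit */
	COMMIT_ERROR				/* 2PC needed but some writer cannot prepare */
} CommitPlan;

/* Hypothetical summary of one foreign server touched by the transaction */
typedef struct ServerInfo
{
	bool		modified;			/* did the transaction write to it? */
	bool		has_commit_rollback;	/* FDW provides commit/rollback APIs */
	bool		has_prepare;		/* FDW provides the prepare API */
} ServerInfo;

static CommitPlan
plan_commit(bool twophase_required, bool local_modified,
			const ServerInfo *servers, size_t nservers)
{
	size_t		writers = local_modified ? 1 : 0;
	bool		all_can_prepare = true;

	assert(servers != NULL || nservers == 0);

	for (size_t i = 0; i < nservers; i++)
	{
		const ServerInfo *s = &servers[i];

		/* Read-only servers and unmanaged FDWs are not counted */
		if (!s->modified || !s->has_commit_rollback)
			continue;

		writers++;
		if (!s->has_prepare)
			all_can_prepare = false;
	}

	/* With the feature disabled, or a single writer, one phase suffices */
	if (!twophase_required || writers < 2)
		return COMMIT_ONE_PHASE;

	return all_can_prepare ? COMMIT_TWO_PHASE : COMMIT_ERROR;
}
```

For example, a transaction that writes to a 2PC-capable server and to
a server supporting only commit and rollback yields COMMIT_ERROR when
two-phase commit is required, and COMMIT_ONE_PHASE when it is
disabled.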

v27-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch

With this commit, the transactions started via postgres_fdw are marked
as modified, which is necessary to use 2PC.

v27-0010-Documentation-update.patch
v27-0011-Add-regression-tests-for-foreign-twophase-commit.patch

Documentation update and regression tests.

The missing piece compared with the previous version of the patch is
synchronous transaction resolution. In the previous patch, foreign
transactions were synchronously resolved by a resolver process. But
since it's under discussion whether this is a good approach, and I'm
considering optimizing the logic, it’s not included in the current
patch set.

Cfbot reported an error. I've attached the updated version patch set
to make cfbot happy.

Since the previous version conflicts with the current HEAD I've
attached the rebased version patch set.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

Attachments:

v29-0011-Add-regression-tests-for-foreign-twophase-commit.patch (application/octet-stream)
From 4da8291f1b3f02ba6f83d4dfecce12fbf1759613 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v29 11/11] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 +
 .../test_fdwxact/expected/test_fdwxact.out    | 200 +++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 185 +++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 110 ++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 524 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 ++++++
 src/test/regress/pg_regress.c                 |  13 +-
 src/tools/msvc/Mkvcbuild.pm                   |   3 +-
 14 files changed, 1294 insertions(+), 6 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index a6d2ffbf9e..106f3b2ff2 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS =1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..ca8a90f3e5
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,200 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..40b774e5d0
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,185 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..52e4971aed
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,110 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql, $wait_until) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+	$wait_until = 0 unless defined $wait_until;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	$node->poll_query_until('postgres',
+							"SELECT count(*) FROM pg_foreign_xacts",
+							$wait_until);
+
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the failure case of PREPARE TRANSACTION. We prepare two distributed
+# transactions with the same identifier.  The second attempt fails when preparing
+# the local transaction, which happens after the foreign transaction has been
+# prepared on srv_2pc_1. Therefore the prepared foreign transaction must be
+# rolled back.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into the prepare phase on srv_2pc_1. The transaction fails while
+# preparing the foreign transaction on srv_2pc_1. We then try both 'rollback' and
+# 'rollback prepared' on that foreign transaction, and roll back the other foreign
+# transaction on srv_2pc_2.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses the PREPARE API as well as COMMIT and ROLLBACK
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..8e2a57b052
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,524 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only
+ * commit and rollback API and the third supports all transaction API including
+ * prepare.
+ *
+ * This module can also inject an error in the prepare or commit callback via
+ * the test_inject_error() SQL function. The injected-error settings are kept
+ * in shared memory so that both backend processes and resolver processes can
+ * see them.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/reloptions.h"
+#include "access/xact.h"
+#include "commands/defrem.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static void testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo,
+												   List *fdw_private,
+												   int subplan_index,
+												   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactRslvState *state);
+static void testCommitForeignTransaction(FdwXactRslvState *state);
+static void testRollbackForeignTransaction(FdwXactRslvState *state);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+/* Register the foreign transaction */
+static void
+testRegisterFdwXact(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					bool modified)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	RangeTblEntry	*rte;
+	ForeignTable *table;
+	Oid		userid;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
+						mtstate->ps.state);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+	table = GetForeignTable(RelationGetRelid(rel));
+	FdwXactRegisterXact(table->serverid, userid, modified);
+}
+
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static void
+testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo,
+									   List *fdw_private,
+									   int subplan_index,
+									   int eflags)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo,
+						(eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0);
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo, true);
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 state->fdwxact_id,
+							 state->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (check_event(state->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+/*
+ * Check whether an error is injected for the given phase on the given server.
+ * If so, set *elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Only error and panic are supported; anything else is treated as ERROR */
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+	else
+		*elevel = ERROR;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 23d7d0beb2..d49a292cca 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2352,9 +2352,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2369,7 +2372,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 90594bd41b..e46d3344e7 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -50,7 +50,8 @@ my @contrib_excludes = (
 	'pgcrypto',         'sepgsql',
 	'brin',             'test_extensions',
 	'test_misc',        'test_pg_dump',
-	'snapshot_too_old', 'unsafe_tests');
+	'snapshot_too_old', 'unsafe_tests',
+	'test_fdwxact');
 
 # Set of variables for frontend modules
 my $frontend_defines = { 'initdb' => 'FRONTEND' };
-- 
2.27.0

v29-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch (application/octet-stream)
From b43ea6d849fc3c24c27eff6c4aaa544f0d7f19c6 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 2 Nov 2020 14:32:10 +0900
Subject: [PATCH v29 09/11] postgres_fdw marks foreign transaction as modified
 on modification.

This commit enables postgres_fdw to execute two-phase commit protocol
on transaction commit (without explicitly executing PREPARE TRANSACTION).

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c   | 19 ++++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c |  2 ++
 contrib/postgres_fdw/postgres_fdw.h |  1 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 7812531f46..4a4b1876e5 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -58,6 +58,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		modified;		/* true if data on the foreign server is modified */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -289,6 +290,7 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 	entry->have_error = false;
 	entry->changing_xact_state = false;
 	entry->invalidated = false;
+	entry->modified = false;
 	entry->server_hashvalue =
 		GetSysCacheHashValue1(FOREIGNSERVEROID,
 							  ObjectIdGetDatum(server->serverid));
@@ -303,6 +305,20 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 		 entry->conn, server->servername, user->umid, user->userid);
 }
 
+void
+MarkConnectionModified(UserMapping *user)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
+	if (entry && !entry->modified)
+	{
+		FdwXactRegisterXact(user->serverid, user->userid, true);
+		entry->modified = true;
+	}
+}
+
 /*
  * Connect to remote server using specified server and user mapping properties.
  */
@@ -574,7 +590,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 			 entry->conn);
 
 		/* Register the foreign server to the transaction */
-		FdwXactRegisterXact(user->serverid, user->userid);
+		FdwXactRegisterXact(user->serverid, user->userid, false);
 
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
@@ -583,6 +599,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 		entry->changing_xact_state = true;
 		do_sql_command(entry->conn, sql);
 		entry->xact_depth = 1;
+		entry->modified = false;
 		entry->changing_xact_state = false;
 	}
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 22e1a70e76..35642b1305 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2380,6 +2380,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * establish new connection if necessary.
 	 */
 	dmstate->conn = GetConnection(user, false);
+	MarkConnectionModified(user);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -3565,6 +3566,7 @@ create_foreign_modify(EState *estate,
 
 	/* Open connection; report that we'll create a prepared statement. */
 	fmstate->conn = GetConnection(user, true);
+	MarkConnectionModified(user);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 659222b97a..12cd55258f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -132,6 +132,7 @@ extern void reset_transmission_modes(int nestlevel);
 /* in connection.c */
 extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
 extern void ReleaseConnection(PGconn *conn);
+extern void MarkConnectionModified(UserMapping *user);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
-- 
2.27.0
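
For readers following along: the patch above registers a server with FdwXactRegisterXact(..., false) when the remote transaction starts and again with true from MarkConnectionModified(), and patch 08/11 in this series merges the flag with "modified |= modified" when the participant is already registered. Below is a minimal standalone sketch of that merge rule. It is not part of the patch set; the example_* names, the fixed-size array and the Oid typedef are assumptions made purely for illustration (the server keeps participants in a List allocated in TopTransactionContext).

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the PostgreSQL Oid type (assumption for this sketch). */
typedef unsigned int Oid;

/* Hypothetical fixed capacity; the patch uses a List instead. */
#define EXAMPLE_MAX_PARTICIPANTS 8

typedef struct ExampleEntry
{
    Oid  serverid;
    Oid  userid;
    bool modified;   /* true once any write has been reported */
} ExampleEntry;

static ExampleEntry example_entries[EXAMPLE_MAX_PARTICIPANTS];
static int example_nentries = 0;

/*
 * Register (serverid, userid) as a participant.  If it is already
 * registered, only upgrade its modified flag; a later call with
 * modified = false never clears it.  Returns the entry index, or -1
 * if the toy table is full.
 */
int
example_register(Oid serverid, Oid userid, bool modified)
{
    int i;

    assert(example_nentries <= EXAMPLE_MAX_PARTICIPANTS);

    for (i = 0; i < example_nentries; i++)
    {
        if (example_entries[i].serverid == serverid &&
            example_entries[i].userid == userid)
        {
            /* Already registered: merge the flag, as the patch does */
            example_entries[i].modified |= modified;
            return i;
        }
    }

    if (example_nentries == EXAMPLE_MAX_PARTICIPANTS)
        return -1;

    example_entries[example_nentries].serverid = serverid;
    example_entries[example_nentries].userid = userid;
    example_entries[example_nentries].modified = modified;
    return example_nentries++;
}

/* Report whether the participant at idx has been marked as modified. */
bool
example_is_modified(int idx)
{
    assert(idx >= 0 && idx < example_nentries);
    return example_entries[idx].modified;
}
```

With this rule, a read-only scan followed by an INSERT on the same server leaves a single participant whose flag is true, which is what the commit-time check in patch 08/11 relies on.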

v29-0006-Add-GetPrepareId-API.patch (application/octet-stream)
From a0e3e92dba8fc7feed3a5e54071d7f8dfcac21d2 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 4 Nov 2020 14:41:53 +0900
Subject: [PATCH v29 06/11] Add GetPrepareId API

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/fdwxact.c | 54 +++++++++++++++++++++++-----
 src/include/foreign/fdwapi.h         |  3 ++
 2 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 3caf904370..7b3a2f1fba 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -143,6 +143,7 @@ typedef struct FdwXactParticipant
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
 	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
 } FdwXactParticipant;
 
 /*
@@ -347,6 +348,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
 
 	return fdw_part;
 }
@@ -414,9 +416,10 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 }
 
 /*
- * Return a null-terminated foreign transaction identifier.  We generate an
- * unique identifier with in the form of
- * "fx_<random number>_<xid>_<serverid>_<userid> whose length is
+ * Return a null-terminated foreign transaction identifier.  If the given
+ * foreign server's FDW provides the GetPrepareId callback, we return the
+ * identifier it returns.  Otherwise we generate a unique identifier of the
+ * form "fx_<random number>_<xid>_<serverid>_<userid>" whose length is
  * less than FDWXACT_ID_MAX_LEN.
  *
  * Returned string value is used to identify foreign transaction. The
@@ -431,13 +434,48 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 static char *
 get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
 {
-	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+	char *id;
+	int	id_len;
 
-	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
-			 Abs(random()), xid, fdw_part->server->serverid,
-			 fdw_part->usermapping->userid);
+	/*
+	 * If the FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
 
-	return pstrdup(buf);
+	id[id_len] = '\0';
+	return pstrdup(id);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 89cec9aa96..91db4f5bfc 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -174,6 +174,8 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -256,6 +258,7 @@ typedef struct FdwRoutine
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
 	PrepareForeignTransaction_function PrepareForeignTransaction;
+	GetPrepareId_function GetPrepareId;
 } FdwRoutine;
 
 
-- 
2.27.0
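
As a rough illustration of what an FDW-side GetPrepareId callback might look like, here is a minimal standalone sketch with the same parameter list as GetPrepareId_function. It is not taken from the patch set: the "myfdw_" prefix, the example_get_prepare_id name and the EXAMPLE_ID_MAX_LEN value are hypothetical, the two typedefs merely stand in for the server's types, and a real callback would allocate with palloc rather than malloc.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for PostgreSQL types (assumptions for this sketch). */
typedef unsigned int TransactionId;
typedef unsigned int Oid;

/* Hypothetical limit; the real FDWXACT_ID_MAX_LEN is defined by the patch set. */
#define EXAMPLE_ID_MAX_LEN 200

/*
 * Build "myfdw_<xid>_<serverid>_<userid>" and report its length through
 * prep_id_len.  Returns NULL if allocation fails or the identifier would
 * not fit, including its terminating NUL, in EXAMPLE_ID_MAX_LEN bytes.
 */
char *
example_get_prepare_id(TransactionId xid, Oid serverid, Oid userid,
                       int *prep_id_len)
{
    char *id;
    int   n;

    assert(prep_id_len != NULL);

    id = malloc(EXAMPLE_ID_MAX_LEN);
    if (id == NULL)
        return NULL;

    n = snprintf(id, EXAMPLE_ID_MAX_LEN, "myfdw_%u_%u_%u",
                 xid, serverid, userid);
    if (n < 0 || n >= EXAMPLE_ID_MAX_LEN)
    {
        /* Truncated or failed: refuse rather than return a partial id */
        free(id);
        return NULL;
    }

    *prep_id_len = n;
    return id;
}
```

Keeping the identifier strictly shorter than the limit matches the errdetail wording in get_fdwxact_identifier() ("must be less than %d characters").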

v29-0008-Prepare-foreign-transactions-at-commit-time.patch (application/octet-stream)
From 8abe8171e30f3bf7fbd21bc2cffc271d1580c649 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 25 Nov 2020 21:02:29 +0900
Subject: [PATCH v29 08/11] Prepare foreign transactions at commit time

With this commit, a foreign server modified within the transaction is
marked as 'modified'. On the 'modified' servers, foreign transactions
are prepared automatically if foreign_twophase_commit is
'required'. Previously, users needed to run PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED to use the two-phase commit protocol. This commit
enables users to use the two-phase commit protocol transparently. Prepared
foreign transactions are resolved asynchronously by the foreign
transaction resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/fdwxact.c          | 191 +++++++++++++++++-
 src/backend/access/transam/xact.c             |   7 +
 src/backend/utils/misc/guc.c                  |  28 +++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/access/fdwxact.h                  |  10 +
 src/include/foreign/fdwapi.h                  |   2 +-
 6 files changed, 229 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index b4cab71c3d..79bd7596a3 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -19,13 +19,27 @@
  *
  * FOREIGN TRANSACTION RESOLUTION
  *
+ * A transaction involving multiple foreign transactions uses the two-phase commit
+ * protocol to commit the distributed transaction if enabled.  The basic strategy
+ * is that we prepare all of the remote transactions before committing locally and
+ * commit them after committing locally.
+ *
+ * At pre-commit of the local transaction, we prepare the transactions on all foreign
+ * servers after logging the information of each foreign transaction.  The result of
+ * the distributed transaction is determined by the result of the corresponding local
+ * transaction.  Once the local transaction is successfully committed, all
+ * transactions on foreign servers must be committed.  If an error occurs
+ * before the local transaction commits, all transactions must be aborted.  After
+ * committing or rolling back locally, we leave foreign transactions as in-doubt
+ * transactions and then notify the resolver process.  The resolver process asynchronously
+ * resolves these foreign transactions according to the result of the corresponding local
+ * transaction.  Also, the user can use the pg_resolve_foreign_xact() SQL function to
+ * resolve a foreign transaction manually.
+ *
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  The preapred
- * foreign transactions are resolved by a resolver process asynchronously.  Also, the
- * user can use pg_resolve_foreign_xact() SQL function to resolve a foreign transaction
- * manually.
+ * local transaction but do nothing for the involved foreign transactions.
  *
  * LOCKING
  *
@@ -92,8 +106,10 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
+#include "storage/pmsignal.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
+#include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -105,6 +121,10 @@
 #define ServerSupportTwophaseCommit(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
 
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
 /* Directory where the foreign prepared transaction files will reside */
 #define FDWXACTS_DIR "pg_fdwxact"
 
@@ -142,6 +162,9 @@ typedef struct FdwXactParticipant
 	/* Transaction identifier used for PREPARE */
 	char	   *fdwxact_id;
 
+	/* true if data on the server was modified */
+	bool		modified;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
@@ -152,18 +175,24 @@ typedef struct FdwXactParticipant
 /*
  * List of foreign transactions involved in the transaction.  A member of
  * participants must support both commit and rollback APIs.
+ *
+ * ForeignTwophaseCommitIsRequired is true if the current transaction needs to
+ * be committed using two-phase commit protocol.
  */
 static List *FdwXactParticipants = NIL;
+static bool ForeignTwophaseCommitIsRequired = false;
 
 /* Keep track of registering process exit call back. */
 static bool fdwXactExitRegistered = false;
 
+
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
 int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
-static void FdwXactPrepareForeignTransactions(TransactionId xid);
+static void FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
@@ -182,6 +211,7 @@ static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
 static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
 static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  bool giveWarning);
+static bool checkForeignTwophaseCommitRequired(bool local_modified);
 static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  Oid umid, char *fdwxact_id);
 static void remove_fdwxact(FdwXact fdwxact);
@@ -258,7 +288,7 @@ FdwXactShmemInit(void)
  * as a participant of the transaction.
  */
 void
-FdwXactRegisterXact(Oid serverid, Oid userid)
+FdwXactRegisterXact(Oid serverid, Oid userid, bool modified)
 {
 	FdwXactParticipant *fdw_part;
 	MemoryContext old_ctx;
@@ -273,6 +303,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 			fdw_part->usermapping->userid == userid)
 		{
 			/* Already registered */
+			fdw_part->modified |= modified;
 			return;
 		}
 	}
@@ -302,6 +333,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
 
 	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
 
 	/* Add to the participants list */
 	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
@@ -348,6 +380,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
 	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
@@ -356,11 +389,139 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	return fdw_part;
 }
 
+/*
+ * Prepare all foreign transactions if foreign two-phase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit: when it is
+ * 'required', we strictly require the FDW of every modified foreign server
+ * to support the two-phase commit protocol and ask them to prepare their
+ * foreign transactions; when it is 'disabled', we use one-phase commit and
+ * these foreign transactions are committed at the end of the transaction.
+ * If we fail to prepare any of them, we change to aborting.
+ */
+void
+PreCommit_FdwXact(void)
+{
+	TransactionId xid;
+	bool		local_modified;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/*
+	 * Check if the current transaction did writes.  We need to include the
+	 * local node among the distributed transaction participants and regard it
+	 * as modified if the current transaction has performed WAL logging and
+	 * has been assigned an xid.  The transaction can end up not writing any WAL,
+	 * even if it has an xid, if it only wrote to temporary and/or unlogged
+	 * tables.  It can end up having written WAL without an xid if it did HOT
+	 * pruning.
+	 */
+	xid = GetTopTransactionIdIfAny();
+	local_modified = (TransactionIdIsValid(xid) && (XactLastRecEnd != 0));
+
+	/*
+	 * Check if we need to use foreign twophase commit. Note that we don't
+	 * support foreign twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired(local_modified))
+	{
+		/*
+		 * Two-phase commit is required.  Assign a transaction id to the
+		 * current transaction if not yet because the local transaction is
+		 * necessary to determine the result of the distributed transaction.
+		 * Then we prepare foreign transactions on foreign servers that support
+		 * two-phase commit.  Note that we keep FdwXactParticipants until the
+		 * end of the transaction.
+		 */
+		if (!TransactionIdIsValid(xid))
+			xid = GetTopTransactionId();
+		FdwXactPrepareForeignTransactions(xid, false);
+		ForeignTwophaseCommitIsRequired = true;
+	}
+}
+
+/* Return true if the current transaction needs to use two-phase commit */
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
 /*
- * Insert FdwXact entries and prepare foreign transactions.
+ * Return true if the current transaction modifies data on two or more servers,
+ * counting those in FdwXactParticipants and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+{
+	ListCell   *lc;
+	bool		have_notwophase = false;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			have_notwophase = true;
+
+		nserverswritten++;
+	}
+
+	/* Did we modify the local non-temporary data? */
+	if (local_modified)
+		nserverswritten++;
+
+	/*
+	 * Two-phase commit is not required if the number of servers that performed
+	 * writes is less than 2.
+	 */
+	if (nserverswritten < 2)
+		return false;
+
+	Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED);
+
+	/* Two-phase commit is required. Check parameters */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	if (have_notwophase)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+				 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+	return true;
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions.  If prepare_all is
+ * true, we prepare all foreign transactions regardless of whether writes
+ * happened on the server.
+ *
+ * We can still change to rollback here on failure. If any error occurs, we
+ * roll back non-prepared foreign transactions.
  */
 static void
-FdwXactPrepareForeignTransactions(TransactionId xid)
+FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all)
 {
 	ListCell   *lc;
 
@@ -378,6 +539,9 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 
 		CHECK_FOR_INTERRUPTS();
 
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
 		/* Get prepared transaction identifier */
 		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
 		Assert(fdw_part->fdwxact_id);
@@ -755,7 +919,10 @@ ForgetAllFdwXactParticipants(void)
 	int			nlefts = 0;
 
 	if (FdwXactParticipants == NIL)
+	{
+		Assert(!ForeignTwophaseCommitIsRequired);
 		return;
+	}
 
 	foreach(cell, FdwXactParticipants)
 	{
@@ -812,7 +979,10 @@ AtEOXact_FdwXact(bool is_commit)
 
 		if (!fdwxact)
 		{
-			/* Commit or rollback the foreign transaction in one-phase */
+			/*
+			 * If this participant doesn't have an FdwXact entry, it's not
+			 * prepared yet. Therefore we can commit or rollback it in one-phase.
+			 */
 			Assert(ServerSupportTransactionCallback(fdw_part));
 			FdwXactParticipantEndTransaction(fdw_part, is_commit);
 			continue;
@@ -842,6 +1012,7 @@ AtEOXact_FdwXact(bool is_commit)
 	}
 
 	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
 }
 
 /*
@@ -881,7 +1052,7 @@ PrePrepare_FdwXact(void)
 	 * prepare all foreign transactions.
 	 */
 	xid = GetTopTransactionId();
-	FdwXactPrepareForeignTransactions(xid);
+	FdwXactPrepareForeignTransactions(xid, true);
 
 	/*
 	 * We keep FdwXactParticipants until the transaction end so that we change
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 0e1bf63b52..0f223c4694 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -22,6 +22,7 @@
 
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1456,6 +1457,9 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	if (FdwXactIsForeignTwophaseCommitRequired())
+		FdwXactLaunchOrWakeupResolver();
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2123,6 +2127,9 @@ CommitTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXact();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 942f6b6a43..771733862a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -499,6 +499,24 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -4657,6 +4675,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2ed09cb347..5a73443be1 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -744,6 +744,8 @@
 							# retrying to resolve
 							# foreign transactions
 							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
 
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index a3763e52c0..6bf4f5dd7d 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -20,6 +20,14 @@
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
 /* Enum to track the status of foreign transaction */
 typedef enum
 {
@@ -107,10 +115,12 @@ extern int	max_prepared_foreign_xacts;
 extern int	max_foreign_xact_resolvers;
 extern int	foreign_xact_resolution_retry_interval;
 extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
+extern void PreCommit_FdwXact(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
 extern bool FdwXactIsForeignTwophaseCommitRequired(void);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 91db4f5bfc..7a444d0590 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -273,7 +273,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
 /* Functions in fdwxact/fdwxact.c */
-extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactRegisterXact(Oid serverid, Oid userid, bool modified);
 extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
 
 #endif							/* FDWAPI_H */
-- 
2.27.0
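
For readers following the guc.c hunk above: the option table maps several spellings onto just two enum values. Below is a minimal standalone sketch of that lookup, not the patch's own code. `struct enum_entry` and `lookup_option` are hypothetical stand-ins for PostgreSQL's `struct config_enum_entry` and its GUC enum lookup, and the case-insensitive comparison is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>			/* strcasecmp (POSIX) */

/* Same two levels as the enum added to fdwxact.h in the patch */
typedef enum
{
	FOREIGN_TWOPHASE_COMMIT_DISABLED,
	FOREIGN_TWOPHASE_COMMIT_REQUIRED
} ForeignTwophaseCommitLevel;

/* Hypothetical, simplified stand-in for struct config_enum_entry */
struct enum_entry
{
	const char *name;			/* accepted spelling */
	int			val;			/* enum value it maps to */
	bool		hidden;			/* true if not shown as a valid choice */
};

static const struct enum_entry twophase_options[] = {
	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
	{NULL, 0, false}			/* terminator */
};

/*
 * Hypothetical helper: look up a spelling in the table.  Returns true and
 * stores the mapped value in *val on a match, false otherwise.
 */
static bool
lookup_option(const char *name, int *val)
{
	assert(name != NULL && val != NULL);

	for (const struct enum_entry *e = twophase_options; e->name != NULL; e++)
	{
		if (strcasecmp(e->name, name) == 0)
		{
			*val = e->val;
			return true;
		}
	}
	return false;
}
```

With this table, `lookup_option("on", &v)` and `lookup_option("required", &v)` both set `v` to `FOREIGN_TWOPHASE_COMMIT_REQUIRED`, while an unknown spelling is rejected.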

v29-0010-Documentation-update.patchapplication/octet-stream; name=v29-0010-Documentation-update.patchDownload
From 960e1760b941d91802e388e0007c3415bd77535d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v29 10/11] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 ++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 158 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 254 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    | 147 +++++++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 888 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 569841398b..6b5b287d1e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9262,6 +9262,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -11115,6 +11120,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-authid"><structname>pg_authid</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is
+          ready to be committed or is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is
+          ready to be aborted or is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>locker_pid</structfield></entry>
+      <entry><type>int</type></entry>
+      <entry></entry>
+      <entry>
+       Process ID of the process currently processing this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f810789ea8..23915f4315 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9270,6 +9270,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether a distributed transaction commit ensures that all
+         involved changes on foreign servers are committed. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used to
+         commit or roll back distributed transactions. When set to
+         <literal>required</literal>, distributed transactions strictly require
+         that all written servers can use the two-phase commit protocol.  That is,
+         the distributed transaction cannot commit if even one server does not
+         support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, distributed transaction commit will
+         wait for all involved foreign transactions to be committed before the
+         command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When <literal>disabled</literal>, database consistency can be at
+          risk if one or more foreign servers crash while committing
+          a distributed transaction.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have not had any foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning the resolver stays connected to one
+         database until it is stopped manually with <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..bae3ee0f2a
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the others were rolled back. This could leave
+   data in an inconsistent state across the federated database.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all the changes on foreign servers are either committed or rolled back using
+   the transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To commit atomically across all foreign servers,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit of a distributed transaction proceeds
+    through the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers, including the local server itself, and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the prepare succeeds on all foreign
+       servers, the commit proceeds to the next step.  If there is any failure in the
+       prepare phase, the server rolls back all the transactions on both
+       local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit the local transaction. The server commits the transaction locally.
+       If any failure happens in this step, the server switches to rollback and
+       rolls back all transactions on both local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    The above sequence is executed transparently to the user at transaction commit.
+    The transaction returns acknowledgement of the successful commit of the
+    distributed transaction to the client after step 2.  After that, all
+    prepared transactions are resolved asynchronously by a foreign transaction
+    resolver process.
+   </para>
+
+   <para>
+    When the user executes <command>PREPARE TRANSACTION</command>, the transaction
+    prepares the local transaction as well as all involved transactions on the
+    foreign servers. Likewise, when <command>COMMIT PREPARED</command> or
+    <command>ROLLBACK PREPARED</command> is executed, all prepared transactions are
+    resolved asynchronously after committing or rolling back the local transaction.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    A distributed transaction can be in the <firstterm>in-doubt</firstterm> state
+    from the time all involved transactions are prepared until all of
+    them are resolved.  During that time, a transaction might see different
+    results on the foreign servers when reading.  If the local node
+    crashes while preparing transactions, the distributed transaction also becomes
+    in-doubt.  The information about the involved foreign transactions is
+    recovered during crash recovery, and they are resolved in the background.
+   </para>
+
+   <para>
+    The foreign transaction resolver processes automatically resolve the
+    transactions associated with an in-doubt distributed transaction. Alternatively, you can
+    use the <function>pg_resolve_foreign_xact</function> function to resolve them
+    manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving in-doubt distributed transactions. They commit or
+    roll back prepared transactions on all foreign servers involved in the
+    distributed transaction according to the result of the corresponding local
+    transaction.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on the database to which it is connected. On failure during resolution, it
+    retries at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped without an immediate shutdown. You can call
+     the <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be enabled.  Additionally,
+    <varname>max_worker_processes</varname> may need to be adjusted
+    to accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..0fbb9c4123 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1427,6 +1427,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare the foreign transaction. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase, or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flags</literal> has
+    the flag <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     Note that, except when invoked through the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flags</literal> has the flag
+    <literal>FDWXACT_FLAG_ONEPHASE</literal> set, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     there is a failure while preparing the foreign transaction. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when invoked through the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    with its length stored in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal
+    less than <symbol>NAMEDATALEN</symbol> bytes long, and should not be the same
+    as any other concurrent prepared transaction identifier. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Therefore,
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1906,4 +2017,147 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the workings of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name, OID of
+    the server, user and user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-registration">
+    <title>Foreign Transaction Registration and Unregistration</title>
+    <para>
+     Foreign transactions need to be registered with the
+     <productname>PostgreSQL</productname> global transaction manager.
+     Registration and unregistration are done by calling
+     <function>FdwXactRegisterXact</function> and
+     <function>FdwXactUnregisterXact</function> respectively.
+     The FDW can pass a boolean <literal>modified</literal> along with the
+     OIDs of the server and user to <function>FdwXactRegisterXact</function>,
+     indicating that writes are going to happen on the foreign server.  Such foreign
+     servers are taken into account when deciding whether the two-phase commit
+     protocol is required.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <function>CommitForeignTransaction</function>
+     and <function>RollbackForeignTransaction</function> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <function>CommitForeignTransaction</function> function
+     in the pre-commit phase, and it calls the
+     <function>RollbackForeignTransaction</function> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback Distributed Transaction</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit and roll back atomically across all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to have the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be using different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens, or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     When switching to rollback due to a failure, it calls
+     <function>RollbackForeignTransaction</function> with
+     <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions which are not
+     closed yet, and calls <function>RollbackForeignTransaction</function> without
+     that flag for foreign transactions which are already prepared.  For foreign
+     transactions which are being prepared, it does both because it cannot know whether
+     the preparation has been completed on the foreign server. Therefore,
+     <function>RollbackForeignTransaction</function> needs to tolerate the undefined
+     object error.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, both the <literal>CommitForeignTransaction</literal> function and the
+     <literal>RollbackForeignTransaction</literal> function should commit or
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server was not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve, the resolver process exits with an error message. The foreign
+     transaction launcher will launch the resolver process again after the interval
+     set by <xref linkend="guc-foreign-transaction-resolution-rety-interval"/>.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a5161bb22b 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 507bc1a668..d1a4dd4ca0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26169,6 +26169,153 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-data-sanity">
+   <title>Data Sanity Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-data-sanity-table"/>
+    provide ways to check the sanity of data files in the cluster.
+   </para>
+
+   <table id="functions-data-sanity-table">
+    <title>Data Sanity Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_relation_check_pages</primary>
+        </indexterm>
+        <function>pg_relation_check_pages</function> ( <parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type> ] )
+        <returnvalue>setof record</returnvalue>
+        ( <parameter>path</parameter> <type>text</type>,
+        <parameter>failed_block_num</parameter> <type>bigint</type> )
+       </para>
+       <para>
+        Checks the pages of the specified relation to see if they are valid
+        enough to safely be loaded into the server's shared buffers.  If
+        given, <parameter>fork</parameter> specifies that only the pages of
+        the given fork are to be verified.  <parameter>fork</parameter> can
+        be <literal>main</literal> for the main data
+        fork, <literal>fsm</literal> for the free space
+        map, <literal>vm</literal> for the visibility map,
+        or <literal>init</literal> for the initialization fork.  The
+        default of <literal>NULL</literal> means that all forks of the
+        relation should be checked.  The function returns a list of block
+        numbers that appear corrupted along with the path names of their
+        files.  Use of this function is restricted to superusers by
+        default, but access may be granted to others
+        using <command>GRANT</command>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for a foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction which is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that this removes the foreign transaction entry without resolution.
+        This function is useful to remove a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before dropping
+    a database to which a foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 98e1995453..d00663dc14 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1066,6 +1066,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1295,6 +1307,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1588,6 +1612,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1905,6 +1934,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 730d5fdc34..a5c5619072 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -171,6 +171,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3234adb639..83f30c5045 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.27.0
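
As a reading aid for the atomic-commit documentation in the patch above, here
is a minimal C sketch of the prepare-or-rollback decision it describes. It is
not code from the patch: the Participant struct, the PartState enum, and both
function names are hypothetical, and the remote calls are replaced by counters
so that the control flow can be checked in isolation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical participant states; these are not the patch's FdwXactStatus. */
typedef enum { PART_OPEN, PART_PREPARING, PART_PREPARED } PartState;

typedef struct
{
    PartState state;
    bool      prepare_will_fail;  /* test knob: simulate a failed remote PREPARE */
    int       onephase_rollbacks; /* rollbacks issued with the one-phase flag */
    int       prepared_rollbacks; /* rollbacks of an already prepared transaction */
} Participant;

/* Roll back one participant according to how far its preparation got. */
static void
rollback_participant(Participant *p)
{
    assert(p != NULL);

    switch (p->state)
    {
        case PART_OPEN:
            /* Not closed yet: plain rollback (FDWXACT_FLAG_ONEPHASE case). */
            p->onephase_rollbacks++;
            break;
        case PART_PREPARED:
            /* Already prepared: roll back the prepared transaction. */
            p->prepared_rollbacks++;
            break;
        case PART_PREPARING:
            /*
             * Outcome unknown, so do both.  One of the two may find nothing
             * to roll back, which is why the callback has to tolerate the
             * undefined object error.
             */
            p->onephase_rollbacks++;
            p->prepared_rollbacks++;
            break;
    }
}

/*
 * Prepare every participant in order.  Returns true if all of them were
 * prepared, meaning the local commit may proceed.  On the first failure,
 * rolls back every participant and returns false.
 */
static bool
prepare_all_or_rollback(Participant *parts, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        parts[i].state = PART_PREPARING;
        if (parts[i].prepare_will_fail)
        {
            for (size_t j = 0; j < n; j++)
                rollback_participant(&parts[j]);
            return false;
        }
        parts[i].state = PART_PREPARED;
    }
    return true;
}
```

With three participants where the second one fails to prepare, the first gets
only a prepared-transaction rollback, the second gets both kinds, and the third
gets only a one-phase rollback, matching the three cases in the documentation.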

v29-0007-Introduce-foreign-transaction-launcher-and-resol.patchapplication/octet-stream; name=v29-0007-Introduce-foreign-transaction-launcher-and-resol.patchDownload
From eddbb8faf91827d95d0bee38f2876c9800610fa9 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:09:41 +0900
Subject: [PATCH v29 07/11] Introduce foreign transaction launcher and resolver
 processes.

This commit introduces two new background processes: the foreign
transaction launcher and resolvers. With this change, users no longer
need to use pg_resolve_foreign_xact() to resolve foreign transactions
prepared by PREPARE TRANSACTION and left by COMMIT/ROLLBACK
PREPARED. These foreign transactions are resolved in the background by
a foreign transaction resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/Makefile           |   5 +-
 src/backend/access/fdwxact/fdwxact.c          |  33 +-
 src/backend/access/fdwxact/launcher.c         | 567 ++++++++++++++++++
 src/backend/access/fdwxact/resolver.c         | 352 +++++++++++
 src/backend/access/transam/twophase.c         |  16 +
 src/backend/postmaster/bgworker.c             |   8 +
 src/backend/postmaster/pgstat.c               |   6 +
 src/backend/postmaster/postmaster.c           |  13 +-
 src/backend/storage/ipc/ipci.c                |   3 +
 src/backend/storage/lmgr/lwlocknames.txt      |   1 +
 src/backend/tcop/postgres.c                   |  14 +
 src/backend/utils/misc/guc.c                  |  37 ++
 src/backend/utils/misc/postgresql.conf.sample |  12 +
 src/include/access/fdwxact.h                  |   6 +
 src/include/access/fdwxact_launcher.h         |  28 +
 src/include/access/fdwxact_resolver.h         |  23 +
 src/include/access/resolver_internal.h        |  63 ++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/pgstat.h                          |   2 +
 src/include/utils/guc_tables.h                |   2 +
 20 files changed, 1183 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/resolver_internal.h
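
The launcher added by this patch starts a resolver for each database that has
unresolved foreign transactions and no resolver running on it. That selection
is a set difference. A rough standalone sketch follows, assuming plain arrays
in place of the hash tables the patch uses; the function names and the local
Oid typedef are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

/* Linear membership test over a small array of OIDs. */
static bool
oid_in_set(const Oid *set, size_t n, Oid v)
{
    for (size_t i = 0; i < n; i++)
    {
        if (set[i] == v)
            return true;
    }
    return false;
}

/*
 * Collect the distinct database OIDs that appear in pending[] but not in
 * running[].  out[] must have room for npending entries.  Returns the number
 * of OIDs written, in order of first appearance in pending[].
 */
static size_t
dbs_needing_resolver(const Oid *pending, size_t npending,
                     const Oid *running, size_t nrunning,
                     Oid *out)
{
    size_t      nout = 0;

    assert(out != NULL);

    for (size_t i = 0; i < npending; i++)
    {
        if (oid_in_set(running, nrunning, pending[i]))
            continue;           /* a resolver already serves this database */
        if (oid_in_set(out, nout, pending[i]))
            continue;           /* already selected once */
        out[nout++] = pending[i];
    }
    return nout;
}
```

The caller would launch one resolver per returned OID and report whether any
were launched, so the result count doubles as the "launched" indicator.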

diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
index aacab1d729..151e3ae336 100644
--- a/src/backend/access/fdwxact/Makefile
+++ b/src/backend/access/fdwxact/Makefile
@@ -12,6 +12,9 @@ subdir = src/backend/access/fdwxact
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = fdwxact.o
+OBJS = \
+	fdwxact.o \
+	resolver.o \
+	launcher.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 7b3a2f1fba..b4cab71c3d 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -22,10 +22,10 @@
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  To resolve
- * these foreign transactions the user needs to use pg_resolve_foreign_xact() SQL
- * function that resolve a foreign transaction according to the result of the
- * corresponding local transaction.
+ * local transaction but do nothing for the involved foreign transactions.  The prepared
+ * foreign transactions are resolved by a resolver process asynchronously.  Also, the
+ * user can use the pg_resolve_foreign_xact() SQL function to resolve a foreign
+ * transaction manually.
  *
  * LOCKING
  *
@@ -76,7 +76,10 @@
 #include <unistd.h>
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/twophase.h"
+#include "access/resolver_internal.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
@@ -157,6 +160,7 @@ static bool fdwXactExitRegistered = false;
 
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
 static void FdwXactPrepareForeignTransactions(TransactionId xid);
@@ -165,7 +169,6 @@ static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
 static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
 										 FdwXactParticipant *fdw_part);
-static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 static void FdwXactComputeRequiredXmin(void);
 static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
 static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
@@ -772,12 +775,13 @@ ForgetAllFdwXactParticipants(void)
 
 	/*
 	 * If we leave any FdwXact entries, update the oldest local transaction of
-	 * unresolved distributed transaction.
+	 * unresolved distributed transaction and notify the launcher.
 	 */
 	if (nlefts > 0)
 	{
 		elog(DEBUG1, "left %u foreign transactions", nlefts);
 		FdwXactComputeRequiredXmin();
+		FdwXactLaunchOrWakeupResolver();
 	}
 
 	list_free_deep(FdwXactParticipants);
@@ -785,7 +789,9 @@ ForgetAllFdwXactParticipants(void)
 }
 
 /*
- * Commit or rollback all foreign transactions.
+ * Close in-progress involved foreign transactions.  We don't perform the second
+ * phase of two-phase commit protocol here.  All prepared foreign transactions
+ * enter in-doubt state and a resolver process will process them.
  */
 void
 AtEOXact_FdwXact(bool is_commit)
@@ -889,7 +895,7 @@ PrePrepare_FdwXact(void)
  * The caller must hold the given foreign transactions in advance to prevent
  * concurrent update.
  */
-static void
+void
 FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
 {
 	for (int i = 0; i < nfdwxacts; i++)
@@ -924,6 +930,17 @@ FdwXactExists(Oid dbid, Oid serverid, Oid userid)
 
 	return (idx >= 0);
 }
+bool
+FdwXactExistsXid(TransactionId xid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(InvalidOid, xid, InvalidOid, InvalidOid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
 
 /*
  * Return the index of first found FdwXact entry that matched to given arguments.
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..916b9af2f7
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,567 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "pgstat.h"
+#include "funcapi.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+		FdwXactRslvCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit start retries to once per
+		 * foreign_xact_resolution_retry_interval, but always attempt to
+		 * start when requested.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in previous cycle was interrupted in less than
+			 * foreign_xact_resolution_retry_interval since last resolver
+			 * started, this usually means crash of the resolver, so we should
+			 * retry in foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+
+	/*
+	 * Looking for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. Since the resolver will soon find
+		 * the FdwXact entry we inserted, we don't need to do anything.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on database that has
+ * at least one FdwXact entry but no resolver is running on it.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *fdwxact_dbs;
+	HTAB	   *resolver_dbs;
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+
+	/*
+	 * Create a hash map for the databases that have at least one foreign
+	 * transaction to resolve.
+	 */
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * We need to launch resolver process if the foreign transaction
+		 * is not held by anyone and is not a part of the local prepared
+		 * transaction.
+		 */
+		if (fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no foreign transaction to resolve, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	/* Create a hash map for databases on which a resolver is running */
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * Find databases on which no resolver is running and launch new
+	 * resolver process on them.
+	 */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be super user */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..c9d41428fc
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,352 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database whose resolution a backend process is waiting for.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int	foreign_xact_resolution_retry_interval;
+int	foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+static void hold_indoubt_fdwxacts(void);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts holds the indexes of the FdwXact entries that the resolver
+ * marked as in-processing. These marks are cleared on process exit.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and clean up the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	/* Release the held foreign transaction entries */
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/* Hold in-doubt foreign transaction to resolve */
+		hold_indoubt_fdwxacts();
+
+		if (nheld > 0)
+		{
+			/* Resolve in-doubt transactions */
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld);
+			CommitTransactionCommand();
+			last_resolution_time = now;
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether there have been foreign transactions by the backend within
+ * foreign_xact_resolver_timeout and shut down if not.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/* Reached timeout, exit */
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+					get_database_name(MyDatabaseId))));
+	CommitTransactionCommand();
+	fdwxact_resolver_detach();
+	proc_exit(0);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Lock foreign transactions that are not held by anyone.
+ */
+static void
+hold_indoubt_fdwxacts(void)
+{
+	nheld = 0;
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid &&
+			fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+		{
+			held_fdwxacts[nheld++] = i;
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 265b03ba5a..29f11fb779 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,8 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -2286,6 +2288,13 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * If the prepared transaction was a part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
@@ -2345,6 +2354,13 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * If the prepared transaction was a part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 5a9a0e3435..b2384f9ab9 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,8 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 4b05d7d2ff..13830cc51e 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3809,6 +3809,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 098f79f3d6..13c13b45c4 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -94,6 +94,7 @@
 #endif
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -910,6 +911,9 @@ PostmasterMain(int argc, char *argv[])
 	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers <= 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
 
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
@@ -975,12 +979,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules have had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2d7191d3cd..271fd35884 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -17,6 +17,7 @@
 #include "access/clog.h"
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -151,6 +152,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +272,7 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index dc29a7ea6f..9327394013 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -54,3 +54,4 @@ XactTruncationLock					44
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
 FdwXactLock							48
+FdwXactResolverLock					49
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7c5f7c775b..d8749d1e9f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3054,6 +3056,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ec6cef8ad7..942f6b6a43 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -760,6 +760,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2469,6 +2473,39 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 863e8ccc3a..2ed09cb347 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -733,6 +733,18 @@
 #max_pred_locks_per_page = 2            # min 0
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
 #------------------------------------------------------------------------------
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 9ba819e9d1..a3763e52c0 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -104,13 +104,19 @@ typedef struct FdwXactRslvState
 
 /* GUC parameters */
 extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern bool FdwXactExistsXid(TransactionId xid);
 extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
 extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
 								Oid userid, void *content, int len);
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..688b43b8d0
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..c935471936
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 70f61e4a31..a7bcbf15a0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6164,6 +6164,11 @@
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
 
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
+
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
   proargtypes => 'pg_lsn pg_lsn', prosrc => 'pg_wal_lsn_diff' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a61a08c5d6..0967c09f3c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -877,6 +877,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 04431d0eb2..a00ca73355 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
-- 
2.27.0
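
For readers following the resolver loop in the patch above, the sleep-time rule in FXRslvComputeSleepTime() can be sketched outside the backend. This is only an illustration under stated assumptions: times are plain milliseconds rather than TimestampTz, and the names compute_sleep_ms and ms_until are hypothetical, not part of the patch. It mirrors the idea that the resolver naps for the smallest of the default nap time, the time left until the resolver timeout, and the time left until the next scheduled resolution.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as DEFAULT_NAPTIME_PER_CYCLE in resolver.c (3 minutes). */
#define DEFAULT_NAPTIME_MS 180000L

/*
 * Hypothetical helper: milliseconds from now_ms until target_ms, clamped to
 * zero when the target is already in the past (TimestampDifference() returns
 * zeroes in that case).
 */
static long
ms_until(int64_t now_ms, int64_t target_ms)
{
	return (target_ms > now_ms) ? (long) (target_ms - now_ms) : 0L;
}

/*
 * Hypothetical stand-in for FXRslvComputeSleepTime().  A resolver_timeout_ms
 * of 0 disables the timeout; a non-positive next_resolution_ms means no
 * resolution is scheduled.
 */
static long
compute_sleep_ms(int64_t now_ms, int64_t last_resolution_ms,
				 long resolver_timeout_ms, int64_t next_resolution_ms)
{
	long		sleep_ms = DEFAULT_NAPTIME_MS;

	assert(resolver_timeout_ms >= 0);

	if (resolver_timeout_ms > 0)
	{
		long		to_timeout = ms_until(now_ms,
										  last_resolution_ms + resolver_timeout_ms);

		if (to_timeout < sleep_ms)
			sleep_ms = to_timeout;
	}

	if (next_resolution_ms > 0)
	{
		long		to_next = ms_until(now_ms, next_resolution_ms);

		if (to_next < sleep_ms)
			sleep_ms = to_next;
	}

	return sleep_ms;
}
```

With the timeout disabled and nothing scheduled, the sketch falls back to the three-minute nap. Once the timeout has already passed it returns 0, so the loop wakes immediately and FXRslvCheckTimeout() can shut the resolver down.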

Attachment: v29-0005-postgres_fdw-supports-prepare-API.patch (application/octet-stream)
From 7b9a6bf1a2485ca2d75cca50337ec1dc2cd05707 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:00:21 +0900
Subject: [PATCH v29 05/11] postgres_fdw supports prepare API.

This commit implements the PrepareForeignTransaction API in postgres_fdw,
enabling foreign transactions to be committed and rolled back using the
two-phase commit protocol.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 137 +++++++++++++++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  13 --
 contrib/postgres_fdw/postgres_fdw.c           |   1 +
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   7 -
 5 files changed, 135 insertions(+), 24 deletions(-)
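
As a reading aid for this patch, the three remote commands that postgres_fdw issues for a two-phase commit can be sketched in standalone C. This is a minimal sketch and not the patch's code: the patch builds the strings with StringInfo and appendStringInfo, whereas build_prepare_cmd and build_end_prepared_cmd below are hypothetical helpers using snprintf, and the identifier is assumed to contain no single quotes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the first-phase command, as done in
 * postgresPrepareForeignTransaction().  Returns the snprintf() result.
 */
static int
build_prepare_cmd(char *buf, size_t buflen, const char *fdwxact_id)
{
	assert(buf != NULL && fdwxact_id != NULL);
	return snprintf(buf, buflen, "PREPARE TRANSACTION '%s'", fdwxact_id);
}

/*
 * Hypothetical helper: format the second-phase command, as done in
 * pgfdw_end_prepared_xact().  is_commit selects COMMIT or ROLLBACK.
 */
static int
build_end_prepared_cmd(char *buf, size_t buflen, const char *fdwxact_id,
					   bool is_commit)
{
	assert(buf != NULL && fdwxact_id != NULL);
	return snprintf(buf, buflen, "%s PREPARED '%s'",
					is_commit ? "COMMIT" : "ROLLBACK", fdwxact_id);
}
```

The second-phase command may be sent by a resolver process rather than the backend that prepared the transaction, which is why pgfdw_end_prepared_xact() reconnects when no usable connection exists.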

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 38614812ce..7812531f46 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -96,6 +96,8 @@ static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 static bool UserMappingPasswordRequired(UserMapping *user);
 static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
 static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+									char *fdwxact_id, bool is_commit);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -1153,12 +1155,19 @@ void
 postgresCommitForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry;
+	bool		is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	PGresult   *res;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
 
+	if (!is_onephase)
+	{
+		/* COMMIT PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->usermapping,
+								frstate->fdwxact_id, true);
+		return;
+	}
+
 	Assert(entry->conn);
 
 	/*
@@ -1204,16 +1213,24 @@ void
 postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry = NULL;
+	bool is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	bool abort_cleanup_failure = false;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	/*
 	 * In simple rollback case, we must have a connection to the foreign server
 	 * because the foreign transaction is not closed yet. We get the connection
 	 * entry from the cache.
 	 */
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	if (!is_onephase)
+	{
+		/* ROLLBACK PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->usermapping,
+								frstate->fdwxact_id, false);
+		return;
+	}
+
 	Assert(entry);
 
 	/*
@@ -1290,6 +1307,46 @@ postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 	return;
 }
 
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", frstate->fdwxact_id);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   frstate->server->servername, frstate->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 frstate->server->servername, frstate->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
 /* Cleanup at main-transaction end */
 static void
 pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
@@ -1316,3 +1373,75 @@ pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
 	/* Also reset cursor numbering for next transaction */
 	cursor_number = 0;
 }
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+						char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	/*
+	 * Check the connection status in case the previous attempt
+	 * failed.
+	 */
+	if (entry->conn && PQstatus(entry->conn) != CONNECTION_OK)
+		disconnect_pg_server(entry);
+
+	/*
+	 * In the two-phase commit case, the transaction may be resolved by a
+	 * process other than the one that prepared it, so we might not have
+	 * a connection yet.
+	 */
+	if (!entry->conn)
+		make_new_connection(entry, usermapping);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	/*
+	 * Once the transaction is prepared, no further transaction callback is
+	 * called even if an error occurs while resolving it.  Therefore, we
+	 * don't need to set changing_xact_state here.  On failure, a new connection
+	 * will be established either when the next transaction starts or by
+	 * the connection status check above.
+	 */
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager notes, the given foreign
+		 * transaction might no longer exist on the foreign server.  So we
+		 * accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index fefb7e6de2..a750ace025 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8974,19 +8974,6 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
- count 
--------
-   822
-(1 row)
-
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
-ROLLBACK;
-WARNING:  there is no transaction in progress
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 7ac0c85dd3..22e1a70e76 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -563,6 +563,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for foreign transactions */
 	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
 	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
 
 	PG_RETURN_POINTER(routine);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index e3b2897495..659222b97a 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -140,6 +140,7 @@ extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
 extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
 extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7581c5417b..ece57de1b1 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2647,13 +2647,6 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ROLLBACK;
-
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
-- 
2.27.0
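
For readers following the 0002 patch above, here is a minimal standalone sketch of the two decisions pgfdw_end_prepared_xact() makes: which resolution command to send, and which failure it tolerates. The helper names build_resolve_command() and resolve_error_is_ignorable() are hypothetical and do not appear in the patch. The sketch compares the five-character SQLSTATE string directly ("42704" is undefined_object in PostgreSQL's error code table) rather than packing it with MAKE_SQLSTATE as the patch does, and it assumes, like the patch, that the identifier needs no quoting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* SQLSTATE for undefined_object in PostgreSQL's error code table. */
#define SQLSTATE_UNDEFINED_OBJECT "42704"

/*
 * Hypothetical helper: format "COMMIT PREPARED '<id>'" or
 * "ROLLBACK PREPARED '<id>'" into buf.  Returns snprintf's result, i.e. the
 * length the full command needs, so callers can detect truncation.
 * The identifier is interpolated as-is, as in the patch.
 */
static int
build_resolve_command(char *buf, size_t buflen,
					  const char *fdwxact_id, bool is_commit)
{
	assert(buf != NULL && fdwxact_id != NULL);

	return snprintf(buf, buflen, "%s PREPARED '%s'",
					is_commit ? "COMMIT" : "ROLLBACK",
					fdwxact_id);
}

/*
 * Hypothetical helper: a failed resolution is tolerated only when the
 * foreign server reports that the prepared transaction does not exist,
 * which means an earlier attempt already resolved it.  A missing SQLSTATE
 * is treated as a real failure.
 */
static bool
resolve_error_is_ignorable(const char *sqlstate)
{
	return sqlstate != NULL &&
		strcmp(sqlstate, SQLSTATE_UNDEFINED_OBJECT) == 0;
}
```

A caller would build the command, execute it on the foreign connection, and raise an error on failure only when resolve_error_is_ignorable() returns false. That is what makes a retried resolution idempotent.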

Attachment: v29-0003-Recreate-RemoveForeignServerById.patch (application/octet-stream)
From f310c9865000c6726baca7e7945a82bc61a4fc05 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 12 Jun 2020 11:49:02 +0900
Subject: [PATCH v29 03/11] Recreate RemoveForeignServerById()

This commit recreates RemoveForeignServerById(), which was removed by
b1d32d3e3. It is needed by a follow-up commit that checks whether the
foreign server has prepared transactions before removing it.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/catalog/dependency.c   |  5 ++++-
 src/backend/commands/foreigncmds.c | 22 ++++++++++++++++++++++
 src/include/commands/defrem.h      |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 245c2f4fc8..3f97733656 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1549,6 +1549,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_FOREIGN_SERVER:
+			RemoveForeignServerById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1563,7 +1567,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSDICT:
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
-		case OCLASS_FOREIGN_SERVER:
 		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index de31ddd1f3..c002a61794 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -1060,6 +1060,28 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop foreign server by OID
+ */
+void
+RemoveForeignServerById(Oid srvId)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(ForeignServerRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(srvId));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Common routine to check permission for user-mapping-related DDL
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 7a079ef07f..737a14a22a 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -128,6 +128,7 @@ extern ObjectAddress CreateForeignDataWrapper(CreateFdwStmt *stmt);
 extern ObjectAddress AlterForeignDataWrapper(AlterFdwStmt *stmt);
 extern ObjectAddress CreateForeignServer(CreateForeignServerStmt *stmt);
 extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
+extern void RemoveForeignServerById(Oid srvId);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
-- 
2.27.0

Attachment: v29-0004-Add-PrepareForeignTransaction-API.patch (application/octet-stream)
From 58730e8350913dabdd7b396257a572510d9facdc Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 20 Sep 2020 16:49:20 +0900
Subject: [PATCH v29 04/11] Add PrepareForeignTransaction API.

This commit adds a new FDW API, PrepareForeignTransaction. Using this
API, the transactions initiated on the foreign server are prepared at
PREPARE TRANSACTION time.  The information about prepared foreign
transactions involved in the distributed transaction is crash-safe.
However, these transactions are neither committed nor aborted at
COMMIT/ROLLBACK PREPARED time.  To resolve these transactions, this
commit also adds the pg_resolve_foreign_xact() SQL function.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +-
 src/backend/access/fdwxact/fdwxact.c          | 1755 ++++++++++++++++-
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   28 +
 src/backend/access/transam/xact.c             |    1 +
 src/backend/access/transam/xlog.c             |   41 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/foreigncmds.c            |   22 +
 src/backend/foreign/foreign.c                 |    6 +
 src/backend/postmaster/pgstat.c               |    9 +
 src/backend/postmaster/postmaster.c           |    1 +
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    3 +
 src/backend/storage/ipc/procarray.c           |   56 +-
 src/backend/storage/lmgr/lwlocknames.txt      |    1 +
 src/backend/utils/misc/guc.c                  |   11 +
 src/backend/utils/misc/postgresql.conf.sample |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |   88 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   18 +
 src/include/foreign/fdwapi.h                  |    2 +
 src/include/pgstat.h                          |    3 +
 src/include/storage/procarray.h               |    2 +
 src/test/regress/expected/rules.out           |    7 +
 35 files changed, 2164 insertions(+), 28 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact_xlog.h

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c5badd9c0a..fefb7e6de2 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,7 +8984,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on foreign tables
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 00da860b31..3caf904370 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -9,8 +9,59 @@
  * FDW who implements both commit and rollback APIs can request to register the
  * foreign transaction by FdwXactRegisterXact() to participate it to a
  * group of distributed tranasction.  The registered foreign transactions are
- * identified by OIDs of server and user.  On commit and rollback, the global
- * transaction manager calls corresponding FDW API to end the tranasctions.
+ * identified by OIDs of server and user.  On commit, rollback and prepare, the
+ * global transaction manager calls the corresponding FDW API to end the transactions.
+ *
+ * To commit atomically across all foreign servers, the global transaction
+ * manager supports the two-phase commit protocol, a type of atomic commitment
+ * protocol (ACP).  The protocol is crash-safe.  We WAL-log the foreign
+ * transaction information.
+ *
+ * FOREIGN TRANSACTION RESOLUTION
+ *
+ * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by calling
+ * the PrepareForeignTransaction() API, regardless of whether data on the foreign
+ * server has been modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or
+ * roll back only the local transaction and do nothing for the involved foreign
+ * transactions.  To resolve these foreign transactions, the user needs to call the
+ * pg_resolve_foreign_xact() SQL function, which resolves a foreign transaction
+ * according to the outcome of the corresponding local transaction.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of a foreign
+ * transaction follows a locking model based on the following linked concepts:
+ *
+ * * All FdwXact fields except for status are protected by FdwXactLock. The
+ *	 status is protected by its mutex.
+ * * A process that is going to process a foreign transaction needs to set
+ *   locking_backend of the FdwXact entry to lock the entry, which prevents the entry from
+ *	 being updated and removed by concurrent processes.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *	 with entries marked with fdwxact->inredo and fdwxact->ondisk.  FdwXact file
+ *	 data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *	 We set fdwxact->inredo to true for such entries.
+ * * On checkpoint, we iterate through FdwXactCtl->fdwxacts entries that
+ *	 have fdwxact->inredo set and are behind the redo_horizon.  We save
+ *	 them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *	 fdwxact->ondisk is true, the corresponding entry from the disk is
+ *	 additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *	 fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
  *
  * Portions Copyright (c) 2020, PostgreSQL Global Development Group
  *
@@ -20,15 +71,53 @@
  */
 #include "postgres.h"
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include "access/fdwxact.h"
+#include "access/twophase.h"
+#include "access/xact.h"
 #include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "foreign/fdwapi.h"
 #include "foreign/foreign.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/procarray.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 
 /* Check the FdwXactParticipant is capable of two-phase commit  */
 #define ServerSupportTransactionCallback(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define ServerSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * OID, xid, foreign server OID and user OID, each printed as 8 hexadecimal
+ * digits and separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction, and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
 
 /*
  * Structure to bundle the foreign transaction participant.	 This struct
@@ -37,13 +126,23 @@
  */
 typedef struct FdwXactParticipant
 {
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
 	/* Foreign server and user mapping info, passed to callback routines */
 	ForeignServer *server;
 	UserMapping *usermapping;
 
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
 } FdwXactParticipant;
 
 /*
@@ -52,11 +151,103 @@ typedef struct FdwXactParticipant
  */
 static List *FdwXactParticipants = NIL;
 
+/* Track whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+/* GUC parameter */
+int			max_prepared_foreign_xacts = 0;
+
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void FdwXactPrepareForeignTransactions(TransactionId xid);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
+static void FdwXactComputeRequiredXmin(void);
+static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
+static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool givewarning);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool fromdisk);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  Oid umid, char *fdwxact_id);
+static void remove_fdwxact(FdwXact fdwxact);
 static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
 													  FdwRoutine *routine);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is described in the definition of the
+ * FdwXactCtlData structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
 
 /*
  * Register the given foreign transaction identified by the given arguments
@@ -82,6 +273,13 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 		}
 	}
 
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
 	routine = GetFdwRoutineByServerId(serverid);
 
 	/*
@@ -142,14 +340,336 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 
 	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
 
+	fdw_part->fdwxact = NULL;
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
 
 	return fdw_part;
 }
 
+/*
+ * Insert FdwXact entries and prepare foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(TransactionId xid)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants != NIL);
+	Assert(TransactionIdIsValid(xid));
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState state;
+		FdwXact		fdwxact;
+
+		Assert(ServerSupportTwophaseCommit(fdw_part));
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to disk and to WAL (thereby relaying it to standbys).
+		 * Thus, if we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have a prepared transaction
+		 * on the foreign server and try to resolve it when connectivity is
+		 * restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed between these
+		 * two steps, we would lose track of the prepared transaction on the
+		 * foreign server and could not resolve it after crash recovery.
+		 * Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry() call and the foreign
+		 * server's acknowledgement, this backend may abort the local
+		 * transaction (say, because of a signal).
+		 */
+		state.server = fdw_part->server;
+		state.usermapping = fdw_part->usermapping;
+		state.fdwxact_id = fdw_part->fdwxact_id;
+		fdw_part->prepare_foreign_xact_fn(&state);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier.  We generate a
+ * unique identifier of the form
+ * "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * The returned string is used to identify the foreign transaction. The
+ * identifier must not be the same as that of any other concurrent prepared
+ * transaction.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like a UUID, which gives unique ids with high probability, but that may be
+ * expensive here, and the UUID extension that provides the function to
+ * generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+			 Abs(random()), xid, fdw_part->server->serverid,
+			 fdw_part->usermapping->userid);
+
+	return pstrdup(buf);
+}
+
+/*
+ * This function creates a new foreign transaction entry before an FDW
+ * prepares and then commits or rolls back. The function writes the entry to
+ * WAL; it is persisted to disk under the pg_fdwxact directory at checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
 /*
  * The routine for committing or rolling back the given transaction participant.
  */
@@ -162,6 +682,7 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 
 	state.server = fdw_part->server;
 	state.usermapping = fdw_part->usermapping;
+	state.fdwxact_id = NULL;
 	state.flags = FDWXACT_FLAG_ONEPHASE;
 
 	if (commit)
@@ -181,14 +702,46 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 }
 
 /*
- * Clear the FdwXactParticipants list.
+ * Unlock foreign transaction participants and clear the FdwXactParticipants
+ * list.  If any foreign transactions are left unresolved, update the oldest
+ * xmin of unresolved transactions so that the local transaction ids they
+ * depend on are not truncated.
  */
 static void
 ForgetAllFdwXactParticipants(void)
 {
+	ListCell   *cell;
+	int			nlefts = 0;
+
 	if (FdwXactParticipants == NIL)
 		return;
 
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if no FdwXact entry has been registered yet */
+		if (!fdwxact)
+			continue;
+
+		/* Unlock the foreign transaction entry */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+		nlefts++;
+	}
+
+	/*
+	 * If we leave any FdwXact entries unresolved, update the oldest local
+	 * transaction id among the unresolved distributed transactions.
+	 */
+	if (nlefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions", nlefts);
+		FdwXactComputeRequiredXmin();
+	}
+
 	list_free_deep(FdwXactParticipants);
 	FdwXactParticipants = NIL;
 }
@@ -211,23 +764,1203 @@ AtEOXact_FdwXact(bool is_commit)
 	foreach(lc, FdwXactParticipants)
 	{
 		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		if (!fdwxact)
+		{
+			/* Commit or rollback the foreign transaction in one-phase */
+			Assert(ServerSupportTransactionCallback(fdw_part));
+			FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			continue;
+		}
+
+		/*
+		 * This foreign transaction might have been prepared.  In the commit
+		 * case, we don't need to do anything for this participant because all
+		 * foreign transactions should already have been prepared, and
+		 * therefore the transaction is already closed.  These will be resolved
+		 * manually.  In the abort case, on the other hand, we need to close
+		 * the transaction if preparing might still be in progress, since an
+		 * error might have occurred while preparing a foreign transaction.
+		 */
+		if (!is_commit)
+		{
+			int					   status;
 
-		Assert(ServerSupportTransactionCallback(fdw_part));
-		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			SpinLockAcquire(&(fdwxact->mutex));
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&(fdwxact->mutex));
+
+			if (status == FDWXACT_STATUS_PREPARING)
+				FdwXactParticipantEndTransaction(fdw_part, false);
+		}
 	}
 
 	ForgetAllFdwXactParticipants();
 }
 
 /*
- * Check if the local transaction has any foreign transaction.
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * Note that it's possible that the transaction aborts after we have prepared
+ * some of the participants. In that case we switch to rollback and roll back
+ * all foreign transactions.
  */
 void
 PrePrepare_FdwXact(void)
 {
-	/* We don't support to prepare foreign transactions */
-	if (FdwXactParticipants != NIL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+	ListCell   *lc;
+	TransactionId xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit as we're going to
+	 * prepare all of them.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
+	}
+
+	/*
+	 * Assign a transaction id if one has not been assigned yet, because the
+	 * local transaction id is used to determine the outcome of the
+	 * distributed transaction.  Then prepare all foreign transactions.
+	 */
+	xid = GetTopTransactionId();
+	FdwXactPrepareForeignTransactions(xid);
+
+	/*
+	 * We keep FdwXactParticipants until the end of the transaction so that we
+	 * can change the involved foreign transactions to ABORTING on failure.
+	 */
+}
+
+/*
+ * Resolve foreign transactions at the given indexes.
+ *
+ * The caller must hold the given foreign transactions in advance to prevent
+ * concurrent updates.
+ */
+static void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		FdwXactResolveOneFdwXact(fdwxact);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
+
+/*
+ * Return the index of the first FdwXact entry that matches the given
+ * arguments, or -1 if there is none.  Only arguments holding a valid value
+ * for their datatype are used as search conditions.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	bool		found = false;
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+		found = true;
+		break;
+	}
+
+	return found ? i : -1;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ *
+ * XXX: we can exclude FdwXact entries whose status is already committing
+ * or aborting.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Return whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactGetTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on
+	 * the foreign server. So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted.  Raise an error since we cannot
+	 * determine the fate of this foreign transaction from a local
+	 * transaction whose fate is itself not yet determined.
+	 */
+	else
+		elog(ERROR,
+			 "cannot resolve the foreign transaction associated with in-progress transaction");
+
+	pg_unreachable();
+}
+
+/* Commit or rollback one prepared foreign transaction */
+static void
+FdwXactResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactRslvState state;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	/* The FdwXact entry must be held by me */
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->locking_backend == MyBackendId);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactGetTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare the resolution state to pass to API */
+	state.server = server;
+	state.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	state.fdwxact_id = fdwxact->fdwxact_id;
+	state.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&state);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&state);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete the FdwXact entry and its file, if any */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The status of
+	 * the transaction is set to prepared below, since we do not know the exact
+	 * status right now. The resolver will set it later based on the status of
+	 * the local transaction which prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that
+	 * prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXact file if the foreign transaction was saved by an earlier checkpoint.
+ * The FdwXact entry may not be found if crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+static void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.	 FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid, read the foreign transaction
+ * state either from disk or directly from WAL via the shared-memory record
+ * pointer "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a plausible size, a valid CRC and the expected
+ * identifiers), return the palloc'd contents of the file; raise an error on
+ * corrupted data.  This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents are the expected data */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextXid = ShmemVariableCache->nextXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+RestoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and mark them valid.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built-in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = CStringGetTextDatum(fdwxact->fdwxact_id);
+
+		if (fdwxact->locking_backend != InvalidBackendId)
+		{
+			PGPROC *locker = BackendIdGetProc(fdwxact->locking_backend);
+			values[5] = Int32GetDatum(locker->pid);
+		}
+		else
+			nulls[5] = true;
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("does not exist foreign transaction")));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR,
+				 (errmsg("permission denied to resolve prepared foreign transaction"),
+				  errhint("Must be superuser or the user that prepared the transaction")));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being processed by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction identifier \"%s\" is busy",
+						fdwxact->fdwxact_id)));
+	}
+
+	if (TwoPhaseExists(fdwxact->local_xid))
+	{
+		/*
+		 * the entry's local transaction is prepared. Since we cannot know the
+		 * fate of the local transaction, we cannot resolve this foreign
+		 * transaction.
+		 */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction with identifier \"%s\" whose local transaction is in-progress",
+						fdwxact->fdwxact_id),
+				 errhint("Do COMMIT PREPARED or ROLLBACK PREPARED")));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	LWLockRelease(FdwXactLock);
+
+	PG_TRY();
+	{
+		FdwXactResolveFdwXacts(&idx, 1);
+	}
+	PG_CATCH();
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactCtl->fdwxacts[idx]->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. This gives a way to forget about such a prepared transaction
+ * when the foreign server where it was prepared is no longer available, or
+ * when the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("does not exist foreign transaction on server %u",
+						serverid)));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("permission denied to remove prepared foreign transaction"),
+				  errhint("Must be superuser or the user that prepared the transaction"))));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being held by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	PG_TRY();
+	{
+		/* Clean up entry and any files we may have left */
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdwxact(fdwxact);
+	}
+	PG_CATCH();
+	{
+		if (fdwxact->valid)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+			fdwxact->locking_backend = InvalidBackendId;
+		}
+		LWLockRelease(FdwXactLock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
 }
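Both pg_resolve_foreign_xact() and pg_remove_foreign_xact() above follow the same ownership protocol on an FdwXact entry: fail if locking_backend is already set, otherwise store our own backend ID, and clear it again on the error path. The following is only a standalone sketch of that protocol, not code from the patch. DemoFdwXact, demo_claim(), demo_release() and INVALID_BACKEND_ID are hypothetical stand-ins, and the sketch assumes the caller already holds the equivalent of FdwXactLock in exclusive mode.

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_BACKEND_ID (-1)	/* hypothetical stand-in for InvalidBackendId */

/* Hypothetical, trimmed-down stand-in for FdwXactData. */
typedef struct
{
	bool		valid;				/* entry is complete and usable */
	int			locking_backend;	/* backend working on it, if any */
} DemoFdwXact;

/*
 * Try to take ownership of the entry for my_backend.  Returns false when
 * the entry is not valid or is already held, which corresponds to the
 * "is busy" errors raised by the SQL functions.
 */
static bool
demo_claim(DemoFdwXact *entry, int my_backend)
{
	if (!entry->valid || entry->locking_backend != INVALID_BACKEND_ID)
		return false;

	entry->locking_backend = my_backend;
	return true;
}

/*
 * Give the entry back, as the PG_CATCH() blocks do before re-throwing.
 * Only the current holder may release it.
 */
static void
demo_release(DemoFdwXact *entry, int my_backend)
{
	assert(entry->locking_backend == my_backend);
	entry->locking_backend = INVALID_BACKEND_ID;
}
```

A second caller that finds the entry held gets false from demo_claim() instead of waiting, which matches the patch's choice to report the entry as busy rather than block.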
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..4b3e67eb49 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..0a3f4b383f 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 873bf9bad9..265b03ba5a 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -845,6 +845,34 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+		if (gxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index bc96512d35..0e1bf63b52 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2568,6 +2568,7 @@ PrepareTransaction(void)
 	PostPrepare_Twophase();
 
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
+	AtEOXact_FdwXact(true);
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
 	AtEOXact_Enum();
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 13f1d8c3dc..074d939b1a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4602,6 +4603,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6286,6 +6288,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6833,14 +6838,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit of
+	 * discarding any state files that are newer than the first record to
+	 * replay, avoiding conflicts at replay.  This also avoids any subsequent
+	 * scans when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	RestoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7042,7 +7048,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7554,11 +7563,13 @@ StartupXLOG(void)
 	}
 
 	/*
-	 * Pre-scan prepared transactions to find out the range of XIDs present.
-	 * This information is not quite needed yet, but it is positioned here so
-	 * as potential problems are detected before any on-disk change is done.
+	 * Pre-scan prepared transactions and foreign prepared transactions to find
+	 * out the range of XIDs present.  This information is not quite needed yet,
+	 * but it is positioned here so that potential problems are detected before
+	 * any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7886,8 +7897,12 @@ StartupXLOG(void)
 	TrimCLOG();
 	TrimMultiXact();
 
-	/* Reload shared-memory state for prepared transactions */
+	/*
+	 * Reload shared-memory state for prepared transactions and foreign
+	 * prepared transactions.
+	 */
 	RecoverPreparedTransactions();
+	RecoverFdwXacts();
 
 	/*
 	 * Shutdown the recovery environment. This must occur after
@@ -9179,6 +9194,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9721,6 +9737,7 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
@@ -9740,6 +9757,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9758,6 +9776,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -9965,6 +9984,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10168,6 +10188,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2e4aa1c4b6..42c64beac9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index c002a61794..c290b9ea94 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1076,6 +1077,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * Warn if a foreign prepared transaction still exists on this foreign
+	 * server; the drop itself is not blocked.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1396,6 +1409,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * Warn if a foreign prepared transaction still exists for this user
+	 * mapping; the drop itself is not blocked.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 6532a836e5..d34e26fd26 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -332,6 +332,12 @@ GetFdwRoutine(Oid fdwhandler)
 	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
 		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
 
+	/* FDW supporting prepare API must also support commit and rollback APIs */
+	Assert((routine->PrepareForeignTransaction &&
+			routine->CommitForeignTransaction &&
+			routine->RollbackForeignTransaction) ||
+		   !routine->PrepareForeignTransaction);
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e76e627c6b..4b05d7d2ff 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4137,6 +4137,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_DSM_FILL_ZERO_WRITE:
 			event_name = "DSMFillZeroWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ:
 			event_name = "LockFileAddToDataDirRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b7799ed1d2..098f79f3d6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,7 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee99b8..23ae805218 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -167,6 +167,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..2d7191d3cd 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -267,6 +269,7 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 94edb24b22..ca9e1d13b2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -96,6 +96,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allProcs[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -187,11 +189,13 @@ typedef struct ComputeXidHorizonsResult
 	FullTransactionId latest_completed;
 
 	/*
-	 * The same for procArray->replication_slot_xmin and.
-	 * procArray->replication_slot_catalog_xmin.
+	 * The same for procArray->replication_slot_xmin,
+	 * procArray->replication_slot_catalog_xmin, and
+	 * procArray->fdwxact_unresolved_xmin.
 	 */
 	TransactionId slot_xmin;
 	TransactionId slot_catalog_xmin;
+	TransactionId fdwxact_unresolved_xmin;
 
 	/*
 	 * Oldest xid that any backend might still consider running. This needs to
@@ -210,8 +214,9 @@ typedef struct ComputeXidHorizonsResult
 	 * Oldest xid for which deleted tuples need to be retained in shared
 	 * tables.
 	 *
-	 * This includes the effects of replication slots. If that's not desired,
-	 * look at shared_oldest_nonremovable_raw;
+	 * This includes the effects of replication slots and unresolved
+	 * foreign transactions. If that's not desired, look at
+	 * shared_oldest_nonremovable_raw;
 	 */
 	TransactionId shared_oldest_nonremovable;
 
@@ -418,6 +423,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 		ShmemVariableCache->xactCompletionCount = 1;
 	}
 
@@ -1711,6 +1717,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	 */
 	h->slot_xmin = procArray->replication_slot_xmin;
 	h->slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	h->fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	for (int index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1838,6 +1845,12 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	h->data_oldest_nonremovable =
 		TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);
 
+	/*
+	 * Check whether there are unresolved distributed transactions requiring
+	 * an older xmin.
+	 */
+	h->shared_oldest_nonremovable =
+		TransactionIdOlder(h->shared_oldest_nonremovable, h->fdwxact_unresolved_xmin);
 	/*
 	 * The only difference between catalog / data horizons is that the slot's
 	 * catalog xmin is applied to the catalog one (so catalogs can be accessed
@@ -1895,6 +1908,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	Assert(!TransactionIdIsValid(h->slot_catalog_xmin) ||
 		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
 										 h->slot_catalog_xmin));
+	Assert(!TransactionIdIsValid(h->fdwxact_unresolved_xmin) ||
+		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+										 h->fdwxact_unresolved_xmin));
 
 	/* update approximate horizons with the computed horizons */
 	GlobalVisUpdateApply(h);
@@ -3799,6 +3815,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits to future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions that are still needed
+ * to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limits.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
+
 /*
  * XidCacheRemoveRunningXids
  *
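The procarray.c hunks above clamp an xmin horizon by fdwxact_unresolved_xmin using TransactionIdOlder(). As a reading aid, here is a standalone sketch of how such a clamp behaves, including the case where no unresolved foreign transaction exists (the limit is invalid) and the case of XID wraparound. It is an approximation written for this note, not the server's code: DemoXid, demo_xid_precedes(), demo_xid_older() and demo_clamp_horizon() are hypothetical names, and the constants assume that XID 0 is invalid and that normal XIDs start at 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoXid;			/* hypothetical stand-in for TransactionId */

#define DEMO_INVALID_XID		((DemoXid) 0)
#define DEMO_FIRST_NORMAL_XID	((DemoXid) 3)

/* Modulo-2^32 comparison for normal XIDs, plain comparison otherwise. */
static bool
demo_xid_precedes(DemoXid a, DemoXid b)
{
	if (a < DEMO_FIRST_NORMAL_XID || b < DEMO_FIRST_NORMAL_XID)
		return a < b;

	return (int32_t) (a - b) < 0;
}

/* Return the older of two XIDs; an invalid XID never wins. */
static DemoXid
demo_xid_older(DemoXid a, DemoXid b)
{
	if (a == DEMO_INVALID_XID)
		return b;
	if (b == DEMO_INVALID_XID)
		return a;

	return demo_xid_precedes(a, b) ? a : b;
}

/* Hold the horizon back to the oldest unresolved foreign transaction. */
static DemoXid
demo_clamp_horizon(DemoXid horizon, DemoXid fdwxact_unresolved_xmin)
{
	DemoXid		result = demo_xid_older(horizon, fdwxact_unresolved_xmin);

	/* The clamp only ever picks one of its two inputs. */
	assert(result == horizon || result == fdwxact_unresolved_xmin);
	return result;
}
```

With an invalid limit the horizon passes through unchanged, so a cluster with no unresolved foreign transactions pays nothing for the extra clamp.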
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..dc29a7ea6f 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+FdwXactLock							48
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bb34630e8e..ec6cef8ad7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -30,6 +30,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -2458,6 +2459,16 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..863e8ccc3a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -127,6 +127,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ee3bfa82f4..eae52defba 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -204,6 +204,7 @@ static const char *const subdirs[] = {
 	"pg_wal/archive_status",
 	"pg_commit_ts",
 	"pg_dynshmem",
+	"pg_fdwxact",
 	"pg_notify",
 	"pg_serial",
 	"pg_snapshots",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 3e00ac0f70..53bc3d82d7 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index cb6ef19182..1712b794c3 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 6c8b111ab5..9ba819e9d1 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -10,24 +10,112 @@
 #ifndef FDWXACT_H
 #define FDWXACT_H
 
+#include "access/fdwxact_xlog.h"
 #include "foreign/foreign.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/s_lock.h"
 
 /* Flag passed to FDW transaction management APIs */
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being aborted */
+} FdwXactStatus;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData *FdwXact;
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+
+	/* Information relevant to the foreign transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset of inserting this entry
+									 * start */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been complete and written to
+								 * file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing FdwXact entry needs to hold FdwXactLock in exclusive mode,
+ * and iterating fdwXacts needs that in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
 /* State data for foreign transaction resolution, passed to FDW callbacks */
 typedef struct FdwXactRslvState
 {
 	/* Foreign transaction information */
+	char		   *fdwxact_id;
 	ForeignServer *server;
 	UserMapping *usermapping;
 
 	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
 } FdwXactRslvState;
 
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+
 /* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+extern void RestoreFdwXactData(void);
+extern void RecoverFdwXacts(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
 
 #endif /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 6c15df7e70..986bc73566 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..e1b09a70d2 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -236,6 +236,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 06bed90c5e..ed6372d2e6 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e7fbda9f81..70f61e4a31 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6027,6 +6027,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,text,int4}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,state,identifier,locker_pid}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 4db7ade9a3..89cec9aa96 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -171,6 +171,7 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
 
@@ -254,6 +255,7 @@ typedef struct FdwRoutine
 	/* Support functions for transaction management */
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
+	PrepareForeignTransaction_function PrepareForeignTransaction;
 } FdwRoutine;
 
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 257e515bfe..a61a08c5d6 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1004,6 +1004,9 @@ typedef enum
 	WAIT_EVENT_DATA_FILE_TRUNCATE,
 	WAIT_EVENT_DATA_FILE_WRITE,
 	WAIT_EVENT_DSM_FILL_ZERO_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index ea8a876ca4..0124c8c687 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -91,5 +91,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 
 #endif							/* PROCARRAY_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 097ff5d111..64da3b40d7 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1352,6 +1352,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.state,
+    f.identifier,
+    f.locker_pid
+   FROM pg_foreign_xacts() f(xid, serverid, userid, state, identifier, locker_pid);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.27.0
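
The shared-memory layout in fdwxact.h above (a free list threaded through fdwxact_free_next, plus an array of in-use entries counted by num_fdwxacts) can be illustrated with a small standalone sketch. This is not the patch's implementation: every Toy* name is hypothetical, the pool is a plain C array rather than shared memory, and the locking that the header comment requires (FdwXactLock in exclusive mode for insert and remove) is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FDWXACTS 4		/* stands in for max_prepared_foreign_xacts */

/* Hypothetical, cut-down analogue of FdwXactData */
typedef struct ToyFdwXact
{
	struct ToyFdwXact *free_next;	/* next entry on the free list */
	unsigned int local_xid;		/* XID of the local transaction */
	int			valid;			/* entry is in use */
} ToyFdwXact;

/* Hypothetical, cut-down analogue of FdwXactCtlData */
typedef struct ToyFdwXactCtl
{
	ToyFdwXact *free_head;		/* head of the free list */
	int			num_used;		/* number of in-use entries */
	ToyFdwXact *used[TOY_MAX_FDWXACTS];	/* in-use entries, densely packed */
	ToyFdwXact	pool[TOY_MAX_FDWXACTS];	/* backing storage */
} ToyFdwXactCtl;

/* Put every pool slot on the free list */
static void
toy_ctl_init(ToyFdwXactCtl *ctl)
{
	int			i;

	assert(ctl != NULL);
	ctl->free_head = NULL;
	ctl->num_used = 0;
	for (i = 0; i < TOY_MAX_FDWXACTS; i++)
	{
		ctl->pool[i].free_next = ctl->free_head;
		ctl->pool[i].local_xid = 0;
		ctl->pool[i].valid = 0;
		ctl->free_head = &ctl->pool[i];
		ctl->used[i] = NULL;
	}
}

/* Pop a slot off the free list; returns NULL when all slots are taken */
static ToyFdwXact *
toy_insert(ToyFdwXactCtl *ctl, unsigned int xid)
{
	ToyFdwXact *entry;

	assert(ctl != NULL);
	entry = ctl->free_head;
	if (entry == NULL)
		return NULL;
	ctl->free_head = entry->free_next;
	entry->free_next = NULL;
	entry->local_xid = xid;
	entry->valid = 1;
	ctl->used[ctl->num_used++] = entry;
	return entry;
}

/* Remove an in-use entry and push it back; returns 1 if it was found */
static int
toy_remove(ToyFdwXactCtl *ctl, ToyFdwXact *entry)
{
	int			i;

	assert(ctl != NULL);
	for (i = 0; i < ctl->num_used; i++)
	{
		if (ctl->used[i] != entry)
			continue;
		/* Fill the hole with the last in-use entry to keep the array dense */
		ctl->used[i] = ctl->used[ctl->num_used - 1];
		ctl->used[ctl->num_used - 1] = NULL;
		ctl->num_used--;
		entry->valid = 0;
		entry->free_next = ctl->free_head;
		ctl->free_head = entry;
		return 1;
	}
	return 0;
}
```

Running out of free slots is the case where a real implementation would presumably raise an error asking for a larger max_prepared_foreign_transactions; the sketch just returns NULL.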

Attachment: v29-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch (application/octet-stream)
From a26706a4fe60fa4cb5a774741cbf3301d5743142 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sat, 29 Aug 2020 00:14:36 +0900
Subject: [PATCH v29 02/11] postgres_fdw supports commit and rollback APIs.

This commit implements both the CommitForeignTransaction and
RollbackForeignTransaction APIs in postgres_fdw. Note that since
PREPARE TRANSACTION is still not supported, this commit does not add
any new user-visible functionality.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 475 +++++++++---------
 .../postgres_fdw/expected/postgres_fdw.out    |   2 +-
 contrib/postgres_fdw/postgres_fdw.c           |   4 +
 contrib/postgres_fdw/postgres_fdw.h           |   3 +
 4 files changed, 241 insertions(+), 243 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index ab3226287d..38614812ce 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -16,6 +16,7 @@
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -80,8 +81,7 @@ static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, UserMapping *user);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -94,6 +94,8 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -108,56 +110,11 @@ static bool UserMappingPasswordRequired(UserMapping *user);
 PGconn *
 GetConnection(UserMapping *user, bool will_prep_stmt)
 {
-	bool		found;
 	bool		retry = false;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
 	MemoryContext ccxt = CurrentMemoryContext;
 
-	/* First time through, initialize connection cache hashtable */
-	if (ConnectionHash == NULL)
-	{
-		HASHCTL		ctl;
-
-		MemSet(&ctl, 0, sizeof(ctl));
-		ctl.keysize = sizeof(ConnCacheKey);
-		ctl.entrysize = sizeof(ConnCacheEntry);
-		/* allocate ConnectionHash in the cache context */
-		ctl.hcxt = CacheMemoryContext;
-		ConnectionHash = hash_create("postgres_fdw connections", 8,
-									 &ctl,
-									 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
-
-		/*
-		 * Register some callback functions that manage connection cleanup.
-		 * This should be done just once in each backend.
-		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
-		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
-		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
-									  pgfdw_inval_callback, (Datum) 0);
-		CacheRegisterSyscacheCallback(USERMAPPINGOID,
-									  pgfdw_inval_callback, (Datum) 0);
-	}
-
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
-	/*
-	 * Find or create cached entry for requested connection.
-	 */
-	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
-	if (!found)
-	{
-		/*
-		 * We need only clear "conn" here; remaining fields will be filled
-		 * later when "conn" is set.
-		 */
-		entry->conn = NULL;
-	}
+	entry = GetConnectionCacheEntry(user->umid);
 
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
@@ -189,7 +146,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	PG_TRY();
 	{
 		/* Start a new transaction or subtransaction if needed. */
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 	PG_CATCH();
 	{
@@ -250,7 +207,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		if (entry->conn == NULL)
 			make_new_connection(entry, user);
 
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 
 	/* Remember if caller will prepare statements */
@@ -259,6 +216,60 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	return entry->conn;
 }
 
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	bool		found;
+	ConnCacheEntry *entry;
+	ConnCacheKey key;
+
+	/* First time through, initialize connection cache hashtable */
+	if (ConnectionHash == NULL)
+	{
+		HASHCTL		ctl;
+
+		MemSet(&ctl, 0, sizeof(ctl));
+		ctl.keysize = sizeof(ConnCacheKey);
+		ctl.entrysize = sizeof(ConnCacheEntry);
+		/* allocate ConnectionHash in the cache context */
+		ctl.hcxt = CacheMemoryContext;
+		ConnectionHash = hash_create("postgres_fdw connections", 8,
+									 &ctl,
+									 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+		/*
+		 * Register some callback functions that manage connection cleanup.
+		 * This should be done just once in each backend.
+		 */
+		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
+		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
+									  pgfdw_inval_callback, (Datum) 0);
+		CacheRegisterSyscacheCallback(USERMAPPINGOID,
+									  pgfdw_inval_callback, (Datum) 0);
+	}
+
+	/* Set flag that we did GetConnection during the current transaction */
+	xact_got_connection = true;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
+
+	/*
+	 * Find or create cached entry for requested connection.
+	 */
+	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+	if (!found)
+	{
+		/*
+		 * We need only clear "conn" here; remaining fields will be filled
+		 * later when "conn" is set.
+		 */
+		entry->conn = NULL;
+	}
+
+	return entry;
+}
+
 /*
  * Reset all transient state fields in the cached connection entry and
  * establish new connection to the remote server.
@@ -548,7 +559,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -560,6 +571,9 @@ begin_remote_xact(ConnCacheEntry *entry)
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
+		/* Register the foreign server to the transaction */
+		FdwXactRegisterXact(user->serverid, user->userid);
+
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
 		else
@@ -775,197 +789,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- *
- * This runs just late enough that it must not enter user-defined code
- * locally.  (Entering such code on the remote side is fine.  Its remote
- * COMMIT TRANSACTION may run deferred triggers.)
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -1325,3 +1148,171 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+void
+postgresCommitForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry;
+	PGresult   *res;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	Assert(entry->conn);
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   frstate->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Cleanup connection entry transaction if transaction fails before
+	 * establishing a connection.
+	 */
+	if (!entry->conn)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is before starting transaction or is already unsalvageable,
+	 * do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06358..c5badd9c0a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,7 +8984,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
+ERROR:  cannot PREPARE a transaction that has operated on foreign tables
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index b6c72e1d1e..7ac0c85dd3 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -560,6 +560,10 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..e3b2897495 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -137,6 +138,8 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
-- 
2.27.0

v29-0001-Introduce-transaction-manager-for-foreign-transa.patch (application/octet-stream)
From a1b28ce22fa02db2d35668c4ba86f31bd540cd81 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 28 Aug 2020 22:25:38 +0900
Subject: [PATCH v29 01/11] Introduce transaction manager for foreign
 transactions.

The global transaction manager manages the transactions initiated on
the foreign server. This commit also adds both
CommitForeignTransaction and RollbackForeignTransaction FDW APIs
supporting only one-phase commit. An FDW that implements these APIs can
be managed by the global transaction manager, so it can control its
transactions through the foreign transaction manager instead of
XactCallback.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/Makefile          |   4 +-
 src/backend/access/fdwxact/Makefile  |  17 ++
 src/backend/access/fdwxact/fdwxact.c | 233 +++++++++++++++++++++++++++
 src/backend/access/transam/xact.c    |  10 ++
 src/backend/foreign/foreign.c        |   4 +
 src/include/access/fdwxact.h         |  33 ++++
 src/include/foreign/fdwapi.h         |  12 ++
 7 files changed, 311 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/include/access/fdwxact.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..2372a1a690 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,7 +8,7 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+SUBDIRS	    = brin common fdwxact gin gist hash heap index nbtree rmgrdesc \
+			  spgist table tablesample transam
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..aacab1d729
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..00da860b31
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,233 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module contains the code for managing transactions started on foreign
+ * servers.
+ *
+ * An FDW that implements both the commit and rollback APIs can register a
+ * foreign transaction with FdwXactRegisterXact() so that it participates in
+ * the distributed transaction.  The registered foreign transactions are
+ * identified by OIDs of server and user.  On commit and rollback, the global
+ * transaction manager calls the corresponding FDW API to end the transactions.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xlog.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "utils/memutils.h"
+
+/* Check whether the FdwXactParticipant provides the transaction callbacks */
+#define ServerSupportTransactionCallback(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.  This struct
+ * needs to live until the end of transaction where we cannot look at
+ * syscaches. Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Callbacks for foreign transaction */
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A member of
+ * participants must support both commit and rollback APIs.
+ */
+static List *FdwXactParticipants = NIL;
+
+static void ForgetAllFdwXactParticipants(void);
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
+											 bool commit);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+/*
+ * Register the given foreign transaction identified by the given arguments
+ * as a participant of the transaction.
+ */
+void
+FdwXactRegisterXact(Oid serverid, Oid userid)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Already registered */
+			return;
+		}
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Foreign server managed by the transaction manager must implement
+	 * transaction callbacks.
+	 */
+	if (!routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("cannot register foreign server not supporting transaction callback")));
+
+	/*
+	 * Participant's information is also used at the end of a transaction,
+	 * where the system caches are not available. Save it in
+	 * TopTransactionContext so that it lives until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Switch back to the previous memory context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Remove the given foreign server from FdwXactParticipants */
+void
+FdwXactUnregisterXact(Oid serverid, Oid userid)
+{
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Remove the entry */
+			FdwXactParticipants =
+				foreach_delete_current(FdwXactParticipants, lc);
+			break;
+		}
+	}
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+
+	return fdw_part;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
+{
+	FdwXactRslvState state;
+
+	Assert(ServerSupportTransactionCallback(fdw_part));
+
+	state.server = fdw_part->server;
+	state.usermapping = fdw_part->usermapping;
+	state.flags = FDWXACT_FLAG_ONEPHASE;
+
+	if (commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Clear the FdwXactParticipants list.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	list_free_deep(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Commit or rollback all foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool is_commit)
+{
+	ListCell   *lc;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/* Commit or rollback foreign transactions in the participant list */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(ServerSupportTransactionCallback(fdw_part));
+		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Check if the local transaction has any foreign transaction.
+ */
+void
+PrePrepare_FdwXact(void)
+{
+	/* Preparing foreign transactions is not supported */
+	if (FdwXactParticipants != NIL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..bc96512d35 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -2230,6 +2231,9 @@ CommitTransaction(void)
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT
 					  : XACT_EVENT_COMMIT);
 
+	/* Commit foreign transaction if any */
+	AtEOXact_FdwXact(true);
+
 	ResourceOwnerRelease(TopTransactionResourceOwner,
 						 RESOURCE_RELEASE_BEFORE_LOCKS,
 						 true, true);
@@ -2369,6 +2373,9 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Reject PREPARE if foreign transactions are involved */
+	PrePrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2756,6 +2763,9 @@ AbortTransaction(void)
 		else
 			CallXactCallbacks(XACT_EVENT_ABORT);
 
+		/* Rollback foreign transactions if any */
+		AtEOXact_FdwXact(false);
+
 		ResourceOwnerRelease(TopTransactionResourceOwner,
 							 RESOURCE_RELEASE_BEFORE_LOCKS,
 							 false, true);
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 61e48ca3f8..6532a836e5 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -328,6 +328,10 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* The FDW must support either both APIs or neither */
+	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
+		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
+
 	return routine;
 }
 
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..6c8b111ab5
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,33 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "foreign/foreign.h"
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	/* Foreign transaction information */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* Function declarations */
+extern void AtEOXact_FdwXact(bool is_commit);
+extern void PrePrepare_FdwXact(void);
+
+#endif /* FDWXACT_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..4db7ade9a3 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "access/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
 
@@ -170,6 +171,9 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
  * function.  It provides pointers to the callback functions needed by the
@@ -246,6 +250,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for path reparameterization. */
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
+
+	/* Support functions for transaction management */
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
 } FdwRoutine;
 
 
@@ -259,4 +267,8 @@ extern bool IsImportableForeignTable(const char *tablename,
 									 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in fdwxact/fdwxact.c */
+extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
+
 #endif							/* FDWAPI_H */
-- 
2.27.0

#205Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#204)
11 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, Nov 25, 2020 at 9:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Since the previous version conflicts with the current HEAD I've
attached the rebased version patch set.

Rebased the patch set again to the current HEAD.

The discussion of this patch is very long so here is a short summary
of the current state:

It’s still under discussion which approach is best for distributed
transaction commit as a building block of built-in sharding using
foreign data wrappers.

Since we’re considering using this feature for built-in sharding, the
design depends on the architecture of built-in sharding. For example,
with the current patch, the PostgreSQL node that received a COMMIT
from the client works as a coordinator and it commits the
transactions using 2PC on all foreign servers involved with the
transaction. This approach would suit a decentralized sharding
architecture but not a centralized one such as Postgres-XC and
Postgres-XL, where the GTM node is a dedicated component responsible
for transaction management. Since we haven't reached a consensus on
the built-in sharding architecture yet, it's still an open question
whether this patch's approach is really good as a building block of
built-in sharding.
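
For illustration only (this is not code from the patch set), the
coordinator role described above can be sketched in C roughly as
follows. The Participant type, its can_prepare flag, and the function
names are hypothetical and exist only for this example; the real patch
goes through the FDW callbacks and handles crash recovery, which the
sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state of one foreign server's transaction. */
typedef enum
{
	PART_ACTIVE,
	PART_PREPARED,
	PART_COMMITTED,
	PART_ABORTED
} PartState;

/* Hypothetical participant, standing in for one foreign server. */
typedef struct Participant
{
	const char *name;
	bool		can_prepare;	/* false simulates a failed PREPARE */
	PartState	state;
} Participant;

/* Phase 1 for a single participant; returns false if PREPARE fails. */
static bool
participant_prepare(Participant *p)
{
	assert(p->state == PART_ACTIVE);

	if (!p->can_prepare)
		return false;

	p->state = PART_PREPARED;
	return true;
}

/*
 * The node that received COMMIT acts as coordinator: it prepares every
 * participant first and commits them only if all prepares succeeded.
 * Otherwise every participant is rolled back.  Returns true on commit.
 */
static bool
coordinator_commit(Participant *parts, size_t n)
{
	size_t		i;
	bool		all_prepared = true;

	/* Phase 1: prepare, stopping at the first failure */
	for (i = 0; i < n; i++)
	{
		if (!participant_prepare(&parts[i]))
		{
			all_prepared = false;
			break;
		}
	}

	/* Phase 2: commit everywhere, or roll back everywhere */
	for (i = 0; i < n; i++)
		parts[i].state = all_prepared ? PART_COMMITTED : PART_ABORTED;

	return all_prepared;
}
```

With two participants that can both prepare, coordinator_commit()
returns true and both end up committed; if either prepare fails, both
end up aborted, which is the atomicity that a plain one-phase COMMIT on
each server cannot give.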

On the other hand, this feature is not necessarily dedicated to
built-in sharding. For example, distributed transaction commit
through FDW is also important when atomically moving data between two
servers via FDWs. Using a dedicated process or server like GTM could
be overkill there. Having the node that received a COMMIT work as a
coordinator would be better and more straightforward.

There is no noticeable TODO in the functionality so far covered by
this patch set. This patch set adds new FDW APIs to support 2PC,
introduces the global transaction manager, and implements those FDW
APIs in postgres_fdw. It also has regression tests and documentation.
Transactions on foreign servers involved with the distributed
transaction are committed using 2PC. Committing using 2PC is performed
asynchronously and transparently to the user. Therefore, it doesn’t
guarantee that transactions on the foreign servers are also committed
when the client gets an acknowledgment of COMMIT. Synchronous foreign
transaction commit via 2PC is not covered by this patch set as we
still need a discussion on the design.
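
To make the asynchronous behaviour concrete, here is a minimal C
sketch, assuming a simple in-memory queue that a resolver drains later.
ForeignXact, ResolverQueue, commit_and_acknowledge() and resolver_run()
are hypothetical names for this example only and do not correspond to
the patch's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

typedef enum
{
	FX_PREPARED,
	FX_COMMITTED
} FxState;

/* Hypothetical record of one prepared foreign transaction. */
typedef struct ForeignXact
{
	int			server_id;
	FxState		state;
} ForeignXact;

/* Hypothetical queue of foreign transactions waiting for resolution. */
typedef struct ResolverQueue
{
	ForeignXact *items[MAX_PENDING];
	size_t		count;
} ResolverQueue;

/*
 * Called once every participant is prepared and the local commit is
 * done: hand the prepared foreign transactions to the resolver and
 * return, which is the point where the client would get its COMMIT
 * acknowledgment.  Returns false if the queue has no room.
 */
static bool
commit_and_acknowledge(ResolverQueue *q, ForeignXact *xacts, size_t n)
{
	size_t		i;

	if (q->count + n > MAX_PENDING)
		return false;

	for (i = 0; i < n; i++)
	{
		assert(xacts[i].state == FX_PREPARED);
		q->items[q->count++] = &xacts[i];
	}

	return true;
}

/* Runs later: commit everything queued and return how many were resolved. */
static size_t
resolver_run(ResolverQueue *q)
{
	size_t		resolved = 0;

	while (q->count > 0)
	{
		ForeignXact *fx = q->items[--q->count];

		fx->state = FX_COMMITTED;
		resolved++;
	}

	return resolved;
}
```

Between the return of commit_and_acknowledge() and the next
resolver_run(), the foreign transactions are still only prepared. That
window is why the client's acknowledgment does not imply that the
foreign servers have committed.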

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

Attachments:

v30-0011-Add-regression-tests-for-foreign-twophase-commit.patch (application/octet-stream)
From 056e54d8d2411957780f1aa9397a04a0e025c8e4 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v30 11/11] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 +
 .../test_fdwxact/expected/test_fdwxact.out    | 200 +++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 185 +++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 110 ++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 524 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 ++++++
 src/test/regress/pg_regress.c                 |  13 +-
 src/tools/msvc/Mkvcbuild.pm                   |   3 +-
 14 files changed, 1294 insertions(+), 6 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index a6d2ffbf9e..106f3b2ff2 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..ca8a90f3e5
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,200 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..40b774e5d0
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,185 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..52e4971aed
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,110 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql, $wait_until) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+	$wait_until = 0 unless defined $wait_until;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	$node->poll_query_until('postgres',
+							"SELECT count(*) FROM pg_foreign_xacts",
+							$wait_until);
+
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the failure case of PREPARE TRANSACTION. We prepare the distributed
+# transaction twice with the same identifier.  The second attempt will fail when preparing
+# the local transaction, which is performed after preparing the foreign transaction
+# on srv_2pc_1. Therefore the transaction should roll back the prepared foreign
+# transaction.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into prepare phase on srv_2pc_1. The transaction fails while
+# preparing the foreign transaction on srv_2pc_1. Then, we try both 'rollback' and
+# 'rollback prepared' on that foreign transaction, and roll back the other foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..8e2a57b052
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,524 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only the
+ * commit and rollback APIs, and the third supports all transaction APIs
+ * including prepare.
+ *
+ * Also, this module can inject an error at the prepare callback or the
+ * commit callback using the test_inject_error() SQL function. Information about
+ * the injected error is stored in shared memory so that backend processes and
+ * resolver processes can see it.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xact.h"
+#include "commands/defrem.h"
+#include "access/reloptions.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static void testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo,
+												   List *fdw_private,
+												   int subplan_index,
+												   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactRslvState *state);
+static void testCommitForeignTransaction(FdwXactRslvState *state);
+static void testRollbackForeignTransaction(FdwXactRslvState *state);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+/* Register the foreign transaction */
+static void
+testRegisterFdwXact(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					bool modified)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	RangeTblEntry	*rte;
+	ForeignTable *table;
+	Oid		userid;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
+						mtstate->ps.state);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+	table = GetForeignTable(RelationGetRelid(rel));
+	FdwXactRegisterXact(table->serverid, userid, modified);
+}
+
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static void
+testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo,
+									   List *fdw_private,
+									   int subplan_index,
+									   int eflags)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo,
+						(eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0);
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo, true);
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 state->fdwxact_id,
+							 state->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (check_event(state->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+/*
+ * Check if an event is set at the phase on the server. If there is, set
+ * elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Currently support only error and panic */
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+	else
+		*elevel = ERROR;	/* "error", and the fallback for any other value */
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check if we can commit and roll back the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check if we can commit and roll back the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 23d7d0beb2..d49a292cca 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2352,9 +2352,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2369,7 +2372,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 7f014a12c9..c70e805116 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -50,7 +50,8 @@ my @contrib_excludes = (
 	'pgcrypto',         'sepgsql',
 	'brin',             'test_extensions',
 	'test_misc',        'test_pg_dump',
-	'snapshot_too_old', 'unsafe_tests');
+	'snapshot_too_old', 'unsafe_tests',
+	'test_fdwxact');
 
 # Set of variables for frontend modules
 my $frontend_defines = { 'initdb' => 'FRONTEND' };
-- 
2.27.0
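
For readers following the 'required' mode cases in the test_fdwxact
regression script above, the rule they exercise can be sketched in
standalone C. This is only an illustration inferred from the test comments,
not code from the patch: Participant, CommitPlan and decide_commit() are
hypothetical names, and the sketch assumes that a server whose FDW has no
transaction callbacks (ft_1 in the tests) is never registered and so does
not appear in the array at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one foreign server registered in the transaction. */
typedef struct Participant
{
	const char *server;
	bool		modified;		/* did the transaction write to it? */
	bool		can_prepare;	/* does its FDW provide a prepare callback? */
} Participant;

typedef enum CommitPlan
{
	PLAN_ONE_PHASE,				/* plain COMMIT everywhere */
	PLAN_TWO_PHASE,				/* PREPARE first, COMMIT PREPARED later */
	PLAN_ERROR					/* 2PC is needed but a writer cannot prepare */
} CommitPlan;

/*
 * Decide how to commit when foreign_twophase_commit is 'required'.
 * The local node counts as one writer if it was modified.
 */
CommitPlan
decide_commit(const Participant *parts, size_t nparts, bool local_modified)
{
	size_t		nwriters = local_modified ? 1 : 0;
	bool		all_can_prepare = true;

	assert(parts != NULL || nparts == 0);

	for (size_t i = 0; i < nparts; i++)
	{
		if (!parts[i].modified)
			continue;			/* read-only servers never force 2PC */
		nwriters++;
		if (!parts[i].can_prepare)
			all_can_prepare = false;
	}

	if (nwriters < 2)
		return PLAN_ONE_PHASE;	/* a single writer commits in one phase */
	return all_can_prepare ? PLAN_TWO_PHASE : PLAN_ERROR;
}
```

Under these assumptions, writing to srv_2pc_1 and srv_2pc_2 yields
PLAN_TWO_PHASE, writing only to srv_no2pc_1 yields PLAN_ONE_PHASE, and
writing to the local table plus srv_no2pc_1 yields PLAN_ERROR, matching
the "Ok" and "Error" comments in the script.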

Attachment: v30-0008-Prepare-foreign-transactions-at-commit-time.patch (application/octet-stream)
From afc603d9da7d13f1ee279800c91b9b98e0e2ab9b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 25 Nov 2020 21:02:29 +0900
Subject: [PATCH v30 08/11] Prepare foreign transactions at commit time

With this commit, a foreign server modified within the transaction is
marked as 'modified'. On the 'modified' servers, foreign transactions
are prepared automatically if foreign_twophase_commit is
'required'. Previously, users needed to do PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED to use the two-phase commit protocol. This commit
enables users to use the two-phase commit protocol transparently. Prepared
foreign transactions are resolved asynchronously by the foreign
transaction resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/fdwxact.c          | 191 +++++++++++++++++-
 src/backend/access/transam/xact.c             |   7 +
 src/backend/utils/misc/guc.c                  |  28 +++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/access/fdwxact.h                  |  10 +
 src/include/foreign/fdwapi.h                  |   2 +-
 6 files changed, 229 insertions(+), 11 deletions(-)
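
As a reading aid for the commit message above, the ordering it describes
(prepare on every modified foreign server, decide the outcome with the local
commit, then resolve the in-doubt foreign transactions) can be sketched as a
small standalone simulation. This is a sketch under stated assumptions and
not the fdwxact.c implementation: FxState, ForeignPart, resolve_all() and
commit_distributed() are hypothetical names, and the asynchronous resolver
process is collapsed into a direct function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum FxState
{
	FX_ACTIVE,					/* remote transaction open, not prepared */
	FX_PREPARED,				/* in doubt: prepared, outcome not applied */
	FX_COMMITTED,
	FX_ABORTED
} FxState;

/* Hypothetical model of one modified foreign server. */
typedef struct ForeignPart
{
	const char *server;
	bool		prepare_fails;	/* simulate an error in the prepare phase */
	FxState		state;
} ForeignPart;

/* Resolver step: apply the local outcome to every participant. */
void
resolve_all(ForeignPart *parts, size_t n, bool local_committed)
{
	for (size_t i = 0; i < n; i++)
	{
		/* Only a prepared transaction may still be committed. */
		assert(!local_committed || parts[i].state == FX_PREPARED);
		parts[i].state = local_committed ? FX_COMMITTED : FX_ABORTED;
	}
}

/*
 * Prepare every participant before the local commit, then resolve them
 * according to the local result.  Returns true if the distributed
 * transaction committed.
 */
bool
commit_distributed(ForeignPart *parts, size_t n, bool local_commit_ok)
{
	bool		local_committed = true;

	for (size_t i = 0; i < n; i++)
	{
		if (parts[i].prepare_fails)
		{
			local_committed = false;	/* error before the local commit */
			break;
		}
		parts[i].state = FX_PREPARED;
	}

	if (local_committed)
		local_committed = local_commit_ok;	/* the local commit decides */

	resolve_all(parts, n, local_committed);
	return local_committed;
}
```

In this model a failed prepare on any server, or a failed local commit,
leaves every participant in FX_ABORTED, while a successful local commit
leaves every participant in FX_COMMITTED; no participant can be committed
without first having been prepared.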

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index b4cab71c3d..79bd7596a3 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -19,13 +19,27 @@
  *
  * FOREIGN TRANSACTION RESOLUTION
  *
+ * A transaction involving multiple foreign transactions uses the two-phase commit
+ * protocol to commit the distributed transaction if enabled.  The basic strategy
+ * is that we prepare all of the remote transactions before committing locally and
+ * commit them after committing locally.
+ *
+ * At pre-commit of the local transaction, we prepare the transactions on all foreign
+ * servers after logging the information of each foreign transaction.  The result of
+ * the distributed transaction is determined by the result of the corresponding local
+ * transaction.  Once the local transaction is successfully committed, all
+ * transactions on foreign servers must be committed.  If an error occurs before
+ * the local transaction commits, all transactions must be aborted.  After
+ * committing or rolling back locally, we leave foreign transactions as in-doubt
+ * transactions and then notify the resolver process. The resolver process asynchronously
+ * resolves these foreign transactions according to the result of the corresponding local
+ * transaction.  Also, the user can use pg_resolve_foreign_xact() SQL function to
+ * resolve a foreign transaction manually.
+ *
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  The preapred
- * foreign transactions are resolved by a resolver process asynchronously.  Also, the
- * user can use pg_resolve_foreign_xact() SQL function to resolve a foreign transaction
- * manually.
+ * local transaction but do nothing for the involved foreign transactions.
  *
  * LOCKING
  *
@@ -92,8 +106,10 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
+#include "storage/pmsignal.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
+#include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -105,6 +121,10 @@
 #define ServerSupportTwophaseCommit(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
 
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
 /* Directory where the foreign prepared transaction files will reside */
 #define FDWXACTS_DIR "pg_fdwxact"
 
@@ -142,6 +162,9 @@ typedef struct FdwXactParticipant
 	/* Transaction identifier used for PREPARE */
 	char	   *fdwxact_id;
 
+	/* true if modified the data on the server */
+	bool		modified;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
@@ -152,18 +175,24 @@ typedef struct FdwXactParticipant
 /*
  * List of foreign transactions involved in the transaction.  A member of
  * participants must support both commit and rollback APIs.
+ *
+ * ForeignTwophaseCommitIsRequired is true if the current transaction needs to
+ * be committed using two-phase commit protocol.
  */
 static List *FdwXactParticipants = NIL;
+static bool ForeignTwophaseCommitIsRequired = false;
 
 /* Keep track of registering process exit call back. */
 static bool fdwXactExitRegistered = false;
 
+
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
 int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
-static void FdwXactPrepareForeignTransactions(TransactionId xid);
+static void FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
@@ -182,6 +211,7 @@ static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
 static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
 static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  bool giveWarning);
+static bool checkForeignTwophaseCommitRequired(bool local_modified);
 static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  Oid umid, char *fdwxact_id);
 static void remove_fdwxact(FdwXact fdwxact);
@@ -258,7 +288,7 @@ FdwXactShmemInit(void)
  * as a participant of the transaction.
  */
 void
-FdwXactRegisterXact(Oid serverid, Oid userid)
+FdwXactRegisterXact(Oid serverid, Oid userid, bool modified)
 {
 	FdwXactParticipant *fdw_part;
 	MemoryContext old_ctx;
@@ -273,6 +303,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 			fdw_part->usermapping->userid == userid)
 		{
 			/* Already registered */
+			fdw_part->modified |= modified;
 			return;
 		}
 	}
@@ -302,6 +333,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
 
 	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
 
 	/* Add to the participants list */
 	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
@@ -348,6 +380,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
 	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
@@ -356,11 +389,139 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	return fdw_part;
 }
 
+/*
+ * Prepare all foreign transactions if foreign twophase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit: when
+ * 'required', we strictly require the FDW of every written foreign server
+ * to support the two-phase commit protocol and ask them to prepare their
+ * foreign transactions; when 'disabled', we use one-phase commit, so these
+ * foreign transactions are committed at the transaction end.
+ * If we fail to prepare any of them, we change to aborting.
+ */
+void
+PreCommit_FdwXact(void)
+{
+	TransactionId xid;
+	bool		local_modified;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/*
+	 * Check if the current transaction did writes.  We need to include the
+	 * local node in the distributed transaction participants and to regard it
+	 * as modified if the current transaction has performed WAL logging and
+	 * has been assigned an xid.  The transaction can end up not writing any
+	 * WAL, even if it has an xid, if it only wrote to temporary and/or
+	 * unlogged tables.  It can end up having written WAL without an xid if
+	 * it did HOT pruning.
+	 */
+	xid = GetTopTransactionIdIfAny();
+	local_modified = (TransactionIdIsValid(xid) && (XactLastRecEnd != 0));
+
+	/*
+	 * Check if we need to use foreign twophase commit. Note that we don't
+	 * support foreign twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired(local_modified))
+	{
+		/*
+		 * Two-phase commit is required.  Assign a transaction id to the
+		 * current transaction if not yet because the local transaction is
+		 * necessary to determine the result of the distributed transaction.
+		 * Then we prepare foreign transactions on foreign servers that support
+		 * two-phase commit.  Note that we keep FdwXactParticipants until the
+		 * end of the transaction.
+		 */
+		if (!TransactionIdIsValid(xid))
+			xid = GetTopTransactionId();
+		FdwXactPrepareForeignTransactions(xid, false);
+		ForeignTwophaseCommitIsRequired = true;
+	}
+}
+
+/* Return true if the current transaction needs to use two-phase commit */
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
 /*
- * Insert FdwXact entries and prepare foreign transactions.
+ * Return true if the current transaction modifies data on two or more servers,
+ * counting both the servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+{
+	ListCell   *lc;
+	bool		have_notwophase = false;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			have_notwophase = true;
+
+		nserverswritten++;
+	}
+
+	/* Did we modify the local non-temporary data? */
+	if (local_modified)
+		nserverswritten++;
+
+	/*
+	 * Two-phase commit is not required if the number of servers that
+	 * performed writes is less than 2.
+	 */
+	if (nserverswritten < 2)
+		return false;
+
+	Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED);
+
+	/* Two-phase commit is required. Check parameters */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	if (have_notwophase)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+				 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+	return true;
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions.  If prepare_all is
+ * true, we prepare all foreign transactions regardless of whether writes
+ * happened on the server.
+ *
+ * We can still change to rollback here on failure.  If any error occurs, we
+ * roll back the non-prepared foreign transactions.
  */
 static void
-FdwXactPrepareForeignTransactions(TransactionId xid)
+FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all)
 {
 	ListCell   *lc;
 
@@ -378,6 +539,9 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 
 		CHECK_FOR_INTERRUPTS();
 
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
 		/* Get prepared transaction identifier */
 		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
 		Assert(fdw_part->fdwxact_id);
@@ -755,7 +919,10 @@ ForgetAllFdwXactParticipants(void)
 	int			nlefts = 0;
 
 	if (FdwXactParticipants == NIL)
+	{
+		Assert(!ForeignTwophaseCommitIsRequired);
 		return;
+	}
 
 	foreach(cell, FdwXactParticipants)
 	{
@@ -812,7 +979,10 @@ AtEOXact_FdwXact(bool is_commit)
 
 		if (!fdwxact)
 		{
-			/* Commit or rollback the foreign transaction in one-phase */
+			/*
+			 * If this participant doesn't have an FdwXact entry, it's not
+			 * prepared yet. Therefore we can commit or roll it back in one phase.
+			 */
 			Assert(ServerSupportTransactionCallback(fdw_part));
 			FdwXactParticipantEndTransaction(fdw_part, is_commit);
 			continue;
@@ -842,6 +1012,7 @@ AtEOXact_FdwXact(bool is_commit)
 	}
 
 	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
 }
 
 /*
@@ -881,7 +1052,7 @@ PrePrepare_FdwXact(void)
 	 * prepare all foreign transactions.
 	 */
 	xid = GetTopTransactionId();
-	FdwXactPrepareForeignTransactions(xid);
+	FdwXactPrepareForeignTransactions(xid, true);
 
 	/*
 	 * We keep FdwXactParticipants until the transaction end so that we change
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 0e1bf63b52..0f223c4694 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -22,6 +22,7 @@
 
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1456,6 +1457,9 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	if (FdwXactIsForeignTwophaseCommitRequired())
+		FdwXactLaunchOrWakeupResolver();
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2123,6 +2127,9 @@ CommitTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXact();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 33e1b5884c..fc3a23fa01 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -499,6 +499,24 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -4647,6 +4665,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 36abadbc60..e9bddbd7ee 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -743,6 +743,8 @@
 							# retrying to resolve
 							# foreign transactions
 							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
 
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index a3763e52c0..6bf4f5dd7d 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -20,6 +20,14 @@
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
 /* Enum to track the status of foreign transaction */
 typedef enum
 {
@@ -107,10 +115,12 @@ extern int	max_prepared_foreign_xacts;
 extern int	max_foreign_xact_resolvers;
 extern int	foreign_xact_resolution_retry_interval;
 extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
+extern void PreCommit_FdwXact(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
 extern bool FdwXactIsForeignTwophaseCommitRequired(void);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 91db4f5bfc..7a444d0590 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -273,7 +273,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
 /* Functions in fdwxact/fdwxact.c */
-extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactRegisterXact(Oid serverid, Oid userid, bool modified);
 extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
 
 #endif							/* FDWAPI_H */
-- 
2.27.0
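
For readers following the decision logic in checkForeignTwophaseCommitRequired() above, here is a minimal standalone sketch of the same counting rule. It is not code from the patch: the Participant struct, the CommitDecision enum and decide_commit() are hypothetical names made up for illustration, and the sketch leaves out the max_prepared_foreign_xacts and max_foreign_xact_resolvers checks. It only tries to show that read-only servers are not counted, that the local node counts as one writer, and that a writer without a prepare callback turns the commit into an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for FdwXactParticipant. */
typedef struct
{
	bool		modified;		/* did the transaction write to this server? */
	bool		supports_2pc;	/* does the FDW provide a prepare callback? */
} Participant;

/* Hypothetical result type; the patch returns a bool or raises an ERROR. */
typedef enum
{
	COMMIT_ONE_PHASE,			/* fewer than two writers: plain commit */
	COMMIT_TWO_PHASE,			/* prepare the modified servers first */
	COMMIT_REJECTED				/* a writer cannot prepare: report an error */
} CommitDecision;

static CommitDecision
decide_commit(const Participant *parts, size_t nparts,
			  bool local_modified, bool twophase_requested)
{
	size_t		nwritten = local_modified ? 1 : 0;
	bool		have_no2pc = false;

	assert(parts != NULL || nparts == 0);

	/* foreign_twophase_commit = disabled: never use two-phase commit */
	if (!twophase_requested)
		return COMMIT_ONE_PHASE;

	for (size_t i = 0; i < nparts; i++)
	{
		if (!parts[i].modified)
			continue;			/* read-only servers are not counted */
		if (!parts[i].supports_2pc)
			have_no2pc = true;
		nwritten++;
	}

	/* A single writer can always be committed atomically in one phase */
	if (nwritten < 2)
		return COMMIT_ONE_PHASE;

	return have_no2pc ? COMMIT_REJECTED : COMMIT_TWO_PHASE;
}
```

Under these assumptions, a transaction that writes to one foreign server and only reads from another stays on the one-phase path unless it also writes locally.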

v30-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch (application/octet-stream)
From 3a3b65c24f10d9b77981a5993c4c1b208fdca165 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 2 Nov 2020 14:32:10 +0900
Subject: [PATCH v30 09/11] postgres_fdw marks foreign transaction as modified
 on modification.

This commit enables postgres_fdw to execute two-phase commit protocol
on transaction commit (without explicitly executing PREPARE TRANSACTION).

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c   | 19 ++++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c |  2 ++
 contrib/postgres_fdw/postgres_fdw.h |  1 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 57b2b433f9..43cc2b2462 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -58,6 +58,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		modified;		/* true if data on the foreign server is modified */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -285,6 +286,7 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 	entry->have_error = false;
 	entry->changing_xact_state = false;
 	entry->invalidated = false;
+	entry->modified = false;
 	entry->server_hashvalue =
 		GetSysCacheHashValue1(FOREIGNSERVEROID,
 							  ObjectIdGetDatum(server->serverid));
@@ -299,6 +301,20 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 		 entry->conn, server->servername, user->umid, user->userid);
 }
 
+void
+MarkConnectionModified(UserMapping *user)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
+	if (entry && !entry->modified)
+	{
+		FdwXactRegisterXact(user->serverid, user->userid, true);
+		entry->modified = true;
+	}
+}
+
 /*
  * Connect to remote server using specified server and user mapping properties.
  */
@@ -570,7 +586,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 			 entry->conn);
 
 		/* Register the foreign server to the transaction */
-		FdwXactRegisterXact(user->serverid, user->userid);
+		FdwXactRegisterXact(user->serverid, user->userid, false);
 
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
@@ -579,6 +595,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 		entry->changing_xact_state = true;
 		do_sql_command(entry->conn, sql);
 		entry->xact_depth = 1;
+		entry->modified = false;
 		entry->changing_xact_state = false;
 	}
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 22e1a70e76..35642b1305 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2380,6 +2380,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * establish new connection if necessary.
 	 */
 	dmstate->conn = GetConnection(user, false);
+	MarkConnectionModified(user);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -3565,6 +3566,7 @@ create_foreign_modify(EState *estate,
 
 	/* Open connection; report that we'll create a prepared statement. */
 	fmstate->conn = GetConnection(user, true);
+	MarkConnectionModified(user);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 659222b97a..12cd55258f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -132,6 +132,7 @@ extern void reset_transmission_modes(int nestlevel);
 /* in connection.c */
 extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
 extern void ReleaseConnection(PGconn *conn);
+extern void MarkConnectionModified(UserMapping *user);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
-- 
2.27.0
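
The way the 'modified' flag accumulates across repeated registrations (begin_remote_xact() registers with false, MarkConnectionModified() later registers with true) can be sketched in isolation roughly as follows. Again this is only an illustration under assumptions: ParticipantList, register_xact() and the fixed-size array are hypothetical and stand in for the List-based FdwXactParticipants in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARTICIPANTS 8		/* arbitrary bound for this sketch */

/* Hypothetical, simplified participant keyed by (serverid, userid). */
typedef struct
{
	unsigned int serverid;
	unsigned int userid;
	bool		modified;
} Participant;

typedef struct
{
	Participant items[MAX_PARTICIPANTS];
	size_t		count;
} ParticipantList;

/*
 * Register a server/user pair.  Registering an existing pair again only
 * merges the flag, so 'modified' can be set but is never cleared.
 */
static void
register_xact(ParticipantList *list, unsigned int serverid,
			  unsigned int userid, bool modified)
{
	for (size_t i = 0; i < list->count; i++)
	{
		Participant *p = &list->items[i];

		if (p->serverid == serverid && p->userid == userid)
		{
			/* Already registered */
			p->modified |= modified;
			return;
		}
	}

	assert(list->count < MAX_PARTICIPANTS);
	list->items[list->count++] = (Participant) {serverid, userid, modified};
}
```

With this shape, a later read on the same connection cannot downgrade a server that has already been written to, which is the property the prepare step relies on when it skips unmodified participants.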

v30-0010-Documentation-update.patch (application/octet-stream)
From 2a13439085b357a2276b2227afb1cdf27741acd2 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v30 10/11] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 ++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 158 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 254 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    | 147 +++++++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 888 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a2266526c..0f73bf19f4 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9285,6 +9285,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -11138,6 +11143,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction has been
+          prepared and is waiting to be committed, or is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction has been
+          prepared and is waiting to be aborted, or is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>locker_pid</structfield></entry>
+      <entry><type>int</type></entry>
+      <entry></entry>
+      <entry>
+       Process ID of the process currently processing this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4b60382778..d2c0fa7711 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9271,6 +9271,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Setting</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether a distributed transaction commit ensures that all
+         involved changes on foreign servers are committed. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used to
+         commit or roll back distributed transactions. When set to
+         <literal>required</literal>, distributed transactions strictly require
+         that all written servers can use the two-phase commit protocol.  That is,
+         the distributed transaction cannot commit if even one server does not
+         support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, a distributed transaction commit will
+         wait for all involved foreign transactions to be committed before the
+         command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When <literal>disabled</literal>, there is a risk of database
+          inconsistency if one or more foreign servers crash while committing
+          a distributed transaction.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>int</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign
+         transaction resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver waits after a failed resolution
+         before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning a resolver stays connected
+         to its database until stopped manually with <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..bae3ee0f2a
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the others were rolled back. This could leave the
+   data in an inconsistent state across the federated database.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all the changes on foreign servers are either committed or rolled back using
+   the transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve atomic commit among all foreign servers automatically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit sequence of a distributed transaction proceeds
+    through the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers including the local server itself and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the prepare on all foreign servers is
+       successful, go to the next step.  If there is any failure in the
+       prepare phase, the server will roll back all the transactions on both
+       local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit the local transaction. The server commits the transaction locally.
+       If any failure happens in this step, the server switches to rollback and
+       rolls back all transactions on both local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    The above sequence is executed transparently to the users at transaction commit.
+    The transaction returns acknowledgement of the successful commit of the
+    distributed transaction to the client after step 2.  After that, all
+    prepared transactions are resolved asynchronously by a foreign transaction
+    resolver process.
+   </para>
+
+   <para>
+    When the user executes <command>PREPARE TRANSACTION</command>, the transaction
+    prepares the local transaction as well as all involved transactions on the
+    foreign servers. Likewise, when <command>COMMIT PREPARED</command> or
+    <command>ROLLBACK PREPARED</command> is executed, all prepared transactions are
+    resolved asynchronously after committing or rolling back the local transaction.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    A distributed transaction can be in the <firstterm>in-doubt</firstterm> state
+    from the time all involved transactions are prepared until all of them
+    are resolved.  During that period, a reader might see different
+    results on the foreign servers.  If the local node
+    crashes while preparing transactions, the distributed transaction also becomes
+    in-doubt.  The information about the involved foreign transactions is
+    recovered during crash recovery and they are resolved in the background.
+   </para>
+
+   <para>
+    The foreign transaction resolver processes automatically resolve the
+    transactions associated with the in-doubt distributed transaction. Alternatively,
+    you can use the <function>pg_resolve_foreign_xact</function> function to resolve
+    them manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving in-doubt distributed transactions. They commit or
+    roll back prepared transactions on all foreign servers involved in the
+    distributed transaction according to the result of the corresponding local
+    transaction.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on the database to which it is connected. If resolution fails, the resolver
+    retries at intervals of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped without an immediate shutdown. You can call the
+     <function>pg_stop_foreign_xact_resolver</function> function to stop that
+     resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be enabled.  Additionally,
+    <varname>max_worker_processes</varname> may need to be adjusted
+    to accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..0fbb9c4123 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1427,6 +1427,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare the foreign transaction. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase, or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flag</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     Note that, except when it is invoked through the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flag</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign servers. This can happen when, for instance,
+     there is a failure while preparing the foreign transaction. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when it is invoked through the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    storing its length in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal
+    less than <symbol>NAMEDATALEN</symbol> bytes long and must not be the same
+    as any other concurrent prepared transaction identifier. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state, so
+      you cannot use functions that access the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1906,4 +2017,147 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the workings of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name and the
+    OIDs of the server, user, and user mapping. The <literal>flags</literal> field contains
+    flag bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-registration">
+    <title>Foreign Transaction Registration and Unregistration</title>
+    <para>
+     Foreign transactions need to be registered with the
+     <productname>PostgreSQL</productname> global transaction manager.
+     Registration and unregistration are done by calling
+     <function>FdwXactRegisterXact</function> and
+     <function>FdwXactUnregisterXact</function> respectively.
+     The FDW can pass a boolean <literal>modified</literal> along with
+     the OIDs of the server and user to <function>FdwXactRegisterXact</function>
+     to indicate that writes are going to happen on the foreign server.  Such foreign
+     servers are taken into account when deciding whether the two-phase commit
+     protocol is required.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <function>CommitForeignTransaction</function>
+     and <function>RollbackForeignTransaction</function> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <function>CommitForeignTransaction</function> function
+     in the pre-commit phase and calls the
+     <function>RollbackForeignTransaction</function> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback Distributed Transaction</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit and roll back atomically among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be using
+    different Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     When switching to rollback due to a failure, it calls
+     <function>RollbackForeignTransaction</function> with
+     <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions which are not
+     closed yet, and calls <function>RollbackForeignTransaction</function> without
+     that flag for foreign transactions which are already prepared.  For foreign
+     transactions which are being prepared, it does both because it cannot be sure whether
+     the preparation has been completed on the foreign server. Therefore,
+     <function>RollbackForeignTransaction</function> needs to tolerate the undefined
+     object error.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, both the <literal>CommitForeignTransaction</literal> function and
+     the <literal>RollbackForeignTransaction</literal> function should commit and
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve, the resolver process exits with an error message. The foreign
+     transaction launcher will launch the resolver process again at
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a5161bb22b 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 93d17e4b55..21dc58da44 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26824,6 +26824,153 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-data-sanity">
+   <title>Data Sanity Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-data-sanity-table"/>
+    provide ways to check the sanity of data files in the cluster.
+   </para>
+
+   <table id="functions-data-sanity-table">
+    <title>Data Sanity Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_relation_check_pages</primary>
+        </indexterm>
+        <function>pg_relation_check_pages</function> ( <parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type> ] )
+        <returnvalue>setof record</returnvalue>
+        ( <parameter>path</parameter> <type>text</type>,
+        <parameter>failed_block_num</parameter> <type>bigint</type> )
+       </para>
+       <para>
+        Checks the pages of the specified relation to see if they are valid
+        enough to safely be loaded into the server's shared buffers.  If
+        given, <parameter>fork</parameter> specifies that only the pages of
+        the given fork are to be verified.  <parameter>fork</parameter> can
+        be <literal>main</literal> for the main data
+        fork, <literal>fsm</literal> for the free space
+        map, <literal>vm</literal> for the visibility map,
+        or <literal>init</literal> for the initialization fork.  The
+        default of <literal>NULL</literal> means that all forks of the
+        relation should be checked.  The function returns a list of block
+        numbers that appear corrupted along with the path names of their
+        files.  Use of this function is restricted to superusers by
+        default, but access may be granted to others
+        using <command>GRANT</command>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for a foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction which is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that this removes the foreign transaction entry without resolution.
+        This function is useful to remove a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before
+    dropping the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3d6c901306..a73b71787f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1066,6 +1066,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1295,6 +1307,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1588,6 +1612,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1905,6 +1934,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information about foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 730d5fdc34..a5c5619072 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -171,6 +171,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3234adb639..83f30c5045 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.27.0

v30-0006-Add-GetPrepareId-API.patch (application/octet-stream)
From f6ebf7ef516880991b8a3f6d0ff1d74142e6bae5 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 4 Nov 2020 14:41:53 +0900
Subject: [PATCH v30 06/11] Add GetPrepareId API

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/fdwxact.c | 54 +++++++++++++++++++++++-----
 src/include/foreign/fdwapi.h         |  3 ++
 2 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 3caf904370..7b3a2f1fba 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -143,6 +143,7 @@ typedef struct FdwXactParticipant
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
 	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
 } FdwXactParticipant;
 
 /*
@@ -347,6 +348,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
 
 	return fdw_part;
 }
@@ -414,9 +416,10 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 }
 
 /*
- * Return a null-terminated foreign transaction identifier.  We generate an
- * unique identifier with in the form of
- * "fx_<random number>_<xid>_<serverid>_<userid> whose length is
+ * Return a null-terminated foreign transaction identifier.  If the given
+ * foreign server's FDW provides the GetPrepareId callback, we return the
+ * identifier it returns.  Otherwise we generate a unique identifier of the
+ * form "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
  * less than FDWXACT_ID_MAX_LEN.
  *
  * Returned string value is used to identify foreign transaction. The
@@ -431,13 +434,48 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 static char *
 get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
 {
-	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+	char *id;
+	int	id_len;
 
-	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
-			 Abs(random()), xid, fdw_part->server->serverid,
-			 fdw_part->usermapping->userid);
+	/*
+	 * If the FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction identifier is not provided")));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN - 1] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
 
-	return pstrdup(buf);
+	id[id_len] = '\0';
+	return pstrdup(id);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 89cec9aa96..91db4f5bfc 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -174,6 +174,8 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -256,6 +258,7 @@ typedef struct FdwRoutine
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
 	PrepareForeignTransaction_function PrepareForeignTransaction;
+	GetPrepareId_function GetPrepareId;
 } FdwRoutine;
 
 
-- 
2.27.0
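
For illustration, here is a minimal standalone sketch of what an FDW-side callback with the GetPrepareId_function shape could look like. It is not part of the patch: the function names (example_get_prepare_id, prepare_id_is_valid), the "myfdw_" prefix, the typedef stand-ins, and the value of FDWXACT_ID_MAX_LEN are all assumptions, and it uses malloc() where a real FDW would allocate with palloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the PostgreSQL types; assumptions for this standalone sketch. */
typedef uint32_t TransactionId;
typedef uint32_t Oid;

/* Assumed limit for this sketch; the real value is defined by the patch set. */
#define FDWXACT_ID_MAX_LEN 200

/*
 * Hypothetical callback with the GetPrepareId_function shape.  Returns a
 * heap-allocated, null-terminated identifier and stores its length
 * (excluding the terminator) in *prep_id_len.  Returns NULL on failure.
 */
static char *
example_get_prepare_id(TransactionId xid, Oid serverid, Oid userid,
					   int *prep_id_len)
{
	char	   *id;
	int			len;

	assert(prep_id_len != NULL);

	id = malloc(FDWXACT_ID_MAX_LEN);
	if (id == NULL)
		return NULL;

	len = snprintf(id, FDWXACT_ID_MAX_LEN, "myfdw_%u_%u_%u",
				   (unsigned) xid, (unsigned) serverid, (unsigned) userid);

	/* Reject output errors and identifiers that did not fit. */
	if (len < 0 || len >= FDWXACT_ID_MAX_LEN)
	{
		free(id);
		return NULL;
	}

	*prep_id_len = len;
	return id;
}

/*
 * Caller-side sanity check in the spirit of get_fdwxact_identifier():
 * the identifier must exist and be strictly shorter than the limit.
 */
static int
prepare_id_is_valid(const char *id, int id_len)
{
	return id != NULL && id_len > 0 && id_len < FDWXACT_ID_MAX_LEN;
}
```

Embedding the xid, server OID and user OID keeps the identifier unique per participant, which is the same idea as the default "fx_..." identifier generated when no callback is provided.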

v30-0003-Recreate-RemoveForeignServerById.patch (application/octet-stream)
From 4110a07c3b6a0f83431c8a9848522e85654394f3 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 12 Jun 2020 11:49:02 +0900
Subject: [PATCH v30 03/11] Recreate RemoveForeignServerById()

This commit recreates RemoveForeignServerById(), which was removed by
b1d32d3e3. It is needed by a follow-up commit that checks whether the
foreign server has prepared transactions when the server is removed.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/catalog/dependency.c   |  5 ++++-
 src/backend/commands/foreigncmds.c | 22 ++++++++++++++++++++++
 src/include/commands/defrem.h      |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 119006159b..e97870ce8c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1549,6 +1549,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_FOREIGN_SERVER:
+			RemoveForeignServerById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1563,7 +1567,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSDICT:
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
-		case OCLASS_FOREIGN_SERVER:
 		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index de31ddd1f3..c002a61794 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -1060,6 +1060,28 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop foreign server by OID
+ */
+void
+RemoveForeignServerById(Oid srvId)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(ForeignServerRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(srvId));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Common routine to check permission for user-mapping-related DDL
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 1133ae1143..02449ef7ed 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -129,6 +129,7 @@ extern ObjectAddress CreateForeignDataWrapper(CreateFdwStmt *stmt);
 extern ObjectAddress AlterForeignDataWrapper(AlterFdwStmt *stmt);
 extern ObjectAddress CreateForeignServer(CreateForeignServerStmt *stmt);
 extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
+extern void RemoveForeignServerById(Oid srvId);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
-- 
2.27.0

v30-0005-postgres_fdw-supports-prepare-API.patch (application/octet-stream)
From aa1aee67249f4b61a2f2a502bd30b21681b7abf3 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:00:21 +0900
Subject: [PATCH v30 05/11] postgres_fdw supports prepare API.

This commit implements the PrepareForeignTransaction API in postgres_fdw,
enabling foreign transactions to be committed and rolled back using the
two-phase commit protocol.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 137 +++++++++++++++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  13 --
 contrib/postgres_fdw/postgres_fdw.c           |   1 +
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   7 -
 5 files changed, 135 insertions(+), 24 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 23cee15bdc..57b2b433f9 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -96,6 +96,8 @@ static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 static bool UserMappingPasswordRequired(UserMapping *user);
 static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
 static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+									char *fdwxact_id, bool is_commit);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -1149,12 +1151,19 @@ void
 postgresCommitForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry;
+	bool		is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	PGresult   *res;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
 
+	if (!is_onephase)
+	{
+		/* COMMIT PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->usermapping,
+								frstate->fdwxact_id, true);
+		return;
+	}
+
 	Assert(entry->conn);
 
 	/*
@@ -1200,16 +1209,24 @@ void
 postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry = NULL;
+	bool is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	bool abort_cleanup_failure = false;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	/*
 	 * In simple rollback case, we must have a connection to the foreign server
 	 * because the foreign transaction is not closed yet. We get the connection
 	 * entry from the cache.
 	 */
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	if (!is_onephase)
+	{
+		/* ROLLBACK PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->usermapping,
+								frstate->fdwxact_id, false);
+		return;
+	}
+
 	Assert(entry);
 
 	/*
@@ -1286,6 +1303,46 @@ postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 	return;
 }
 
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", frstate->fdwxact_id);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   frstate->server->servername, frstate->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 frstate->server->servername, frstate->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
 /* Cleanup at main-transaction end */
 static void
 pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
@@ -1312,3 +1369,75 @@ pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
 	/* Also reset cursor numbering for next transaction */
 	cursor_number = 0;
 }
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+						char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	/*
+	 * Check the connection status in case the previous attempt
+	 * failed.
+	 */
+	if (entry->conn && PQstatus(entry->conn) != CONNECTION_OK)
+		disconnect_pg_server(entry);
+
+	/*
+	 * In the two-phase commit case, the transaction may be resolved by a
+	 * different process than the one that prepared it, so we might not
+	 * have a connection yet.
+	 */
+	if (!entry->conn)
+		make_new_connection(entry, usermapping);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	/*
+	 * Once the transaction is prepared, no further transaction callback is
+	 * called even if an error occurs while resolving it.  Therefore, we
+	 * don't need to set changing_xact_state here.  On failure, a new connection
+	 * will be established either when the next transaction starts or when
+	 * the connection status is checked above.
+	 */
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager notes, the given foreign
+		 * transaction might not exist on the foreign server.  So we accept
+		 * an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index fefb7e6de2..a750ace025 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8974,19 +8974,6 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
- count 
--------
-   822
-(1 row)
-
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
-ROLLBACK;
-WARNING:  there is no transaction in progress
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 7ac0c85dd3..22e1a70e76 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -563,6 +563,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for foreign transactions */
 	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
 	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
 
 	PG_RETURN_POINTER(routine);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index e3b2897495..659222b97a 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -140,6 +140,7 @@ extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
 extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
 extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7581c5417b..ece57de1b1 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2647,13 +2647,6 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ROLLBACK;
-
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
-- 
2.27.0
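
As a rough illustration of the second phase described in this patch, the sketch below builds the "COMMIT PREPARED" / "ROLLBACK PREPARED" command text the way pgfdw_end_prepared_xact() does, but as standalone C with a fixed buffer instead of StringInfo. The helper name build_end_prepared_command is hypothetical. Doubling single quotes in the identifier is an extra precaution added in this sketch and is not something the patch does; it handles single quotes only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "COMMIT PREPARED '<id>'" or
 * "ROLLBACK PREPARED '<id>'" into buf.  Single quotes inside the
 * identifier are doubled.  Returns 0 on success, -1 if buf is too small.
 */
static int
build_end_prepared_command(char *buf, size_t buflen,
						   const char *fdwxact_id, int is_commit)
{
	const char *verb = is_commit ? "COMMIT" : "ROLLBACK";
	size_t		pos;
	int			n;

	assert(buf != NULL && fdwxact_id != NULL);

	/* Command prefix up to and including the opening quote. */
	n = snprintf(buf, buflen, "%s PREPARED '", verb);
	if (n < 0 || (size_t) n >= buflen)
		return -1;
	pos = (size_t) n;

	for (const char *p = fdwxact_id; *p != '\0'; p++)
	{
		/* Reserve room for two characters, the closing quote, and NUL. */
		if (pos + 4 > buflen)
			return -1;
		if (*p == '\'')
			buf[pos++] = '\'';
		buf[pos++] = *p;
	}

	/* Closing quote plus terminator. */
	if (pos + 2 > buflen)
		return -1;
	buf[pos++] = '\'';
	buf[pos] = '\0';
	return 0;
}
```

The resulting string is what would be sent to the remote server; on an UNDEFINED_OBJECT error the patch treats the prepared transaction as already resolved rather than failing.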

v30-0007-Introduce-foreign-transaction-launcher-and-resol.patch (application/octet-stream)
From 2360da53bb88e2e14047341c78e4c7a2e35b9119 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:09:41 +0900
Subject: [PATCH v30 07/11] Introduce foreign transaction launcher and resolver
 processes.

This commit introduces two new background processes: the foreign
transaction launcher and resolvers. With this change, users no longer
need to call pg_resolve_foreign_xact() to resolve foreign transactions
that were prepared by PREPARE TRANSACTION and left unresolved by COMMIT
PREPARED or ROLLBACK PREPARED. These foreign transactions are resolved
in the background by a foreign transaction resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/Makefile           |   5 +-
 src/backend/access/fdwxact/fdwxact.c          |  33 +-
 src/backend/access/fdwxact/launcher.c         | 567 ++++++++++++++++++
 src/backend/access/fdwxact/resolver.c         | 352 +++++++++++
 src/backend/access/transam/twophase.c         |  16 +
 src/backend/postmaster/bgworker.c             |   8 +
 src/backend/postmaster/pgstat.c               |   6 +
 src/backend/postmaster/postmaster.c           |  13 +-
 src/backend/storage/ipc/ipci.c                |   3 +
 src/backend/storage/lmgr/lwlocknames.txt      |   1 +
 src/backend/tcop/postgres.c                   |  14 +
 src/backend/utils/misc/guc.c                  |  37 ++
 src/backend/utils/misc/postgresql.conf.sample |  12 +
 src/include/access/fdwxact.h                  |   6 +
 src/include/access/fdwxact_launcher.h         |  28 +
 src/include/access/fdwxact_resolver.h         |  23 +
 src/include/access/resolver_internal.h        |  63 ++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/pgstat.h                          |   2 +
 src/include/utils/guc_tables.h                |   2 +
 20 files changed, 1183 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
index aacab1d729..151e3ae336 100644
--- a/src/backend/access/fdwxact/Makefile
+++ b/src/backend/access/fdwxact/Makefile
@@ -12,6 +12,9 @@ subdir = src/backend/access/fdwxact
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = fdwxact.o
+OBJS = \
+	fdwxact.o \
+	resolver.o \
+	launcher.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 7b3a2f1fba..b4cab71c3d 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -22,10 +22,10 @@
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  To resolve
- * these foreign transactions the user needs to use pg_resolve_foreign_xact() SQL
- * function that resolve a foreign transaction according to the result of the
- * corresponding local transaction.
+ * local transaction but do nothing for the involved foreign transactions.  The prepared
+ * foreign transactions are resolved asynchronously by a resolver process.  The
+ * user can also use the pg_resolve_foreign_xact() SQL function to resolve a foreign
+ * transaction manually.
  *
  * LOCKING
  *
@@ -76,7 +76,10 @@
 #include <unistd.h>
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/twophase.h"
+#include "access/resolver_internal.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
@@ -157,6 +160,7 @@ static bool fdwXactExitRegistered = false;
 
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
 static void FdwXactPrepareForeignTransactions(TransactionId xid);
@@ -165,7 +169,6 @@ static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
 static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
 										 FdwXactParticipant *fdw_part);
-static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 static void FdwXactComputeRequiredXmin(void);
 static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
 static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
@@ -772,12 +775,13 @@ ForgetAllFdwXactParticipants(void)
 
 	/*
 	 * If we leave any FdwXact entries, update the oldest local transaction of
-	 * unresolved distributed transaction.
+	 * unresolved distributed transaction and notify the launcher.
 	 */
 	if (nlefts > 0)
 	{
 		elog(DEBUG1, "left %u foreign transactions", nlefts);
 		FdwXactComputeRequiredXmin();
+		FdwXactLaunchOrWakeupResolver();
 	}
 
 	list_free_deep(FdwXactParticipants);
@@ -785,7 +789,9 @@ ForgetAllFdwXactParticipants(void)
 }
 
 /*
- * Commit or rollback all foreign transactions.
+ * Close the involved in-progress foreign transactions.  We don't perform the
+ * second phase of the two-phase commit protocol here.  All prepared foreign
+ * transactions enter the in-doubt state, and a resolver process handles them.
  */
 void
 AtEOXact_FdwXact(bool is_commit)
@@ -889,7 +895,7 @@ PrePrepare_FdwXact(void)
  * The caller must hold the given foreign transactions in advance to prevent
  * concurrent update.
  */
-static void
+void
 FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
 {
 	for (int i = 0; i < nfdwxacts; i++)
@@ -924,6 +930,17 @@ FdwXactExists(Oid dbid, Oid serverid, Oid userid)
 
 	return (idx >= 0);
 }
+bool
+FdwXactExistsXid(TransactionId xid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(InvalidOid, xid, InvalidOid, InvalidOid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
 
 /*
  * Return the index of first found FdwXact entry that matched to given arguments.
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..916b9af2f7
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,567 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "pgstat.h"
+#include "funcapi.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+		FdwXactRslvCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit start retries to once per
+		 * foreign_xact_resolution_retry_interval, but always attempt to
+		 * start when requested.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * started.  This usually means the resolver crashed, so we
+			 * retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+
+	/*
+	 * Look for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting
+		 * up and hasn't attached to its slot yet. Since the resolver will soon
+		 * find the FdwXact entry we inserted, we don't need to do anything.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+		resolver->in_use = false;
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on databases that have
+ * at least one FdwXact entry but no running resolver.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *fdwxact_dbs;
+	HTAB	   *resolver_dbs;
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+
+	/*
+	 * Create a hash map for the databases that have at least one foreign
+	 * transaction to resolve.
+	 */
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * We need to launch resolver process if the foreign transaction
+		 * is not held by anyone and is not a part of the local prepared
+		 * transaction.
+		 */
+		if (fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no foreign transaction to resolve, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	/* Create a hash map for databases on which a resolver is running */
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * Find databases on which no resolver is running and launch new
+	 * resolver process on them.
+	 */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be super user */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..c9d41428fc
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,352 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database that backend processes are waiting to have resolved.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int	foreign_xact_resolution_retry_interval;
+int	foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+static void hold_indoubt_fdwxacts(void);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts holds the indexes of the FdwXact entries that the resolver
+ * marked as in-processing. These marks are cleared on process exit.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	/* Release the held foreign transaction entries */
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/* Hold in-doubt foreign transaction to resolve */
+		hold_indoubt_fdwxacts();
+
+		if (nheld > 0)
+		{
+			/* Resolve in-doubt transactions */
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld);
+			CommitTransactionCommand();
+			last_resolution_time = now;
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether there have been foreign transactions by the backend within
+ * foreign_xact_resolver_timeout and shutdown if not.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/* Reached timeout, exit */
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+					get_database_name(MyDatabaseId))));
+	CommitTransactionCommand();
+	fdwxact_resolver_detach();
+	proc_exit(0);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Lock foreign transactions that are not held by anyone.
+ */
+static void
+hold_indoubt_fdwxacts(void)
+{
+	nheld = 0;
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid &&
+			fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+		{
+			held_fdwxacts[nheld++] = i;
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 265b03ba5a..29f11fb779 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,8 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -2286,6 +2288,13 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * If the prepared transaction was a part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
@@ -2345,6 +2354,13 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * If the prepared transaction was a part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index d209b69ec0..ee87e4a847 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,8 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 757e9dad83..d8dbf473c9 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3824,6 +3824,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ea00da45d8..1a8e9148c3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -94,6 +94,7 @@
 #endif
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -911,6 +912,9 @@ PostmasterMain(int argc, char *argv[])
 	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers <= 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
 
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
@@ -976,12 +980,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules have had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2d7191d3cd..271fd35884 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -17,6 +17,7 @@
 #include "access/clog.h"
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -151,6 +152,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +272,7 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index dc29a7ea6f..9327394013 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -54,3 +54,4 @@ XactTruncationLock					44
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
 FdwXactLock							48
+FdwXactResolverLock					49
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d35c5020ea..96169a28a1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3073,6 +3075,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 7ef7eef1b5..33e1b5884c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -760,6 +760,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2459,6 +2463,39 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 13e7027fd4..36abadbc60 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -732,6 +732,18 @@
 #max_pred_locks_per_page = 2            # min 0
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
 #------------------------------------------------------------------------------
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 9ba819e9d1..a3763e52c0 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -104,13 +104,19 @@ typedef struct FdwXactRslvState
 
 /* GUC parameters */
 extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern bool FdwXactExistsXid(TransactionId xid);
 extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
 extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
 								Oid userid, void *content, int len);
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..688b43b8d0
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..c935471936
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 755c722e33..684f9ab4c4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6165,6 +6165,11 @@
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
 
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
+
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
   proargtypes => 'pg_lsn pg_lsn', prosrc => 'pg_wal_lsn_diff' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1ed1987fa5..382913a790 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -883,6 +883,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 7f36e1146f..cf2170cf5f 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
-- 
2.27.0
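
The resolver_internal.h header at the end of the patch above declares FdwXactRslvCtlData with a trailing FLEXIBLE_ARRAY_MEMBER of resolver slots. The SizeOfFdwXactRslvCtlData macro as shown covers the header plus one array element; how the remaining slots are sized is outside this excerpt. As a rough sketch only, a structure of this shape is commonly sized for n slots as below. The names CtlData, ResolverSlot and ctl_size() are hypothetical and are not part of the patch, and the overflow check stands in for what PostgreSQL's add_size()/mul_size() would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one per-resolver slot */
typedef struct ResolverSlot
{
	int			pid;			/* 0 if the slot is not active */
	int			in_use;			/* nonzero if the slot is taken */
} ResolverSlot;

/* Hypothetical control struct: fixed header plus a flexible array */
typedef struct CtlData
{
	int			launcher_pid;
	ResolverSlot slots[];		/* C99 flexible array member */
} CtlData;

/*
 * Bytes needed for the header plus nslots array elements.
 * Returns 0 if the computation would overflow size_t.
 */
size_t
ctl_size(size_t nslots)
{
	size_t		header = offsetof(CtlData, slots);

	assert(header <= SIZE_MAX);
	if (nslots > (SIZE_MAX - header) / sizeof(ResolverSlot))
		return 0;
	return header + nslots * sizeof(ResolverSlot);
}
```

Sizing from offsetof() rather than sizeof() keeps any trailing padding of the header from being counted twice.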

Attachment: v30-0004-Add-PrepareForeignTransaction-API.patch (application/octet-stream)
From 53d7852e3415bb525a7ea93e1744a815e7e27e7c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 20 Sep 2020 16:49:20 +0900
Subject: [PATCH v30 04/11] Add PrepareForeignTransaction API.

This commit adds a new FDW API, PrepareForeignTransaction. Using this
API, the transactions initiated on the foreign server are prepared at
PREPARE TRANSACTION time.  The information of prepared foreign
transactions involved with the distributed transaction is crash-safe.
However, these transactions are neither committed nor aborted at
COMMIT/ROLLBACK PREPARED time.  To resolve these transactions, this
commit also adds the pg_resolve_foreign_xact() SQL function.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +-
 src/backend/access/fdwxact/fdwxact.c          | 1755 ++++++++++++++++-
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   28 +
 src/backend/access/transam/xact.c             |    1 +
 src/backend/access/transam/xlog.c             |   41 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/foreigncmds.c            |   22 +
 src/backend/foreign/foreign.c                 |    6 +
 src/backend/postmaster/pgstat.c               |    9 +
 src/backend/postmaster/postmaster.c           |    1 +
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    3 +
 src/backend/storage/ipc/procarray.c           |   56 +-
 src/backend/storage/lmgr/lwlocknames.txt      |    1 +
 src/backend/utils/misc/guc.c                  |   11 +
 src/backend/utils/misc/postgresql.conf.sample |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |   88 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   18 +
 src/include/foreign/fdwapi.h                  |    2 +
 src/include/pgstat.h                          |    3 +
 src/include/storage/procarray.h               |    2 +
 src/test/regress/expected/rules.out           |    7 +
 35 files changed, 2164 insertions(+), 28 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact_xlog.h
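
The fdwxact.c changes in this patch keep foreign transaction entries in a fixed-size pool: unused entries are chained on a free list, in-use entries are referenced from a dense array, and removal moves the last array element into the vacated position. A minimal stand-alone sketch of that bookkeeping follows. Entry, Pool, POOL_SIZE and the pool_* functions are hypothetical names used for illustration; locking, WAL logging and error reporting are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4				/* stands in for max_prepared_foreign_xacts */

typedef struct Entry
{
	int			key;			/* illustrative payload */
	struct Entry *free_next;	/* next entry on the free list */
} Entry;

typedef struct Pool
{
	Entry	   *free_list;		/* head of the free list, NULL when full */
	int			num_active;		/* number of used positions in active[] */
	Entry	   *active[POOL_SIZE];	/* dense array of in-use entries */
	Entry		slots[POOL_SIZE];	/* backing storage */
} Pool;

/* Put every slot on the free list and empty the active array */
void
pool_init(Pool *pool)
{
	assert(pool != NULL);
	pool->free_list = NULL;
	pool->num_active = 0;
	for (int i = 0; i < POOL_SIZE; i++)
	{
		pool->slots[i].free_next = pool->free_list;
		pool->free_list = &pool->slots[i];
	}
}

/* Take an entry off the free list; returns NULL if none is left */
Entry *
pool_insert(Pool *pool, int key)
{
	Entry	   *entry = pool->free_list;

	if (entry == NULL)
		return NULL;
	pool->free_list = entry->free_next;
	pool->active[pool->num_active++] = entry;
	entry->key = key;
	return entry;
}

/* Return an entry to the free list; returns -1 if it is not active */
int
pool_remove(Pool *pool, Entry *entry)
{
	int			i;

	for (i = 0; i < pool->num_active; i++)
	{
		if (pool->active[i] == entry)
			break;
	}
	if (i >= pool->num_active)
		return -1;

	/* Fill the hole with the last active entry to keep the array dense */
	pool->num_active--;
	pool->active[i] = pool->active[pool->num_active];

	entry->free_next = pool->free_list;
	pool->free_list = entry;
	return 0;
}
```

The swap-with-last removal keeps deletion cheap, at the cost that array indexes are not stable across removals.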

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c5badd9c0a..fefb7e6de2 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,7 +8984,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on foreign tables
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 00da860b31..3caf904370 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -9,8 +9,59 @@
  * FDW who implements both commit and rollback APIs can request to register the
  * foreign transaction by FdwXactRegisterXact() to participate it to a
  * group of distributed tranasction.  The registered foreign transactions are
- * identified by OIDs of server and user.  On commit and rollback, the global
- * transaction manager calls corresponding FDW API to end the tranasctions.
+ * identified by OIDs of server and user.  On commit, rollback and prepare, the
+ * global transaction manager calls corresponding FDW API to end the transactions.
+ *
+ * To achieve commit among all foreign servers atomically, the global transaction
+ * manager supports two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). Two-phase commit protocol is crash-safe.  We WAL-log the foreign
+ * transaction information.
+ *
+ * FOREIGN TRANSACTION RESOLUTION
+ *
+ * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
+ * PrepareForeignTransaction() API regardless of data on the foreign server having been
+ * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
+ * local transaction but do nothing for the involved foreign transactions.  To resolve
+ * these foreign transactions the user needs to use the pg_resolve_foreign_xact() SQL
+ * function, which resolves a foreign transaction according to the result of the
+ * corresponding local transaction.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of a foreign
+ * transaction follows a locking model based on the following linked concepts:
+ *
+ * * All FdwXact fields except for status are protected by FdwXactLock. The
+ *	 status is protected by its mutex.
+ * * A process that is going to process a foreign transaction needs to set
+ *   locking_backend of the FdwXact entry to lock the entry, which prevents the entry from
+ *	 being updated and removed by concurrent processes.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *	 with entries marked with fdwxact->inredo and fdwxact->ondisk.	FdwXact file
+ *	 data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *	 We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *	 have fdwxact->inredo set and are behind the redo_horizon.	We save
+ *	 them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *	 fdwxact->ondisk is true, the corresponding entry from the disk is
+ *	 additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *	 fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
  *
  * Portions Copyright (c) 2020, PostgreSQL Global Development Group
  *
@@ -20,15 +71,53 @@
  */
 #include "postgres.h"
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include "access/fdwxact.h"
+#include "access/twophase.h"
+#include "access/xact.h"
 #include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "foreign/fdwapi.h"
 #include "foreign/foreign.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/procarray.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 
 /* Check the FdwXactParticipant is capable of two-phase commit  */
 #define ServerSupportTransactionCallback(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define ServerSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * oid, xid, foreign server oid and user oid, each printed as 8 hexadecimal
+ * digits and separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is enough to ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
 
 /*
  * Structure to bundle the foreign transaction participant.	 This struct
@@ -37,13 +126,23 @@
  */
 typedef struct FdwXactParticipant
 {
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
 	/* Foreign server and user mapping info, passed to callback routines */
 	ForeignServer *server;
 	UserMapping *usermapping;
 
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
 } FdwXactParticipant;
 
 /*
@@ -52,11 +151,103 @@ typedef struct FdwXactParticipant
  */
 static List *FdwXactParticipants = NIL;
 
+/* Tracks whether the process-exit callback has been registered */
+static bool fdwXactExitRegistered = false;
+
+/* GUC parameter */
+int			max_prepared_foreign_xacts = 0;
+
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void FdwXactPrepareForeignTransactions(TransactionId xid);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
+static void FdwXactComputeRequiredXmin(void);
+static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
+static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool givewarning);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool fromdisk);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  Oid umid, char *fdwxact_id);
+static void remove_fdwxact(FdwXact fdwxact);
 static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
 													  FdwRoutine *routine);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
 
 /*
  * Register the given foreign transaction identified by the given arguments
@@ -82,6 +273,13 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 		}
 	}
 
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
 	routine = GetFdwRoutineByServerId(serverid);
 
 	/*
@@ -142,14 +340,336 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 
 	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
 
+	fdw_part->fdwxact = NULL;
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
 
 	return fdw_part;
 }
 
+/*
+ * Insert FdwXact entries and prepare foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(TransactionId xid)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants != NIL);
+	Assert(TransactionIdIsValid(xid));
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState state;
+		FdwXact		fdwxact;
+
+		Assert(ServerSupportTwophaseCommit(fdw_part));
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to the disk and logs (thereby relaying it to the
+		 * standby). Thus in case we lose connectivity to the foreign server
+		 * or crash ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepare the transaction on the foreign server before
+		 * persisting the information to the disk and crash in-between these
+		 * two steps, we will lose track of the prepared transaction on the
+		 * foreign server and will not be able to resolve it after the crash
+		 * recovery.  Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry call and the acknowledgement
+		 * from the foreign server, the backend may abort the local
+		 * transaction (say, because of a signal).
+		 */
+		state.server = fdw_part->server;
+		state.usermapping = fdw_part->usermapping;
+		state.fdwxact_id = fdw_part->fdwxact_id;
+		fdw_part->prepare_foreign_xact_fn(&state);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier.  We generate a
+ * unique identifier in the form of
+ * "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transactionid unique, we should ideally use something
+ * like UUID, which gives unique ids with high probability, but that may be
+ * expensive here and UUID extension which provides the function to generate
+ * UUID is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+			 Abs(random()), xid, fdw_part->server->serverid,
+			 fdw_part->usermapping->userid);
+
+	return pstrdup(buf);
+}
+
+/*
+ * This function is used to create a new foreign transaction entry before an
+ * FDW prepares, commits or rolls back a foreign transaction. The function adds
+ * the entry to WAL, and the entry is persisted to disk under the pg_fdwxact
+ * directory at checkpoint.
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset the entry's state */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
 /*
  * The routine for committing or rolling back the given transaction participant.
  */
@@ -162,6 +682,7 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 
 	state.server = fdw_part->server;
 	state.usermapping = fdw_part->usermapping;
+	state.fdwxact_id = NULL;
 	state.flags = FDWXACT_FLAG_ONEPHASE;
 
 	if (commit)
@@ -181,14 +702,46 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 }
 
 /*
- * Clear the FdwXactParticipants list.
+ * Unlock foreign transaction participants and clear the FdwXactParticipants
+ * list.  If any foreign transaction is left unresolved, update the oldest xmin
+ * of unresolved transactions so that the local transaction id of such an
+ * unresolved foreign transaction is not truncated.
  */
 static void
 ForgetAllFdwXactParticipants(void)
 {
+	ListCell   *cell;
+	int			nlefts = 0;
+
 	if (FdwXactParticipants == NIL)
 		return;
 
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if no FdwXact entry has been registered yet */
+		if (!fdwxact)
+			continue;
+
+		/* Unlock the foreign transaction entry */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+		nlefts++;
+	}
+
+	/*
+	 * If we leave any FdwXact entries, update the oldest local transaction of
+	 * unresolved distributed transaction.
+	 */
+	if (nlefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions", nlefts);
+		FdwXactComputeRequiredXmin();
+	}
+
 	list_free_deep(FdwXactParticipants);
 	FdwXactParticipants = NIL;
 }
@@ -211,23 +764,1203 @@ AtEOXact_FdwXact(bool is_commit)
 	foreach(lc, FdwXactParticipants)
 	{
 		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		if (!fdwxact)
+		{
+			/* Commit or rollback the foreign transaction in one-phase */
+			Assert(ServerSupportTransactionCallback(fdw_part));
+			FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			continue;
+		}
+
+		/*
+		 * This foreign transaction might have been prepared.  In the commit
+		 * case, we don't need to do anything for this participant because all
+		 * foreign transactions should have already been prepared and therefore
+		 * the transaction is already closed. These will be resolved manually.
+		 * On the other hand, in the abort case, we need to close the
+		 * transaction if preparing might be in progress, since an error might
+		 * have occurred while preparing a foreign transaction.
+		 */
+		if (!is_commit)
+		{
+			int					   status;
 
-		Assert(ServerSupportTransactionCallback(fdw_part));
-		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			SpinLockAcquire(&(fdwxact->mutex));
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&(fdwxact->mutex));
+
+			if (status == FDWXACT_STATUS_PREPARING)
+				FdwXactParticipantEndTransaction(fdw_part, false);
+		}
 	}
 
 	ForgetAllFdwXactParticipants();
 }
 
 /*
- * Check if the local transaction has any foreign transaction.
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * Note that it's possible that the transaction aborts after we have prepared
+ * some of the participants. In this case we switch to rollback and roll back
+ * all foreign transactions.
  */
 void
 PrePrepare_FdwXact(void)
 {
-	/* We don't support to prepare foreign transactions */
-	if (FdwXactParticipants != NIL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+	ListCell   *lc;
+	TransactionId xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit as we're going to
+	 * prepare all of them.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
+	}
+
+	/*
+	 * Assign a transaction id if one has not been assigned yet, because the
+	 * local transaction id is used to determine the result of the distributed
+	 * transaction.  Then prepare all foreign transactions.
+	 */
+	xid = GetTopTransactionId();
+	FdwXactPrepareForeignTransactions(xid);
+
+	/*
+	 * We keep FdwXactParticipants until the transaction end so that we change
+	 * the involved foreign transactions to ABORTING in case of failure.
+	 */
+}
+
+/*
+ * Resolve foreign transactions at the given indexes.
+ *
+ * The caller must lock the given foreign transactions in advance to prevent
+ * concurrent updates.
+ */
+static void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		FdwXactResolveOneFdwXact(fdwxact);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
+
+/*
+ * Return the index of the first FdwXact entry that matches the given arguments,
+ * or -1 if there is none.  Only arguments with valid values for their
+ * respective datatypes are used as search conditions.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	bool		found = false;
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+		found = true;
+		break;
+	}
+
+	return found ? i : -1;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ *
+ * XXX: we can exclude FdwXact entries whose status is already committing
+ * or aborting.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Return whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactGetTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit the prepared
+	 * foreign transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort the prepared
+	 * foreign transaction.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is no longer in progress, yet it neither committed
+	 * nor aborted. This can happen when the transaction failed after this entry
+	 * was registered but before the foreign transaction was actually prepared
+	 * on the foreign server. So assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is still in progress, so its fate is undetermined
+	 * and we cannot derive the fate of this foreign transaction from it.
+	 * Raise an error rather than guess whether to commit or roll back the
+	 * foreign transaction.
+	 */
+	else
+		elog(ERROR,
+			 "cannot resolve the foreign transaction associated with an in-progress transaction");
+
+	pg_unreachable();
+}
+
+/* Commit or rollback one prepared foreign transaction */
+static void
+FdwXactResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactRslvState state;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	/* The FdwXact entry must be held by me */
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->locking_backend == MyBackendId);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactGetTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare the resolution state to pass to API */
+	state.server = server;
+	state.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	state.fdwxact_id = fdwxact->fdwxact_id;
+	state.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&state);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&state);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete the FdwXact entry and its file, if any */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The status of
+	 * the transaction is set to prepared below, since we do not know the exact
+	 * status right now. The resolver will set it later based on the status of
+	 * the local transaction which prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that
+	 * prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXact file if the foreign transaction was saved by an earlier checkpoint.
+ * The entry may be missing when crash recovery starts from a point after the
+ * entry was added but before it was removed; in that case do nothing.
+ */
+static void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	elog(DEBUG2, "removing fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+}
+
+/*
+ * We must fsync every foreign transaction state file that is valid or was
+ * generated during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be an FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint that is yet marked
+	 * invalid, because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.	 FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+	/* Write content and CRC; clear errno first so the checks below are valid */
+	errno = 0;
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a database id, transaction id, serverid and userid, read the foreign
+ * transaction state either from disk or directly from WAL via the
+ * "insert_start_lsn" kept in shared memory.
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a plausible size, a valid CRC, and the expected
+ * identifiers), return the palloc'd contents of the file; raise an error on
+ * corrupted data.  Corrupted files can be encountered during recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the requested identifiers */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to return the oldest valid XID among the
+ * distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextXid = ShmemVariableCache->nextXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+RestoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and mark them valid.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built-in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = CStringGetTextDatum(fdwxact->fdwxact_id);
+
+		if (fdwxact->locking_backend != InvalidBackendId)
+		{
+			PGPROC *locker = BackendIdGetProc(fdwxact->locking_backend);
+			values[5] = Int32GetDatum(locker->pid);
+		}
+		else
+			nulls[5] = true;
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist")));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR,
+				(errmsg("permission denied to resolve prepared foreign transaction"),
+				 errhint("Must be superuser or the user that prepared the transaction.")));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being processed by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction identifier \"%s\" is busy",
+						fdwxact->fdwxact_id)));
+	}
+
+	if (TwoPhaseExists(fdwxact->local_xid))
+	{
+		/*
+		 * The entry's local transaction is still prepared, so its fate is not
+		 * yet known and we cannot resolve this foreign transaction.  The user
+		 * must finish the local transaction first.
+		 */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction with identifier \"%s\" whose local transaction is still prepared",
+						fdwxact->fdwxact_id),
+				 errhint("Do COMMIT PREPARED or ROLLBACK PREPARED.")));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	LWLockRelease(FdwXactLock);
+
+	PG_TRY();
+	{
+		FdwXactResolveFdwXacts(&idx, 1);
+	}
+	PG_CATCH();
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactCtl->fdwxacts[idx]->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it. This gives a way to forget about such a prepared transaction
+ * when, for example, the foreign server where it is prepared is no longer
+ * available or the user that prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist on server %u",
+						serverid)));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("permission denied to remove prepared foreign transaction"),
+				  errhint("Must be superuser or the user that prepared the transaction."))));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being held by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	PG_TRY();
+	{
+		/* Clean up entry and any files we may have left */
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdwxact(fdwxact);
+	}
+	PG_CATCH();
+	{
+		if (fdwxact->valid)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+			fdwxact->locking_backend = InvalidBackendId;
+		}
+		LWLockRelease(FdwXactLock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
 }
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
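
For reference, the insert branch of fdwxact_desc() above builds a single comma-separated line from five appendStringInfo() calls. A small sketch of the resulting layout using plain snprintf(): demo_desc_insert is a hypothetical helper, and the OIDs and identifier string in the usage note are made-up values, not output from a real server.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int DemoOid;	/* stand-in for Oid / TransactionId */

/*
 * Format an insert record in the same field order and with the same
 * separators as the five appendStringInfo() calls in fdwxact_desc().
 * Returns what snprintf() returns.
 */
static int
demo_desc_insert(char *buf, size_t buflen, DemoOid serverid, DemoOid userid,
				 DemoOid dbid, DemoOid local_xid, const char *id)
{
	return snprintf(buf, buflen,
					"server: %u, user: %u, database: %u, local xid: %u, id: %s",
					serverid, userid, dbid, local_xid, id);
}
```

With serverid 16390, userid 10, dbid 13000, local xid 700 and a hypothetical identifier "fx_700", the buffer holds "server: 16390, user: 10, database: 13000, local xid: 700, id: fx_700".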
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..4b3e67eb49 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..0a3f4b383f 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 873bf9bad9..265b03ba5a 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -845,6 +845,34 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+		if (gxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index bc96512d35..0e1bf63b52 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2568,6 +2568,7 @@ PrepareTransaction(void)
 	PostPrepare_Twophase();
 
+	AtEOXact_FdwXact(true);
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
 	AtEOXact_Enum();
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9867e1b403..634c708661 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4613,6 +4614,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6298,6 +6300,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6845,14 +6850,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	RestoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7054,7 +7060,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7566,11 +7575,13 @@ StartupXLOG(void)
 	}
 
 	/*
-	 * Pre-scan prepared transactions to find out the range of XIDs present.
-	 * This information is not quite needed yet, but it is positioned here so
-	 * as potential problems are detected before any on-disk change is done.
+	 * Pre-scan prepared transactions and foreign prepared transactions to find
+	 * out the range of XIDs present.  This information is not quite needed yet,
+	 * but it is positioned here so as potential problems are detected before any
+	 * on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7898,8 +7909,12 @@ StartupXLOG(void)
 	TrimCLOG();
 	TrimMultiXact();
 
-	/* Reload shared-memory state for prepared transactions */
+	/*
+	 * Reload shared-memory state for prepared transactions and foreign
+	 * prepared transactions.
+	 */
 	RecoverPreparedTransactions();
+	RecoverFdwXacts();
 
 	/*
 	 * Shutdown the recovery environment. This must occur after
@@ -9265,6 +9280,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9813,6 +9829,7 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
@@ -9832,6 +9849,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9850,6 +9868,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -10057,6 +10076,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10260,6 +10280,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b140c210bc..15e567dc3c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index c002a61794..c290b9ea94 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1076,6 +1077,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * Warn if there is still a foreign prepared transaction on this foreign
+	 * server; the drop itself is not blocked.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1396,6 +1409,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * Warn if there is still a foreign prepared transaction that uses this
+	 * user mapping; the drop itself is not blocked.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 9c8b1c7fc2..7b1ce752f8 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -332,6 +332,12 @@ GetFdwRoutine(Oid fdwhandler)
 	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
 		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
 
+	/* FDW supporting prepare API must also support commit and rollback APIs */
+	Assert((routine->PrepareForeignTransaction &&
+			routine->CommitForeignTransaction &&
+			routine->RollbackForeignTransaction) ||
+		   !routine->PrepareForeignTransaction);
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d87d9d06ee..757e9dad83 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4152,6 +4152,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_DSM_FILL_ZERO_WRITE:
 			event_name = "DSMFillZeroWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ:
 			event_name = "LockFileAddToDataDirRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index fff4227e0b..ea00da45d8 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,7 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee99b8..23ae805218 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -167,6 +167,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..2d7191d3cd 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -267,6 +269,7 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index ee912b9d5e..551e212f4d 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -96,6 +96,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allProcs[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -187,11 +189,13 @@ typedef struct ComputeXidHorizonsResult
 	FullTransactionId latest_completed;
 
 	/*
-	 * The same for procArray->replication_slot_xmin and.
-	 * procArray->replication_slot_catalog_xmin.
+	 * The same for procArray->replication_slot_xmin,
+	 * procArray->replication_slot_catalog_xmin, and
+	 * procArray->fdwxact_unresolved_xmin.
 	 */
 	TransactionId slot_xmin;
 	TransactionId slot_catalog_xmin;
+	TransactionId fdwxact_unresolved_xmin;
 
 	/*
 	 * Oldest xid that any backend might still consider running. This needs to
@@ -210,8 +214,9 @@ typedef struct ComputeXidHorizonsResult
 	 * Oldest xid for which deleted tuples need to be retained in shared
 	 * tables.
 	 *
-	 * This includes the effects of replication slots. If that's not desired,
-	 * look at shared_oldest_nonremovable_raw;
+	 * This includes the effects of replication slots and unresolved
+	 * foreign transactions. If that's not desired, look at
+	 * shared_oldest_nonremovable_raw;
 	 */
 	TransactionId shared_oldest_nonremovable;
 
@@ -418,6 +423,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 		ShmemVariableCache->xactCompletionCount = 1;
 	}
 
@@ -1709,6 +1715,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	 */
 	h->slot_xmin = procArray->replication_slot_xmin;
 	h->slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	h->fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	for (int index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1836,6 +1843,12 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	h->data_oldest_nonremovable =
 		TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);
 
+	/*
+	 * Check whether there are unresolved distributed transactions requiring
+	 * an older xmin.
+	 */
+	h->shared_oldest_nonremovable =
+		TransactionIdOlder(h->shared_oldest_nonremovable, h->fdwxact_unresolved_xmin);
 	/*
 	 * The only difference between catalog / data horizons is that the slot's
 	 * catalog xmin is applied to the catalog one (so catalogs can be accessed
@@ -1893,6 +1906,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	Assert(!TransactionIdIsValid(h->slot_catalog_xmin) ||
 		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
 										 h->slot_catalog_xmin));
+	Assert(!TransactionIdIsValid(h->fdwxact_unresolved_xmin) ||
+		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+										 h->fdwxact_unresolved_xmin));
 
 	/* update approximate horizons with the computed horizons */
 	GlobalVisUpdateApply(h);
@@ -3797,6 +3813,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon so that vacuum
+ * does not truncate clog entries for transactions that are still needed to
+ * resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
+
 /*
  * XidCacheRemoveRunningXids
  *
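
A note on how the fdwxact_unresolved_xmin clamp in ComputeXidHorizons() behaves: an invalid XID means "no limit", and otherwise the older of the two XIDs wins under wraparound-aware comparison. The sketch below is a standalone model of that rule. DemoXid, demo_xid_precedes and demo_xid_older are hypothetical names written for illustration; they are meant to mirror the behaviour of TransactionIdPrecedes() and TransactionIdOlder() but are not the PostgreSQL implementations.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t DemoXid;

#define DEMO_INVALID_XID		((DemoXid) 0)
#define DEMO_FIRST_NORMAL_XID	((DemoXid) 3)

/* Wraparound-aware "a is older than b" for normal XIDs (modulo 2^32). */
static int
demo_xid_precedes(DemoXid a, DemoXid b)
{
	/* Special (non-normal) XIDs are compared as plain unsigned values. */
	if (a < DEMO_FIRST_NORMAL_XID || b < DEMO_FIRST_NORMAL_XID)
		return a < b;
	return (int32_t) (a - b) < 0;
}

/* Return the older of two XIDs; an invalid XID places no limit. */
static DemoXid
demo_xid_older(DemoXid a, DemoXid b)
{
	if (a == DEMO_INVALID_XID)
		return b;
	if (b == DEMO_INVALID_XID)
		return a;
	return demo_xid_precedes(a, b) ? a : b;
}
```

So a horizon of 1000 stays at 1000 when no foreign transaction is unresolved, and is pulled back to 900 when the oldest unresolved one has local XID 900.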
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..dc29a7ea6f 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+FdwXactLock							48
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 878fcc2236..7ef7eef1b5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -30,6 +30,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -2448,6 +2449,16 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b7fb2ec1fe..13e7027fd4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -127,6 +127,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f994c4216b..41c9544c2e 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -204,6 +204,7 @@ static const char *const subdirs[] = {
 	"pg_wal/archive_status",
 	"pg_commit_ts",
 	"pg_dynshmem",
+	"pg_fdwxact",
 	"pg_notify",
 	"pg_serial",
 	"pg_snapshots",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 3e00ac0f70..53bc3d82d7 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index cb6ef19182..1712b794c3 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 6c8b111ab5..9ba819e9d1 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -10,24 +10,112 @@
 #ifndef FDWXACT_H
 #define FDWXACT_H
 
+#include "access/fdwxact_xlog.h"
 #include "foreign/foreign.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/s_lock.h"
 
 /* Flag passed to FDW transaction management APIs */
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being aborted */
+} FdwXactStatus;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData *FdwXact;
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+
+	/* Information relevant to the foreign transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset of inserting this entry
+									 * start */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* is the entry complete and written to
+								 * file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing an FdwXact entry requires holding FdwXactLock in
+ * exclusive mode, and iterating over fdwxacts requires it in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
 /* State data for foreign transaction resolution, passed to FDW callbacks */
 typedef struct FdwXactRslvState
 {
 	/* Foreign transaction information */
+	char		   *fdwxact_id;
 	ForeignServer *server;
 	UserMapping *usermapping;
 
 	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
 } FdwXactRslvState;
 
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+
 /* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+extern void RestoreFdwXactData(void);
+extern void RecoverFdwXacts(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
 
 #endif /* FDWXACT_H */
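
On the free_fdwxacts / fdwxact_free_next pair declared in the header above: together they form an intrusive singly linked free list over a fixed pool of entries. A minimal standalone sketch of that data structure follows, assuming the pool is a plain array and ignoring locking. DemoEntry, DemoCtl and the demo_* functions are hypothetical and only illustrate the technique; they are not the patch's allocation code.

```c
#include <assert.h>
#include <stddef.h>

/* One pool slot; free_next plays the role of fdwxact_free_next. */
typedef struct DemoEntry
{
	struct DemoEntry *free_next;
	int			in_use;
} DemoEntry;

/* Control block; free_list plays the role of free_fdwxacts. */
typedef struct DemoCtl
{
	DemoEntry  *free_list;
	int			num_used;
} DemoCtl;

/* Thread every slot of the pool onto the free list. */
static void
demo_init(DemoCtl *ctl, DemoEntry *slots, int nslots)
{
	ctl->free_list = NULL;
	ctl->num_used = 0;
	for (int i = 0; i < nslots; i++)
	{
		slots[i].in_use = 0;
		slots[i].free_next = ctl->free_list;
		ctl->free_list = &slots[i];
	}
}

/* Pop a slot off the free list; NULL means the pool is exhausted. */
static DemoEntry *
demo_alloc(DemoCtl *ctl)
{
	DemoEntry  *e = ctl->free_list;

	if (e == NULL)
		return NULL;
	ctl->free_list = e->free_next;
	e->free_next = NULL;
	e->in_use = 1;
	ctl->num_used++;
	return e;
}

/* Push a slot back onto the free list. */
static void
demo_free(DemoCtl *ctl, DemoEntry *e)
{
	e->in_use = 0;
	e->free_next = ctl->free_list;
	ctl->free_list = e;
	ctl->num_used--;
}
```

Allocation and release are both O(1), and running out of slots corresponds to exceeding max_prepared_foreign_transactions.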
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
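
On the two info codes defined in this header: the resource manager's own bits live in the high nibble of the record's info byte, which is why fdwxact_desc() and fdwxact_identify() mask with ~XLR_INFO_MASK before comparing. A standalone sketch of that decoding is below. The DEMO_* constants are local copies for illustration (with the mask assumed to be 0x0F), and demo_identify is a hypothetical function, not the patch's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_XLR_INFO_MASK		0x0F	/* low bits, assumed reserved */
#define DEMO_FDWXACT_INSERT		0x00
#define DEMO_FDWXACT_REMOVE		0x10

/* Map an info byte to a record name, ignoring the low four bits. */
static const char *
demo_identify(uint8_t info)
{
	switch (info & ~DEMO_XLR_INFO_MASK)
	{
		case DEMO_FDWXACT_INSERT:
			return "NEW FOREIGN TRANSACTION";
		case DEMO_FDWXACT_REMOVE:
			return "REMOVE FOREIGN TRANSACTION";
	}
	return NULL;				/* unknown record type */
}
```

Any low-nibble flag bits set on the record leave the result unchanged, and an unassigned code such as 0x20 yields NULL.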
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 6c15df7e70..986bc73566 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..e1b09a70d2 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -236,6 +236,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 06bed90c5e..ed6372d2e6 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 22970f46cd..755c722e33 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6028,6 +6028,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,text,int4}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,state,identifier,locker_pid}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 4db7ade9a3..89cec9aa96 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -171,6 +171,7 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
 
@@ -254,6 +255,7 @@ typedef struct FdwRoutine
 	/* Support functions for transaction management */
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
+	PrepareForeignTransaction_function PrepareForeignTransaction;
 } FdwRoutine;
 
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5954068dec..1ed1987fa5 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1010,6 +1010,9 @@ typedef enum
 	WAIT_EVENT_DATA_FILE_TRUNCATE,
 	WAIT_EVENT_DATA_FILE_WRITE,
 	WAIT_EVENT_DSM_FILL_ZERO_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index ea8a876ca4..0124c8c687 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -91,5 +91,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 
 #endif							/* PROCARRAY_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6293ab57bc..c28b63b431 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1352,6 +1352,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.state,
+    f.identifier,
+    f.locker_pid
+   FROM pg_foreign_xacts() f(xid, serverid, userid, state, identifier, locker_pid);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.27.0

Attachment: v30-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch (application/octet-stream)
From 349a345c880438a75a79920502f8b65b53b6a8b0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sat, 29 Aug 2020 00:14:36 +0900
Subject: [PATCH v30 02/11] postgres_fdw supports commit and rollback APIs.

This commit implements both the CommitForeignTransaction and
RollbackForeignTransaction APIs in postgres_fdw. Note that since
PREPARE TRANSACTION is still not supported, this commit does not add
any new user-visible functionality.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 468 +++++++++---------
 .../postgres_fdw/expected/postgres_fdw.out    |   2 +-
 contrib/postgres_fdw/postgres_fdw.c           |   4 +
 contrib/postgres_fdw/postgres_fdw.h           |   3 +
 4 files changed, 237 insertions(+), 240 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 66581e5414..23cee15bdc 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -16,6 +16,7 @@
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -80,8 +81,7 @@ static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, UserMapping *user);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -94,6 +94,8 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -108,53 +110,11 @@ static bool UserMappingPasswordRequired(UserMapping *user);
 PGconn *
 GetConnection(UserMapping *user, bool will_prep_stmt)
 {
-	bool		found;
 	bool		retry = false;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
 	MemoryContext ccxt = CurrentMemoryContext;
 
-	/* First time through, initialize connection cache hashtable */
-	if (ConnectionHash == NULL)
-	{
-		HASHCTL		ctl;
-
-		ctl.keysize = sizeof(ConnCacheKey);
-		ctl.entrysize = sizeof(ConnCacheEntry);
-		ConnectionHash = hash_create("postgres_fdw connections", 8,
-									 &ctl,
-									 HASH_ELEM | HASH_BLOBS);
-
-		/*
-		 * Register some callback functions that manage connection cleanup.
-		 * This should be done just once in each backend.
-		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
-		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
-		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
-									  pgfdw_inval_callback, (Datum) 0);
-		CacheRegisterSyscacheCallback(USERMAPPINGOID,
-									  pgfdw_inval_callback, (Datum) 0);
-	}
-
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
-	/*
-	 * Find or create cached entry for requested connection.
-	 */
-	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
-	if (!found)
-	{
-		/*
-		 * We need only clear "conn" here; remaining fields will be filled
-		 * later when "conn" is set.
-		 */
-		entry->conn = NULL;
-	}
+	entry = GetConnectionCacheEntry(user->umid);
 
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
@@ -186,7 +146,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	PG_TRY();
 	{
 		/* Start a new transaction or subtransaction if needed. */
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 	PG_CATCH();
 	{
@@ -247,7 +207,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		if (entry->conn == NULL)
 			make_new_connection(entry, user);
 
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 
 	/* Remember if caller will prepare statements */
@@ -256,6 +216,56 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	return entry->conn;
 }
 
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	bool		found;
+	ConnCacheEntry *entry;
+	ConnCacheKey key;
+
+	/* First time through, initialize connection cache hashtable */
+	if (ConnectionHash == NULL)
+	{
+		HASHCTL		ctl;
+
+		ctl.keysize = sizeof(ConnCacheKey);
+		ctl.entrysize = sizeof(ConnCacheEntry);
+		ConnectionHash = hash_create("postgres_fdw connections", 8,
+									 &ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+		/*
+		 * Register some callback functions that manage connection cleanup.
+		 * This should be done just once in each backend.
+		 */
+		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
+		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
+									  pgfdw_inval_callback, (Datum) 0);
+		CacheRegisterSyscacheCallback(USERMAPPINGOID,
+									  pgfdw_inval_callback, (Datum) 0);
+	}
+
+	/* Set flag that we did GetConnection during the current transaction */
+	xact_got_connection = true;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
+
+	/*
+	 * Find or create cached entry for requested connection.
+	 */
+	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+	if (!found)
+	{
+		/*
+		 * We need only clear "conn" here; remaining fields will be filled
+		 * later when "conn" is set.
+		 */
+		entry->conn = NULL;
+	}
+	return entry;
+}
+
 /*
  * Reset all transient state fields in the cached connection entry and
  * establish new connection to the remote server.
@@ -545,7 +555,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -557,6 +567,9 @@ begin_remote_xact(ConnCacheEntry *entry)
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
+		/* Register the foreign server with the transaction */
+		FdwXactRegisterXact(user->serverid, user->userid);
+
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
 		else
@@ -772,197 +785,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- *
- * This runs just late enough that it must not enter user-defined code
- * locally.  (Entering such code on the remote side is fine.  Its remote
- * COMMIT TRANSACTION may run deferred triggers.)
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, discard it to
-		 * recover. Next GetConnection will open a new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -1322,3 +1144,171 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+void
+postgresCommitForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry;
+	PGresult   *res;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	Assert(entry->conn);
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   frstate->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	/*
+	 * In the simple rollback case, we must have a connection to the foreign
+	 * server because the foreign transaction is not closed yet. We get the
+	 * connection entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before a
+	 * connection was established.
+	 */
+	if (!entry->conn)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction or is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06358..c5badd9c0a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,7 +8984,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
+ERROR:  cannot PREPARE a transaction that has operated on foreign tables
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index b6c72e1d1e..7ac0c85dd3 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -560,6 +560,10 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..e3b2897495 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -137,6 +138,8 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
-- 
2.27.0
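
For readers following the thread, the core idea of the v30-0001 patch that follows (a list of registered participants, deduplicated by server and user OID, whose callbacks are invoked at end of transaction) can be sketched outside the backend. The following is a minimal standalone illustration, not the patch's code: `Participant`, `register_participant`, `end_of_xact`, the fixed-size array, the callback signature, and the counting callbacks are all hypothetical stand-ins for `FdwXactParticipant`, `FdwXactRegisterXact()`, `AtEOXact_FdwXact()`, and the PostgreSQL `List`. Unlike the patch, this sketch reports a missing callback by returning false rather than raising an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARTICIPANTS 16		/* arbitrary cap for this sketch */

/* Callback that ends one foreign transaction (hypothetical signature). */
typedef void (*end_xact_fn) (unsigned int serverid, unsigned int userid);

/* Hypothetical stand-in for FdwXactParticipant. */
typedef struct Participant
{
	unsigned int serverid;
	unsigned int userid;
	end_xact_fn commit_fn;
	end_xact_fn rollback_fn;
} Participant;

static Participant participants[MAX_PARTICIPANTS];
static size_t nparticipants = 0;

/* Counters bumped by the sample callbacks below. */
static int	commits_seen = 0;
static int	rollbacks_seen = 0;

static void
count_commit(unsigned int serverid, unsigned int userid)
{
	(void) serverid;
	(void) userid;
	commits_seen++;
}

static void
count_rollback(unsigned int serverid, unsigned int userid)
{
	(void) serverid;
	(void) userid;
	rollbacks_seen++;
}

/*
 * Register (serverid, userid) as a participant.  Returns true if the pair
 * is registered on return (newly or already), false if a callback is
 * missing or the table is full.
 */
static bool
register_participant(unsigned int serverid, unsigned int userid,
					 end_xact_fn commit_fn, end_xact_fn rollback_fn)
{
	size_t		i;

	for (i = 0; i < nparticipants; i++)
	{
		if (participants[i].serverid == serverid &&
			participants[i].userid == userid)
			return true;		/* already registered */
	}

	/* A participant must provide both callbacks. */
	if (commit_fn == NULL || rollback_fn == NULL)
		return false;
	if (nparticipants >= MAX_PARTICIPANTS)
		return false;

	participants[nparticipants].serverid = serverid;
	participants[nparticipants].userid = userid;
	participants[nparticipants].commit_fn = commit_fn;
	participants[nparticipants].rollback_fn = rollback_fn;
	nparticipants++;
	return true;
}

/*
 * Commit or roll back every participant, then forget them all.  Returns
 * the number of participants that were ended.
 */
static size_t
end_of_xact(bool is_commit)
{
	size_t		ended = nparticipants;
	size_t		i;

	for (i = 0; i < nparticipants; i++)
	{
		Participant *p = &participants[i];

		assert(p->commit_fn != NULL && p->rollback_fn != NULL);
		if (is_commit)
			p->commit_fn(p->serverid, p->userid);
		else
			p->rollback_fn(p->serverid, p->userid);
	}

	nparticipants = 0;
	return ended;
}
```

Because registration is idempotent, a caller can invoke it every time it starts a remote transaction without tracking whether the server is already on the list. Clearing the list at the end keeps state from leaking into the next transaction.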

Attachment: v30-0001-Introduce-transaction-manager-for-foreign-transa.patch (application/octet-stream)
From 18830e8382e6d4afd5abb1bbf92568271e6853c0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 28 Aug 2020 22:25:38 +0900
Subject: [PATCH v30 01/11] Introduce transaction manager for foreign
 transactions.

The global transaction manager manages the transactions initiated on
foreign servers. This commit also adds the CommitForeignTransaction
and RollbackForeignTransaction FDW APIs, supporting only one-phase
commit. An FDW that implements these APIs can be managed by the global
transaction manager, so it can control its transactions through the
foreign transaction manager instead of XactCallback.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/Makefile          |   4 +-
 src/backend/access/fdwxact/Makefile  |  17 ++
 src/backend/access/fdwxact/fdwxact.c | 233 +++++++++++++++++++++++++++
 src/backend/access/transam/xact.c    |  10 ++
 src/backend/foreign/foreign.c        |   4 +
 src/include/access/fdwxact.h         |  33 ++++
 src/include/foreign/fdwapi.h         |  12 ++
 7 files changed, 311 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/include/access/fdwxact.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..2372a1a690 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,7 +8,7 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+SUBDIRS	    = brin common fdwxact gin gist hash heap index nbtree rmgrdesc \
+			  spgist table tablesample transam
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..aacab1d729
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..00da860b31
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,233 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module contains the code for managing transactions started on foreign
+ * servers.
+ *
+ * An FDW that implements both the commit and rollback APIs can register its
+ * foreign transaction with FdwXactRegisterXact() so that it participates in a
+ * group of distributed transactions.  The registered foreign transactions are
+ * identified by OIDs of server and user.  On commit and rollback, the global
+ * transaction manager calls the corresponding FDW API to end the transactions.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xlog.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "utils/memutils.h"
+
+/* Check whether the FdwXactParticipant supports transaction callbacks */
+#define ServerSupportTransactionCallback(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.  This struct
+ * needs to live until the end of transaction where we cannot look at
+ * syscaches. Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Callbacks for foreign transaction */
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A member of
+ * participants must support both commit and rollback APIs.
+ */
+static List *FdwXactParticipants = NIL;
+
+static void ForgetAllFdwXactParticipants(void);
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
+											 bool commit);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+/*
+ * Register the given foreign transaction identified by the given arguments
+ * as a participant of the transaction.
+ */
+void
+FdwXactRegisterXact(Oid serverid, Oid userid)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Already registered */
+			return;
+		}
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Foreign server managed by the transaction manager must implement
+	 * transaction callbacks.
+	 */
+	if (!routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("cannot register foreign server not supporting transaction callback")));
+
+	/*
+	 * Participant's information is also used at the end of a transaction,
+	 * where system caches are not available. Save it in TopTransactionContext
+	 * so that it lives until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Switch back to the previous memory context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Remove the given foreign server from FdwXactParticipants */
+void
+FdwXactUnregisterXact(Oid serverid, Oid userid)
+{
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Remove the entry */
+			FdwXactParticipants =
+				foreach_delete_current(FdwXactParticipants, lc);
+			break;
+		}
+	}
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+
+	return fdw_part;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
+{
+	FdwXactRslvState state;
+
+	Assert(ServerSupportTransactionCallback(fdw_part));
+
+	state.server = fdw_part->server;
+	state.usermapping = fdw_part->usermapping;
+	state.flags = FDWXACT_FLAG_ONEPHASE;
+
+	if (commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Clear the FdwXactParticipants list.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	list_free_deep(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Commit or rollback all foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool is_commit)
+{
+	ListCell   *lc;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/* Commit or rollback foreign transactions in the participant list */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(ServerSupportTransactionCallback(fdw_part));
+		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Check if the local transaction has any foreign transaction.
+ */
+void
+PrePrepare_FdwXact(void)
+{
+	/* Preparing foreign transactions is not supported */
+	if (FdwXactParticipants != NIL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..bc96512d35 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -2230,6 +2231,9 @@ CommitTransaction(void)
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT
 					  : XACT_EVENT_COMMIT);
 
+	/* Commit foreign transaction if any */
+	AtEOXact_FdwXact(true);
+
 	ResourceOwnerRelease(TopTransactionResourceOwner,
 						 RESOURCE_RELEASE_BEFORE_LOCKS,
 						 true, true);
@@ -2369,6 +2373,9 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Prepare foreign transactions */
+	PrePrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2756,6 +2763,9 @@ AbortTransaction(void)
 		else
 			CallXactCallbacks(XACT_EVENT_ABORT);
 
+		/* Rollback foreign transactions if any */
+		AtEOXact_FdwXact(false);
+
 		ResourceOwnerRelease(TopTransactionResourceOwner,
 							 RESOURCE_RELEASE_BEFORE_LOCKS,
 							 false, true);
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 3e79c852c1..9c8b1c7fc2 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -328,6 +328,10 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* The FDW must support either both APIs or neither */
+	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
+		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
+
 	return routine;
 }
 
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..6c8b111ab5
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,33 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "foreign/foreign.h"
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	/* Foreign transaction information */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* Function declarations */
+extern void AtEOXact_FdwXact(bool is_commit);
+extern void PrePrepare_FdwXact(void);
+
+#endif /* FDWXACT_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..4db7ade9a3 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "access/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
 
@@ -170,6 +171,9 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
  * function.  It provides pointers to the callback functions needed by the
@@ -246,6 +250,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for path reparameterization. */
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
+
+	/* Support functions for transaction management */
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
 } FdwRoutine;
 
 
@@ -259,4 +267,8 @@ extern bool IsImportableForeignTable(const char *tablename,
 									 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in fdwxact/fdwxact.c */
+extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
+
 #endif							/* FDWAPI_H */
-- 
2.27.0

#206Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#205)
11 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, Dec 28, 2020 at 11:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 25, 2020 at 9:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Since the previous version conflicts with the current HEAD I've
attached the rebased version patch set.

Rebased the patch set again to the current HEAD.

The discussion of this patch is very long so here is a short summary
of the current state:

It's still under discussion which approach is best for distributed
transaction commit as a building block of built-in sharding using
foreign data wrappers.

Since we're considering using this feature for built-in sharding, the
design depends on the architecture of built-in sharding. For example,
with the current patch, the PostgreSQL node that received a COMMIT
from the client works as a coordinator and commits the transactions
using 2PC on all foreign servers involved in the transaction. This
approach would work well with a decentralized sharding architecture
but not with a centralized architecture like Postgres-XC and
Postgres-XL, where the GTM node is a dedicated component responsible
for transaction management. Since we haven't reached a consensus on
the built-in sharding architecture yet, it's still an open question
whether this patch's approach is really a good building block for
built-in sharding.
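
To make the coordinator role concrete, here is a minimal, self-contained
C sketch of the two-phase flow described above. It is not code from the
patch set: the Participant struct, the fail_on_prepare test hook, and
coordinator_commit() are hypothetical names used only for illustration,
and the second phase runs inline here purely to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical participant states, for illustration only. */
typedef enum
{
	PART_ACTIVE,
	PART_PREPARED,
	PART_COMMITTED,
	PART_ABORTED
} PartState;

/* Hypothetical stand-in for one foreign server in the transaction. */
typedef struct Participant
{
	const char *name;
	bool		fail_on_prepare;	/* test hook: simulate a PREPARE failure */
	PartState	state;
} Participant;

/* Phase 1 for one participant; returns false if it cannot prepare. */
static bool
participant_prepare(Participant *p)
{
	if (p->fail_on_prepare)
		return false;
	p->state = PART_PREPARED;
	return true;
}

/* Phase 2 for one participant; only valid after a successful prepare. */
static void
participant_commit_prepared(Participant *p)
{
	assert(p->state == PART_PREPARED);
	p->state = PART_COMMITTED;
}

/* Abort a participant whether or not it has been prepared. */
static void
participant_rollback(Participant *p)
{
	p->state = PART_ABORTED;
}

/*
 * The node that received COMMIT acts as the coordinator: prepare every
 * participant, and commit only if all of them prepared successfully.
 * Returns true if the distributed transaction committed.
 */
bool
coordinator_commit(Participant *parts, size_t n)
{
	/* Phase 1: prepare all participants */
	for (size_t i = 0; i < n; i++)
	{
		if (!participant_prepare(&parts[i]))
		{
			/* One failure aborts the whole distributed transaction */
			for (size_t j = 0; j < n; j++)
				participant_rollback(&parts[j]);
			return false;
		}
	}

	/* (The local transaction would be committed at this point.) */

	/* Phase 2: commit the prepared transactions */
	for (size_t i = 0; i < n; i++)
		participant_commit_prepared(&parts[i]);

	return true;
}
```

With a centralized design, the two loops would live in a dedicated
component instead of in the backend that received the COMMIT.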

On the other hand, this feature is not necessarily dedicated to
built-in sharding. For example, distributed transaction commit through
FDWs is also important when atomically moving data between two servers
via FDWs. Using a dedicated process or server like GTM could be
overkill for that. Having the node that received the COMMIT work as
the coordinator would be better and more straightforward.

There is no noticeable TODO in the functionality covered so far by
this patch set. The patch set adds new FDW APIs to support 2PC,
introduces the global transaction manager, and implements those FDW
APIs in postgres_fdw. It also has regression tests and documentation.
Transactions on foreign servers involved in the distributed
transaction are committed using 2PC. Committing using 2PC is performed
asynchronously and transparently to the user. Therefore, it doesn't
guarantee that transactions on the foreign servers are also committed
when the client gets an acknowledgment of COMMIT. Synchronous foreign
transaction commit via 2PC is not covered by this patch set, as we
still need a discussion on the design.
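
The regression tests in the attached v31-0011 patch suggest a rule for
when 2PC is actually used under foreign_twophase_commit = 'required'.
The C sketch below encodes that rule as I read it from the test
comments and expected output. It is an inference, not the
implementation: the type and function names are hypothetical, and
servers whose FDW has no transaction callbacks are assumed not to count
as participants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability levels, mirroring the three test FDWs. */
typedef enum
{
	CAP_NONE,				/* no transaction callbacks (test_fdw) */
	CAP_COMMIT_ROLLBACK,	/* commit/rollback only (test_no2pc_fdw) */
	CAP_TWOPHASE			/* also supports prepare (test_2pc_fdw) */
} TxCapability;

typedef struct ServerUse
{
	TxCapability cap;
	bool		modified;	/* did the transaction write to this server? */
} ServerUse;

typedef enum
{
	DECIDE_ONE_PHASE,
	DECIDE_TWO_PHASE,
	DECIDE_ERROR
} CommitDecision;

/*
 * Decide how to commit. Assumed rule: 2PC is needed only when it is
 * required by configuration and at least two nodes were written to,
 * counting the local node; in that case every written foreign server
 * must be 2PC-capable. Read-only servers are ignored.
 */
CommitDecision
decide_commit(const ServerUse *servers, size_t n,
			  bool local_modified, bool twophase_required)
{
	size_t		writers = local_modified ? 1 : 0;
	bool		has_no2pc_writer = false;

	assert(servers != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		/* Skip read-only servers and servers without transaction callbacks */
		if (!servers[i].modified || servers[i].cap == CAP_NONE)
			continue;

		writers++;
		if (servers[i].cap == CAP_COMMIT_ROLLBACK)
			has_no2pc_writer = true;
	}

	if (!twophase_required || writers < 2)
		return DECIDE_ONE_PHASE;

	return has_no2pc_writer ? DECIDE_ERROR : DECIDE_TWO_PHASE;
}
```

For example, a write to the local table plus a write to a
commit/rollback-only server yields DECIDE_ERROR, while a write to a
single foreign server always commits in one phase.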

I've attached the rebased patches to make cfbot happy.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

Attachments:

v31-0011-Add-regression-tests-for-foreign-twophase-commit.patch (application/x-patch)
From 9345a7ee832d7990dae0840b404f8edf37ea76a2 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v31 11/11] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 +
 .../test_fdwxact/expected/test_fdwxact.out    | 200 +++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 185 +++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 110 ++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 524 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 ++++++
 src/test/regress/pg_regress.c                 |  13 +-
 src/tools/msvc/Mkvcbuild.pm                   |   3 +-
 14 files changed, 1294 insertions(+), 6 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index a6d2ffbf9e..106f3b2ff2 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS =1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..ca8a90f3e5
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,200 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..40b774e5d0
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,185 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..52e4971aed
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,110 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql, $wait_until) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+	$wait_until = 0 unless defined $wait_until;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	$node->poll_query_until('postgres',
+							"SELECT count(*) FROM pg_foreign_xacts",
+							$wait_until);
+
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the failure case of PREPARE TRANSACTION. We prepare the distributed
+# transaction with the same identifier.  The second attempt will fail when preparing
+# the local transaction, which is performed after preparing the foreign transaction
+# on srv_2pc_1. Therefore the transaction should rollback the prepared foreign
+# transaction.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into prepare phase on srv_2pc_1. The transaction fails during
+# preparing the foreign transaction on srv_2pc_1. Then, we try to both 'rollback' and
+# 'rollback prepared' the foreign transaction, and rollback another foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc uses PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..8e2a57b052
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,524 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only
+ * commit and rollback API and the third supports all transaction API including
+ * prepare.
+ *
+ * Also, this module has an ability to inject an error at prepare callback or
+ * commit callback using test_inject_error() SQL function. The information of
+ * injected error is stored in the shared memory so that backend processes and
+ * resolver processes can see it.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xact.h"
+#include "commands/defrem.h"
+#include "access/reloptions.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static void testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo,
+												   List *fdw_private,
+												   int subplan_index,
+												   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactRslvState *state);
+static void testCommitForeignTransaction(FdwXactRslvState *state);
+static void testRollbackForeignTransaction(FdwXactRslvState *state);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+/* Register the foreign transaction */
+static void
+testRegisterFdwXact(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					bool modified)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	RangeTblEntry	*rte;
+	ForeignTable *table;
+	Oid		userid;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
+						mtstate->ps.state);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+	table = GetForeignTable(RelationGetRelid(rel));
+	FdwXactRegisterXact(table->serverid, userid, modified);
+}
+
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static void
+testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo,
+									   List *fdw_private,
+									   int subplan_index,
+									   int eflags)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo,
+						(eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0);
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo, true);
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 state->fdwxact_id,
+							 state->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (check_event(state->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+/*
+ * Check if an event is set at the phase on the server. If there is, set
+ * elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Only error and panic are supported; anything else means ERROR */
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+	else
+		*elevel = ERROR;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index fa8e031526..d47d96975b 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Set up master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check that we can commit and roll back the foreign
+# transactions after normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check that we can commit and roll back the
+# foreign transactions after crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up its shared
+# memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transactions
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 23d7d0beb2..d49a292cca 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2352,9 +2352,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2369,7 +2372,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 7f014a12c9..c70e805116 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -50,7 +50,8 @@ my @contrib_excludes = (
 	'pgcrypto',         'sepgsql',
 	'brin',             'test_extensions',
 	'test_misc',        'test_pg_dump',
-	'snapshot_too_old', 'unsafe_tests');
+	'snapshot_too_old', 'unsafe_tests',
+	'test_fdwxact');
 
 # Set of variables for frontend modules
 my $frontend_defines = { 'initdb' => 'FRONTEND' };
-- 
2.27.0
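
The commit and rollback callbacks in the test module above choose their log message depending on whether FDWXACT_FLAG_ONEPHASE is set in state->flags. A minimal sketch of why such a test needs the bitwise `&` operator rather than the logical `&&` operator follows. The flag names and values here are hypothetical and chosen only for illustration; the real definitions are in access/fdwxact.h in the patch set.

```c
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only (not the patch's values). */
#define DEMO_FLAG_ONEPHASE	0x01
#define DEMO_FLAG_OTHER		0x02

/* Bitwise test: true only when the one-phase bit itself is set. */
static bool
is_onephase_bitwise(int flags)
{
	return (flags & DEMO_FLAG_ONEPHASE) != 0;
}

/*
 * Logical test: true whenever flags is nonzero, because && only asks
 * whether both operands are nonzero. It cannot single out one bit.
 */
static bool
is_onephase_logical(int flags)
{
	return flags && DEMO_FLAG_ONEPHASE;
}
```

With these definitions, is_onephase_logical(DEMO_FLAG_OTHER) returns true even though the one-phase bit is clear, while is_onephase_bitwise(DEMO_FLAG_OTHER) returns false.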

v31-0005-postgres_fdw-supports-prepare-API.patch (application/x-patch)
From 7ca5c22c93da5bad75a9f31579d3b59a6d1a6b08 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:00:21 +0900
Subject: [PATCH v31 05/11] postgres_fdw supports prepare API.

This commit implements the PrepareForeignTransaction API in postgres_fdw,
enabling foreign transactions to be committed and rolled back using the
two-phase commit protocol.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 137 +++++++++++++++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  13 --
 contrib/postgres_fdw/postgres_fdw.c           |   1 +
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   7 -
 5 files changed, 135 insertions(+), 24 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index be263d8e53..245b53febb 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -96,6 +96,8 @@ static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 static bool UserMappingPasswordRequired(UserMapping *user);
 static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
 static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+									char *fdwxact_id, bool is_commit);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -1166,12 +1168,19 @@ void
 postgresCommitForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry;
+	bool		is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	PGresult   *res;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
 
+	if (!is_onephase)
+	{
+		/* COMMIT PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->usermapping,
+								frstate->fdwxact_id, true);
+		return;
+	}
+
 	Assert(entry->conn);
 
 	/*
@@ -1217,16 +1226,24 @@ void
 postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry = NULL;
+	bool is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	bool abort_cleanup_failure = false;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	/*
 	 * In simple rollback case, we must have a connection to the foreign server
 	 * because the foreign transaction is not closed yet. We get the connection
 	 * entry from the cache.
 	 */
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	if (!is_onephase)
+	{
+		/* ROLLBACK PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->usermapping,
+								frstate->fdwxact_id, false);
+		return;
+	}
+
 	Assert(entry);
 
 	/*
@@ -1303,6 +1320,46 @@ postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 	return;
 }
 
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", frstate->fdwxact_id);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   frstate->server->servername, frstate->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 frstate->server->servername, frstate->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
 /* Cleanup at main-transaction end */
 static void
 pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
@@ -1329,3 +1386,75 @@ pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
 	/* Also reset cursor numbering for next transaction */
 	cursor_number = 0;
 }
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+						char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	/*
+	 * Check the connection status in case the previous attempt
+	 * failed.
+	 */
+	if (entry->conn && PQstatus(entry->conn) != CONNECTION_OK)
+		disconnect_pg_server(entry);
+
+	/*
+	 * In the two-phase commit case, since the transaction is about to be
+	 * resolved by a different process than the one that prepared it,
+	 * we might not have a connection yet.
+	 */
+	if (!entry->conn)
+		make_new_connection(entry, usermapping);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	/*
+	 * Once the transaction is prepared, no further transaction callback is
+	 * called even if an error occurs while resolving it.  Therefore, we
+	 * don't need to set changing_xact_state here.  On failure, a new connection
+	 * will be established either when a new transaction is started or when
+	 * checking the connection status above.
+	 */
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager states, it's possible that
+		 * the given foreign transaction doesn't exist on the foreign server.
+		 * So we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 707f1d7cd4..b7cae97600 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8974,19 +8974,6 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
- count 
--------
-   822
-(1 row)
-
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
-ROLLBACK;
-WARNING:  there is no transaction in progress
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 7ac0c85dd3..22e1a70e76 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -563,6 +563,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for foreign transactions */
 	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
 	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
 
 	PG_RETURN_POINTER(routine);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index e3b2897495..659222b97a 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -140,6 +140,7 @@ extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
 extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
 extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 25dbc08b98..666f39210f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2647,13 +2647,6 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ROLLBACK;
-
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
-- 
2.27.0

v31-0008-Prepare-foreign-transactions-at-commit-time.patch (application/x-patch)
From e8d46a61b0559e767cb693c1b632cfc749ddd8f2 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 25 Nov 2020 21:02:29 +0900
Subject: [PATCH v31 08/11] Prepare foreign transactions at commit time

With this commit, a foreign server modified within the transaction is
marked as 'modified'. On the 'modified' servers, foreign transactions
are prepared automatically if foreign_twophase_commit is
'required'. Previously, users needed to do PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED to use the two-phase commit protocol. This commit
enables users to use the two-phase commit protocol transparently. Prepared
foreign transactions are resolved asynchronously by the foreign
transaction resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/fdwxact.c          | 191 +++++++++++++++++-
 src/backend/access/transam/xact.c             |   7 +
 src/backend/utils/misc/guc.c                  |  28 +++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/access/fdwxact.h                  |  10 +
 src/include/foreign/fdwapi.h                  |   2 +-
 6 files changed, 229 insertions(+), 11 deletions(-)
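
The decision made at pre-commit time can be read as the following
standalone sketch. It is not code from the patch: the Sketch* names are
hypothetical stand-ins for FdwXactParticipant and for the result of
checkForeignTwophaseCommitRequired(), and the checks on
max_prepared_foreign_xacts and max_foreign_xact_resolvers are left out.
It only illustrates the counting rule: two-phase commit is used when it
is requested and at least two servers (foreign servers plus the local
node) were written, and it is an error if any written foreign server
cannot prepare.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for FdwXactParticipant. */
typedef struct SketchParticipant
{
	bool		modified;			/* did the transaction write on this server? */
	bool		supports_prepare;	/* does the FDW provide a prepare callback? */
} SketchParticipant;

/* Hypothetical result type; the patch returns a bool or raises an error. */
typedef enum SketchDecision
{
	SKETCH_ONE_PHASE,			/* commit each server in one phase */
	SKETCH_TWO_PHASE,			/* prepare the written servers first */
	SKETCH_ERROR_NO_2PC			/* a written server cannot prepare */
} SketchDecision;

/*
 * Mirror of the counting logic: only servers that were written count, the
 * local node counts as one more server if it was modified, and fewer than
 * two written servers means two-phase commit is not needed.
 */
SketchDecision
sketch_decide_commit(bool twophase_requested, bool local_modified,
					 const SketchParticipant *parts, size_t nparts)
{
	size_t		nserverswritten = 0;
	bool		have_notwophase = false;

	if (!twophase_requested)
		return SKETCH_ONE_PHASE;

	for (size_t i = 0; i < nparts; i++)
	{
		/* Read-only participants are ignored. */
		if (!parts[i].modified)
			continue;

		if (!parts[i].supports_prepare)
			have_notwophase = true;

		nserverswritten++;
	}

	if (local_modified)
		nserverswritten++;

	if (nserverswritten < 2)
		return SKETCH_ONE_PHASE;

	return have_notwophase ? SKETCH_ERROR_NO_2PC : SKETCH_TWO_PHASE;
}
```

For example, a transaction that writes to one foreign server and only
reads from another stays on the one-phase path unless the local node was
also modified.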

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index b4cab71c3d..79bd7596a3 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -19,13 +19,27 @@
  *
  * FOREIGN TRANSACTION RESOLUTION
  *
+ * A transaction involving multiple foreign transactions uses the two-phase commit
+ * protocol to commit the distributed transaction if enabled.  The basic strategy
+ * is that we prepare all of the remote transactions before committing locally and
+ * commit them after committing locally.
+ *
+ * At pre-commit of the local transaction, we prepare the transactions on all foreign
+ * servers after logging the information of each foreign transaction.  The result of
+ * the distributed transaction is determined by the result of the corresponding local
+ * transaction.  Once the local transaction is successfully committed, all
+ * transactions on foreign servers must be committed.  If an error occurs
+ * before the local transaction commits, all transactions must be aborted.  After
+ * committing or rolling back locally, we leave foreign transactions as in-doubt
+ * transactions and then notify the resolver process. The resolver process asynchronously
+ * resolves these foreign transactions according to the result of the corresponding local
+ * transaction.  Also, the user can use the pg_resolve_foreign_xact() SQL function to
+ * resolve a foreign transaction manually.
+ *
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  The preapred
- * foreign transactions are resolved by a resolver process asynchronously.  Also, the
- * user can use pg_resolve_foreign_xact() SQL function to resolve a foreign transaction
- * manually.
+ * local transaction and do nothing for the involved foreign transactions.
  *
  * LOCKING
  *
@@ -92,8 +106,10 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
+#include "storage/pmsignal.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
+#include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -105,6 +121,10 @@
 #define ServerSupportTwophaseCommit(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
 
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
 /* Directory where the foreign prepared transaction files will reside */
 #define FDWXACTS_DIR "pg_fdwxact"
 
@@ -142,6 +162,9 @@ typedef struct FdwXactParticipant
 	/* Transaction identifier used for PREPARE */
 	char	   *fdwxact_id;
 
+	/* true if the transaction modified data on the server */
+	bool		modified;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
@@ -152,18 +175,24 @@ typedef struct FdwXactParticipant
 /*
  * List of foreign transactions involved in the transaction.  A member of
  * participants must support both commit and rollback APIs.
+ *
+ * ForeignTwophaseCommitIsRequired is true if the current transaction needs to
+ * be committed using two-phase commit protocol.
  */
 static List *FdwXactParticipants = NIL;
+static bool ForeignTwophaseCommitIsRequired = false;
 
 /* Keep track of registering process exit call back. */
 static bool fdwXactExitRegistered = false;
 
+
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
 int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
-static void FdwXactPrepareForeignTransactions(TransactionId xid);
+static void FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
@@ -182,6 +211,7 @@ static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
 static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
 static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  bool giveWarning);
+static bool checkForeignTwophaseCommitRequired(bool local_modified);
 static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  Oid umid, char *fdwxact_id);
 static void remove_fdwxact(FdwXact fdwxact);
@@ -258,7 +288,7 @@ FdwXactShmemInit(void)
  * as a participant of the transaction.
  */
 void
-FdwXactRegisterXact(Oid serverid, Oid userid)
+FdwXactRegisterXact(Oid serverid, Oid userid, bool modified)
 {
 	FdwXactParticipant *fdw_part;
 	MemoryContext old_ctx;
@@ -273,6 +303,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 			fdw_part->usermapping->userid == userid)
 		{
 			/* Already registered */
+			fdw_part->modified |= modified;
 			return;
 		}
 	}
@@ -302,6 +333,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
 
 	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
 
 	/* Add to the participants list */
 	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
@@ -348,6 +380,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
 	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
@@ -356,11 +389,139 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	return fdw_part;
 }
 
+/*
+ * Prepare all foreign transactions if foreign twophase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit: when it is
+ * 'required', we strictly require the FDWs of all modified foreign servers
+ * to support the two-phase commit protocol and ask them to prepare their
+ * foreign transactions; when it is 'disabled', we use one-phase commit and
+ * these foreign transactions are committed at the end of the transaction.
+ * If we fail to prepare any of them, we change to aborting.
+ */
+void
+PreCommit_FdwXact(void)
+{
+	TransactionId xid;
+	bool		local_modified;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/*
+	 * Check if the current transaction did writes.  We need to count the
+	 * local node as a participant of the distributed transaction and regard it
+	 * as modified if the current transaction has performed WAL logging and
+	 * has assigned an xid.  The transaction can end up not writing any WAL,
+	 * even if it has an xid, if it only wrote to temporary and/or unlogged
+	 * tables.  It can end up having written WAL without an xid if it did HOT
+	 * pruning.
+	 */
+	xid = GetTopTransactionIdIfAny();
+	local_modified = (TransactionIdIsValid(xid) && (XactLastRecEnd != 0));
+
+	/*
+	 * Check if we need to use foreign twophase commit. Note that we don't
+	 * support foreign twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired(local_modified))
+	{
+		/*
+		 * Two-phase commit is required.  Assign a transaction id to the
+		 * current transaction if not yet because the local transaction is
+		 * necessary to determine the result of the distributed transaction.
+		 * Then we prepare foreign transactions on foreign servers that support
+		 * two-phase commit.  Note that we keep FdwXactParticipants until the
+		 * end of the transaction.
+		 */
+		if (!TransactionIdIsValid(xid))
+			xid = GetTopTransactionId();
+		FdwXactPrepareForeignTransactions(xid, false);
+		ForeignTwophaseCommitIsRequired = true;
+	}
+}
+
+/* Return true if the current transaction needs to use two-phase commit */
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
 /*
- * Insert FdwXact entries and prepare foreign transactions.
+ * Return true if the current transaction modifies data on two or more servers,
+ * counting both the servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+{
+	ListCell   *lc;
+	bool		have_notwophase = false;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			have_notwophase = true;
+
+		nserverswritten++;
+	}
+
+	/* Did we modify the local non-temporary data? */
+	if (local_modified)
+		nserverswritten++;
+
+	/*
+	 * Two-phase commit is not required if the number of servers that performed
+	 * writes is less than 2.
+	 */
+	if (nserverswritten < 2)
+		return false;
+
+	Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED);
+
+	/* Two-phase commit is required. Check parameters */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	if (have_notwophase)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+				 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+	return true;
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions.  If prepare_all is
+ * true, we prepare all foreign transactions regardless of whether writes happened
+ * on the server.
+ *
+ * We still can change to rollback here on failure. If any error occurs, we
+ * rollback non-prepared foreign transactions.
  */
 static void
-FdwXactPrepareForeignTransactions(TransactionId xid)
+FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all)
 {
 	ListCell   *lc;
 
@@ -378,6 +539,9 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 
 		CHECK_FOR_INTERRUPTS();
 
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
 		/* Get prepared transaction identifier */
 		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
 		Assert(fdw_part->fdwxact_id);
@@ -755,7 +919,10 @@ ForgetAllFdwXactParticipants(void)
 	int			nlefts = 0;
 
 	if (FdwXactParticipants == NIL)
+	{
+		Assert(!ForeignTwophaseCommitIsRequired);
 		return;
+	}
 
 	foreach(cell, FdwXactParticipants)
 	{
@@ -812,7 +979,10 @@ AtEOXact_FdwXact(bool is_commit)
 
 		if (!fdwxact)
 		{
-			/* Commit or rollback the foreign transaction in one-phase */
+			/*
+			 * If this participant doesn't have an FdwXact entry, it's not
+			 * prepared yet. Therefore we can commit or roll it back in one phase.
+			 */
 			Assert(ServerSupportTransactionCallback(fdw_part));
 			FdwXactParticipantEndTransaction(fdw_part, is_commit);
 			continue;
@@ -842,6 +1012,7 @@ AtEOXact_FdwXact(bool is_commit)
 	}
 
 	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
 }
 
 /*
@@ -881,7 +1052,7 @@ PrePrepare_FdwXact(void)
 	 * prepare all foreign transactions.
 	 */
 	xid = GetTopTransactionId();
-	FdwXactPrepareForeignTransactions(xid);
+	FdwXactPrepareForeignTransactions(xid, true);
 
 	/*
 	 * We keep FdwXactParticipants until the transaction end so that we change
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 0e1bf63b52..0f223c4694 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -22,6 +22,7 @@
 
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1456,6 +1457,9 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	if (FdwXactIsForeignTwophaseCommitRequired())
+		FdwXactLaunchOrWakeupResolver();
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2123,6 +2127,9 @@ CommitTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXact();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 33e1b5884c..fc3a23fa01 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -499,6 +499,24 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -4647,6 +4665,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 36abadbc60..e9bddbd7ee 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -743,6 +743,8 @@
 							# retrying to resolve
 							# foreign transactions
 							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
 
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index a3763e52c0..6bf4f5dd7d 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -20,6 +20,14 @@
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
 /* Enum to track the status of foreign transaction */
 typedef enum
 {
@@ -107,10 +115,12 @@ extern int	max_prepared_foreign_xacts;
 extern int	max_foreign_xact_resolvers;
 extern int	foreign_xact_resolution_retry_interval;
 extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
+extern void PreCommit_FdwXact(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
 extern bool FdwXactIsForeignTwophaseCommitRequired(void);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 91db4f5bfc..7a444d0590 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -273,7 +273,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
 /* Functions in fdwxact/fdwxact.c */
-extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactRegisterXact(Oid serverid, Oid userid, bool modified);
 extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
 
 #endif							/* FDWAPI_H */
-- 
2.27.0

v31-0010-Documentation-update.patch (application/x-patch)
From 03818d33ee851277c27dd175cd53f44c37383040 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v31 10/11] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 ++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 158 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 254 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    | 147 +++++++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 888 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a2266526c..0f73bf19f4 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9285,6 +9285,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -11138,6 +11143,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric identifier of the local transaction with which this foreign
+       transaction is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.usesysid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction has been
+          prepared and is to be committed, or is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction has been
+          prepared and is to be aborted, or is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>locker_pid</structfield></entry>
+      <entry><type>int</type></entry>
+      <entry></entry>
+      <entry>
+       Process ID of the process that is currently processing this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 048bd6aa08..224860879a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9271,6 +9271,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether a distributed transaction commit ensures that all
+         involved changes on foreign servers are committed. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used
+         to commit or roll back distributed transactions. When set to
+         <literal>required</literal>, distributed transactions strictly require
+         that all written servers can use the two-phase commit protocol.  That is,
+         the distributed transaction cannot commit if even one server does not
+         support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, the distributed transaction commit
+         waits for all involved foreign transactions to be committed before the
+         command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When <literal>disabled</literal>, database consistency can be at
+          risk if one or more foreign servers crash while a distributed
+          transaction is being committed.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed
+         resolution attempt before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning the resolver stays connected to one
+         database until it is stopped manually by <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..bae3ee0f2a
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that the transactions on only some foreign servers
+   were committed while the others were rolled back. This could leave the
+   data in an inconsistent state across the federated database.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all the changes on foreign servers are either committed or rolled back using
+   the transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve atomic commit among all foreign servers,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit sequence of a distributed transaction proceeds
+    through the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers including the local server itself and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the prepare on all foreign servers is
+       successful, proceed to the next step.  If there is any failure in the
+       prepare phase, the server rolls back all the transactions on both
+       local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit the local transaction. The server commits the transaction locally.
+       If any failure happens in this step, the server switches to rollback and
+       rolls back all transactions on both local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    The above sequence is executed transparently to users at transaction commit.
+    The transaction returns acknowledgement of the successful commit of the
+    distributed transaction to the client after step 2.  After that, all
+    prepared transactions are resolved asynchronously by a foreign transaction
+    resolver process.
+   </para>
+
+   <para>
+    When the user executes <command>PREPARE TRANSACTION</command>, the transaction
+    prepares the local transaction as well as all involved transactions on the
+    foreign servers. Likewise, when <command>COMMIT PREPARED</command> or
+    <command>ROLLBACK PREPARED</command> is executed, all prepared transactions are resolved
+    asynchronously after committing or rolling back the local transaction.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    A distributed transaction can be in the <firstterm>in-doubt</firstterm> state
+    from the time all involved transactions are prepared until all of them
+    are resolved.  During that period, a reader might see different
+    results on the foreign servers.  If the local node
+    crashes while preparing transactions, the distributed transaction becomes
+    in-doubt.  The information about the involved foreign transactions is
+    recovered during crash recovery and they are resolved in the background.
+   </para>
+
+   <para>
+    The foreign transaction resolver processes automatically resolve the
+    transactions associated with an in-doubt distributed transaction. Alternatively, you can
+    use the <function>pg_resolve_foreign_xact</function> function to resolve them
+    manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving in-doubt distributed transactions. They commit or
+    rollback prepared transactions on all foreign servers involved with the
+    distributed transaction according to the result of the corresponding local
+    transaction.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on the database to which it is connected. On failure during resolution, it
+    retries at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped without an immediate shutdown. You can call the
+     <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be non-zero,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be enabled.  Additionally,
+    <varname>max_worker_processes</varname> may need to be adjusted
+    to accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..0fbb9c4123 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1427,6 +1427,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare the foreign transaction. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally provide
+     the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flag</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal> the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     Note that in all cases except when called through the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flag</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal> the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     there is a failure while preparing the foreign transaction. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except when called through the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    storing its length in <varname>*prep_id_len</varname>.
+    This optional function is called during executor startup, once per
+    foreign server. Note that the transaction identifier must be a string literal
+    shorter than <symbol>NAMEDATALEN</symbol> bytes and should not be the same
+    as any other concurrent prepared transaction identifier. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;xid&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state, so
+      you cannot use functions that access the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1906,4 +2017,147 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of the
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name, the OIDs of the
+    server and user, and the user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-registration">
+    <title>Foreign Transaction Registration and Unregistration</title>
+    <para>
+     A foreign transaction needs to be registered with the
+     <productname>PostgreSQL</productname> global transaction manager.
+     Registration and unregistration are done by calling
+     <function>FdwXactRegisterXact</function> and
+     <function>FdwXactUnregisterXact</function> respectively.
+     The FDW can pass a boolean <literal>modified</literal> along with the
+     OIDs of the server and user to <function>FdwXactRegisterXact</function>,
+     indicating that writes are going to happen on the foreign server.  Such foreign
+     servers are taken into account when deciding whether the two-phase commit
+     protocol is required.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <function>CommitForeignTransaction</function>
+     and <function>RollbackForeignTransaction</function> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls the <function>CommitForeignTransaction</function> function
+     in the pre-commit phase and calls the
+     <function>RollbackForeignTransaction</function> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback Distributed Transaction</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit and roll back atomically among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may use different
+    Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     When switching to rollback due to a failure, it calls
+     <function>RollbackForeignTransaction</function> with
+     <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions which are not
+     closed yet, and calls <function>RollbackForeignTransaction</function> without
+     that flag for foreign transactions which are already prepared.  For foreign
+     transactions which are being prepared, it does both because it cannot be sure whether
+     the preparation has completed on the foreign server. Therefore,
+     <function>RollbackForeignTransaction</function> needs to tolerate the undefined
+     object error.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, both the <literal>CommitForeignTransaction</literal> function and the
+     <literal>RollbackForeignTransaction</literal> function should commit and
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve, the resolver process exits with an error message. The foreign
+     transaction launcher launches the resolver process again at
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a5161bb22b 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5021ac1ca9..736394cd96 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26838,6 +26838,153 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-data-sanity">
+   <title>Data Sanity Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-data-sanity-table"/>
+    provide ways to check the sanity of data files in the cluster.
+   </para>
+
+   <table id="functions-data-sanity-table">
+    <title>Data Sanity Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_relation_check_pages</primary>
+        </indexterm>
+        <function>pg_relation_check_pages</function> ( <parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type> ] )
+        <returnvalue>setof record</returnvalue>
+        ( <parameter>path</parameter> <type>text</type>,
+        <parameter>failed_block_num</parameter> <type>bigint</type> )
+       </para>
+       <para>
+        Checks the pages of the specified relation to see if they are valid
+        enough to safely be loaded into the server's shared buffers.  If
+        given, <parameter>fork</parameter> specifies that only the pages of
+        the given fork are to be verified.  <parameter>fork</parameter> can
+        be <literal>main</literal> for the main data
+        fork, <literal>fsm</literal> for the free space
+        map, <literal>vm</literal> for the visibility map,
+        or <literal>init</literal> for the initialization fork.  The
+        default of <literal>NULL</literal> means that all forks of the
+        relation should be checked.  The function returns a list of block
+        numbers that appear corrupted along with the path names of their
+        files.  Use of this function is restricted to superusers by
+        default, but access may be granted to others
+        using <command>GRANT</command>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction which is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+        This function is useful to remove a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before
+    dropping a database to which a foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3d6c901306..a73b71787f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1066,6 +1066,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1295,6 +1307,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1588,6 +1612,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1905,6 +1934,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 730d5fdc34..a5c5619072 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -171,6 +171,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3234adb639..83f30c5045 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.27.0
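
As a rough illustration of the identifier constraints documented for GetPrepareId above (null-terminated, shorter than NAMEDATALEN bytes), here is a minimal standalone sketch in C. It is not code from the patch: build_prepare_id is a hypothetical helper, the typedefs are local stand-ins for PostgreSQL's TransactionId and Oid, and NAMEDATALEN is assumed to be 64. The format follows the fx_<random>_<xid>_<serverid>_<userid> pattern described in the fdwxact.c comment.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins so the sketch compiles outside the PostgreSQL tree. */
typedef uint32_t TransactionId;
typedef uint32_t Oid;

/* Assumed value; PostgreSQL defines NAMEDATALEN at build time. */
#define NAMEDATALEN 64

/*
 * Hypothetical helper: format "fx_<random>_<xid>_<serverid>_<userid>"
 * into buf.  Returns the identifier length (excluding the terminating
 * null), or -1 if it does not fit in buf or is not shorter than
 * NAMEDATALEN bytes.
 */
static int
build_prepare_id(char *buf, size_t buflen, long rnd,
                 TransactionId xid, Oid serverid, Oid userid)
{
    int n = snprintf(buf, buflen, "fx_%ld_%u_%u_%u",
                     rnd, (unsigned) xid,
                     (unsigned) serverid, (unsigned) userid);

    /* snprintf reports the length it wanted; reject any truncation. */
    if (n < 0 || (size_t) n >= buflen || n >= NAMEDATALEN)
        return -1;
    return n;
}
```

An FDW's GetPrepareId callback could presumably return a string built this way and store the length in *prep_id_len; keeping the identifier distinct from other concurrent prepared transactions remains the FDW's responsibility.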

v31-0006-Add-GetPrepareId-API.patchapplication/x-patch; name=v31-0006-Add-GetPrepareId-API.patchDownload
From e9a7089c5b5aebcbc2dcfd78be3655859c91c658 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 4 Nov 2020 14:41:53 +0900
Subject: [PATCH v31 06/11] Add GetPrepareId API

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/fdwxact.c | 54 +++++++++++++++++++++++-----
 src/include/foreign/fdwapi.h         |  3 ++
 2 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 3caf904370..7b3a2f1fba 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -143,6 +143,7 @@ typedef struct FdwXactParticipant
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
 	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
 } FdwXactParticipant;
 
 /*
@@ -347,6 +348,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
 
 	return fdw_part;
 }
@@ -414,9 +416,10 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 }
 
 /*
- * Return a null-terminated foreign transaction identifier.  We generate an
- * unique identifier with in the form of
- * "fx_<random number>_<xid>_<serverid>_<userid> whose length is
+ * Return a null-terminated foreign transaction identifier.  If the given
+ * foreign server's FDW provides the GetPrepareId callback, we return the
+ * identifier it returns.  Otherwise we generate a unique identifier of the
+ * form "fx_<random number>_<xid>_<serverid>_<userid>" whose length is
  * less than FDWXACT_ID_MAX_LEN.
  *
  * Returned string value is used to identify foreign transaction. The
@@ -431,13 +434,48 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 static char *
 get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
 {
-	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+	char *id;
+	int	id_len;
 
-	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
-			 Abs(random()), xid, fdw_part->server->serverid,
-			 fdw_part->usermapping->userid);
+	/*
+	 * If the FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
 
-	return pstrdup(buf);
+	id[id_len] = '\0';
+	return pstrdup(id);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 89cec9aa96..91db4f5bfc 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -174,6 +174,8 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -256,6 +258,7 @@ typedef struct FdwRoutine
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
 	PrepareForeignTransaction_function PrepareForeignTransaction;
+	GetPrepareId_function GetPrepareId;
 } FdwRoutine;
 
 
-- 
2.27.0
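
For reviewers who want to see what an FDW-side callback for the new
GetPrepareId hook might look like, here is a minimal standalone sketch.
It is not taken from the patch set. TransactionId and Oid are stand-in
typedefs, the value of FDWXACT_ID_MAX_LEN is assumed, the function name
example_get_prepare_id and the "myfdw_" prefix are hypothetical, and
malloc() is used where a real FDW would use palloc(). The only thing it
tries to show is the contract implied by get_fdwxact_identifier(): return
a buffer, report its length through prep_id_len, and keep that length
below FDWXACT_ID_MAX_LEN.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for PostgreSQL types; assumptions for this sketch only. */
typedef uint32_t TransactionId;
typedef uint32_t Oid;

/* Assumed limit; the real definition lives in the fdwxact headers. */
#define FDWXACT_ID_MAX_LEN 200

/*
 * Hypothetical callback with the same shape as GetPrepareId_function.
 * Returns a newly allocated identifier and stores its length (without
 * the terminating NUL) in *prep_id_len, or returns NULL on failure.
 */
static char *
example_get_prepare_id(TransactionId xid, Oid serverid, Oid userid,
					   int *prep_id_len)
{
	char	   *id;
	int			n;

	assert(prep_id_len != NULL);

	id = malloc(FDWXACT_ID_MAX_LEN);	/* palloc() in a real FDW */
	if (id == NULL)
		return NULL;

	n = snprintf(id, FDWXACT_ID_MAX_LEN, "myfdw_%u_%u_%u",
				 (unsigned int) xid, (unsigned int) serverid,
				 (unsigned int) userid);

	/* Refuse identifiers that were truncated or failed to format. */
	if (n < 0 || n >= FDWXACT_ID_MAX_LEN)
	{
		free(id);
		return NULL;
	}

	*prep_id_len = n;
	return id;
}
```

Embedding xid, serverid and userid keeps the identifier unique per
participant within one local transaction, which mirrors the default
"fx_..." format; whether an FDW needs more entropy than that is a design
decision this sketch does not try to settle.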

v31-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch (application/x-patch)
From 8a367cf53fc475c3b2bf0ff962a2576242853326 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 2 Nov 2020 14:32:10 +0900
Subject: [PATCH v31 09/11] postgres_fdw marks foreign transaction as modified
 on modification.

This commit enables postgres_fdw to execute the two-phase commit protocol
on transaction commit (without explicitly executing PREPARE TRANSACTION).

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c   | 19 ++++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c |  2 ++
 contrib/postgres_fdw/postgres_fdw.h |  1 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 245b53febb..0d42474cdc 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -58,6 +58,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		modified;		/* true if data on the foreign server is modified */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -285,6 +286,7 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 	entry->have_error = false;
 	entry->changing_xact_state = false;
 	entry->invalidated = false;
+	entry->modified = false;
 	entry->server_hashvalue =
 		GetSysCacheHashValue1(FOREIGNSERVEROID,
 							  ObjectIdGetDatum(server->serverid));
@@ -299,6 +301,20 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 		 entry->conn, server->servername, user->umid, user->userid);
 }
 
+void
+MarkConnectionModified(UserMapping *user)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
+	if (entry && !entry->modified)
+	{
+		FdwXactRegisterXact(user->serverid, user->userid, true);
+		entry->modified = true;
+	}
+}
+
 /*
  * Connect to remote server using specified server and user mapping properties.
  */
@@ -570,7 +586,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 			 entry->conn);
 
 		/* Register the foreign server to the transaction */
-		FdwXactRegisterXact(user->serverid, user->userid);
+		FdwXactRegisterXact(user->serverid, user->userid, false);
 
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
@@ -579,6 +595,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 		entry->changing_xact_state = true;
 		do_sql_command(entry->conn, sql);
 		entry->xact_depth = 1;
+		entry->modified = false;
 		entry->changing_xact_state = false;
 	}
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 22e1a70e76..35642b1305 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2380,6 +2380,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * establish new connection if necessary.
 	 */
 	dmstate->conn = GetConnection(user, false);
+	MarkConnectionModified(user);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -3565,6 +3566,7 @@ create_foreign_modify(EState *estate,
 
 	/* Open connection; report that we'll create a prepared statement. */
 	fmstate->conn = GetConnection(user, true);
+	MarkConnectionModified(user);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 659222b97a..12cd55258f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -132,6 +132,7 @@ extern void reset_transmission_modes(int nestlevel);
 /* in connection.c */
 extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
 extern void ReleaseConnection(PGconn *conn);
+extern void MarkConnectionModified(UserMapping *user);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
-- 
2.27.0
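
The "register once per transaction" behaviour of MarkConnectionModified()
can be illustrated outside the server. The following is a standalone
sketch, not code from the patch: ConnEntrySketch stands in for
ConnCacheEntry, register_modified_sketch() stands in for the call to
FdwXactRegisterXact(..., true), and begin_xact_sketch() stands in for the
flag reset in begin_remote_xact(). All of these names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the connection cache entry; only the flag matters here. */
typedef struct ConnEntrySketch
{
	bool		modified;		/* true once a write has been registered */
} ConnEntrySketch;

/* Counts how often the "modified" registration would have been issued. */
static int	register_calls = 0;

/* Stand-in for FdwXactRegisterXact(serverid, userid, true). */
static void
register_modified_sketch(void)
{
	register_calls++;
}

/* Register the participant as modified at most once per transaction. */
static void
mark_modified_sketch(ConnEntrySketch *entry)
{
	if (entry != NULL && !entry->modified)
	{
		register_modified_sketch();
		entry->modified = true;
	}
}

/* A new remote transaction starts unmodified, as in begin_remote_xact(). */
static void
begin_xact_sketch(ConnEntrySketch *entry)
{
	entry->modified = false;
}
```

With this shape, repeated modify nodes on the same connection cost one
registration per transaction, and a read-only remote transaction is never
marked as modified.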

v31-0007-Introduce-foreign-transaction-launcher-and-resol.patch (application/x-patch)
From dc3f8a52d393b72bed950ccf6d586da1b1bfaf6d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:09:41 +0900
Subject: [PATCH v31 07/11] Introduce foreign transaction launcher and resolver
 processes.

This commit introduces two new background processes: the foreign
transaction launcher and resolvers. With this change, users no longer
need to use pg_resolve_foreign_xact() to resolve foreign transactions
prepared by PREPARE TRANSACTION and left behind by COMMIT/ROLLBACK
PREPARED. These foreign transactions are resolved in the background by
a foreign transaction resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/Makefile           |   5 +-
 src/backend/access/fdwxact/fdwxact.c          |  33 +-
 src/backend/access/fdwxact/launcher.c         | 567 ++++++++++++++++++
 src/backend/access/fdwxact/resolver.c         | 352 +++++++++++
 src/backend/access/transam/twophase.c         |  16 +
 src/backend/postmaster/bgworker.c             |   8 +
 src/backend/postmaster/pgstat.c               |   6 +
 src/backend/postmaster/postmaster.c           |  13 +-
 src/backend/storage/ipc/ipci.c                |   3 +
 src/backend/storage/lmgr/lwlocknames.txt      |   1 +
 src/backend/tcop/postgres.c                   |  14 +
 src/backend/utils/misc/guc.c                  |  37 ++
 src/backend/utils/misc/postgresql.conf.sample |  12 +
 src/include/access/fdwxact.h                  |   6 +
 src/include/access/fdwxact_launcher.h         |  28 +
 src/include/access/fdwxact_resolver.h         |  23 +
 src/include/access/resolver_internal.h        |  63 ++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/pgstat.h                          |   2 +
 src/include/utils/guc_tables.h                |   2 +
 20 files changed, 1183 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
index aacab1d729..151e3ae336 100644
--- a/src/backend/access/fdwxact/Makefile
+++ b/src/backend/access/fdwxact/Makefile
@@ -12,6 +12,9 @@ subdir = src/backend/access/fdwxact
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = fdwxact.o
+OBJS = \
+	fdwxact.o \
+	resolver.o \
+	launcher.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 7b3a2f1fba..b4cab71c3d 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -22,10 +22,10 @@
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  To resolve
- * these foreign transactions the user needs to use pg_resolve_foreign_xact() SQL
- * function that resolve a foreign transaction according to the result of the
- * corresponding local transaction.
+ * local transaction but do nothing for the involved foreign transactions.  The prepared
+ * foreign transactions are resolved asynchronously by a resolver process.  The user
+ * can also use the pg_resolve_foreign_xact() SQL function to resolve a foreign
+ * transaction manually.
  *
  * LOCKING
  *
@@ -76,7 +76,10 @@
 #include <unistd.h>
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/twophase.h"
+#include "access/resolver_internal.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
@@ -157,6 +160,7 @@ static bool fdwXactExitRegistered = false;
 
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
 static void FdwXactPrepareForeignTransactions(TransactionId xid);
@@ -165,7 +169,6 @@ static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
 static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
 										 FdwXactParticipant *fdw_part);
-static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 static void FdwXactComputeRequiredXmin(void);
 static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
 static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
@@ -772,12 +775,13 @@ ForgetAllFdwXactParticipants(void)
 
 	/*
 	 * If we leave any FdwXact entries, update the oldest local transaction of
-	 * unresolved distributed transaction.
+	 * unresolved distributed transaction and notify the launcher.
 	 */
 	if (nlefts > 0)
 	{
 		elog(DEBUG1, "left %u foreign transactions", nlefts);
 		FdwXactComputeRequiredXmin();
+		FdwXactLaunchOrWakeupResolver();
 	}
 
 	list_free_deep(FdwXactParticipants);
@@ -785,7 +789,9 @@ ForgetAllFdwXactParticipants(void)
 }
 
 /*
- * Commit or rollback all foreign transactions.
+ * Close the in-progress foreign transactions involved.  We don't perform the
+ * second phase of the two-phase commit protocol here.  All prepared foreign
+ * transactions enter the in-doubt state and a resolver process handles them.
  */
 void
 AtEOXact_FdwXact(bool is_commit)
@@ -889,7 +895,7 @@ PrePrepare_FdwXact(void)
  * The caller must hold the given foreign transactions in advance to prevent
  * concurrent update.
  */
-static void
+void
 FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
 {
 	for (int i = 0; i < nfdwxacts; i++)
@@ -924,6 +930,17 @@ FdwXactExists(Oid dbid, Oid serverid, Oid userid)
 
 	return (idx >= 0);
 }
+bool
+FdwXactExistsXid(TransactionId xid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(InvalidOid, xid, InvalidOid, InvalidOid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
 
 /*
  * Return the index of first found FdwXact entry that matched to given arguments.
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..916b9af2f7
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,567 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+		FdwXactRslvCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit start retries to once per
+		 * foreign_xact_resolution_retry_interval, but always attempt to
+		 * start when requested.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * started.  This usually means the resolver crashed, so we should
+			 * retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+
+	/*
+	 * Look for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. Since the resolver will soon find
+		 * the FdwXact entry we inserted, we don't need to do anything in that case.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on each database that has
+ * at least one FdwXact entry but no resolver running on it.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *fdwxact_dbs;
+	HTAB	   *resolver_dbs;
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+
+	/*
+	 * Create a hash map for the databases that have at least one foreign
+	 * transaction to resolve.
+	 */
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * We need to launch resolver process if the foreign transaction
+		 * is not held by anyone and is not a part of the local prepared
+		 * transaction.
+		 */
+		if (fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no foreign transaction to resolve, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	/* Create a hash map for databases on which a resolver is running */
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * Find databases on which no resolver is running and launch new
+	 * resolver process on them.
+	 */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be super user */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..c9d41428fc
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,352 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process continues to resolve the foreign transactions on its
+ * database for which backend processes are waiting for resolution.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int	foreign_xact_resolution_retry_interval;
+int	foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+static void hold_indoubt_fdwxacts(void);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts holds the indexes of the FdwXact entries that the resolver
+ * has marked as in-processing. These marks are cleared on process exit.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and clean up the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	/* Release the held foreign transaction entries */
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/* Hold in-doubt foreign transactions to resolve */
+		hold_indoubt_fdwxacts();
+
+		if (nheld > 0)
+		{
+			/* Resolve in-doubt transactions */
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld);
+			CommitTransactionCommand();
+			last_resolution_time = now;
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/* Reached timeout, exit */
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+					get_database_name(MyDatabaseId))));
+	CommitTransactionCommand();
+	fdwxact_resolver_detach();
+	proc_exit(0);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Lock foreign transactions that are not held by anyone.
+ */
+static void
+hold_indoubt_fdwxacts(void)
+{
+	nheld = 0;
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid &&
+			fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+		{
+			held_fdwxacts[nheld++] = i;
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 265b03ba5a..29f11fb779 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,8 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -2286,6 +2288,13 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * If the prepared transaction was part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
@@ -2345,6 +2354,13 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * If the prepared transaction was part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index d209b69ec0..ee87e4a847 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,8 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 4e53d40b13..db51f458b0 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3830,6 +3830,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 4fdf015c9b..a837cb8260 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -94,6 +94,7 @@
 #endif
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -911,6 +912,9 @@ PostmasterMain(int argc, char *argv[])
 	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers <= 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
 
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
@@ -976,12 +980,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and the foreign transaction launcher.
+	 * Since each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2d7191d3cd..271fd35884 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -17,6 +17,7 @@
 #include "access/clog.h"
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -151,6 +152,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +272,7 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index dc29a7ea6f..9327394013 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -54,3 +54,4 @@ XactTruncationLock					44
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
 FdwXactLock							48
+FdwXactResolverLock					49
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 317d1aa573..7ac1488e33 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3085,6 +3087,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 7ef7eef1b5..33e1b5884c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -760,6 +760,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2459,6 +2463,39 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 13e7027fd4..36abadbc60 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -732,6 +732,18 @@
 #max_pred_locks_per_page = 2            # min 0
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
 #------------------------------------------------------------------------------
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 9ba819e9d1..a3763e52c0 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -104,13 +104,19 @@ typedef struct FdwXactRslvState
 
 /* GUC parameters */
 extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern bool FdwXactExistsXid(TransactionId xid);
 extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
 extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
 								Oid userid, void *content, int len);
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..688b43b8d0
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..c935471936
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 71ebb034b2..091176ad72 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6165,6 +6165,11 @@
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
 
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
+
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
   proargtypes => 'pg_lsn pg_lsn', prosrc => 'pg_wal_lsn_diff' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1ed1987fa5..382913a790 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -883,6 +883,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 7f36e1146f..cf2170cf5f 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
-- 
2.27.0
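
For readers following the resolver's wait logic in the patch above: FXRslvComputeSleepTime() picks the shortest of three waits (the default nap, the time left until the idle timeout, and the time left until the next scheduled resolution). The following is a minimal standalone sketch of that rule only, not the patch's code. The names ts_usec, ms_until and compute_sleep_ms are hypothetical stand-ins; plain microsecond integers replace TimestampTz, and the clamp to zero for a target already in the past is assumed to match what TimestampDifference() does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for TimestampTz: microseconds since some epoch. */
typedef int64_t ts_usec;

/* Mirrors DEFAULT_NAPTIME_PER_CYCLE in the patch (3 minutes, in ms). */
#define NAPTIME_MS 180000L

/* Milliseconds from 'now' until 'target'; zero if 'target' already passed. */
static long
ms_until(ts_usec now, ts_usec target)
{
	if (target <= now)
		return 0;
	return (long) ((target - now) / 1000);
}

/*
 * Sleep for the shortest of: the default nap, the time left until the idle
 * timeout (timeout_ms == 0 disables it), and the time left until the next
 * resolution attempt (next_resolution <= 0 means none is scheduled).
 */
static long
compute_sleep_ms(ts_usec now, ts_usec last_resolution, int timeout_ms,
				 ts_usec next_resolution)
{
	long		sleep_ms = NAPTIME_MS;

	assert(timeout_ms >= 0);

	if (timeout_ms > 0)
	{
		ts_usec		deadline = last_resolution + (ts_usec) timeout_ms * 1000;
		long		to_timeout = ms_until(now, deadline);

		if (to_timeout < sleep_ms)
			sleep_ms = to_timeout;
	}

	if (next_resolution > 0)
	{
		long		to_next = ms_until(now, next_resolution);

		if (to_next < sleep_ms)
			sleep_ms = to_next;
	}

	return sleep_ms;
}
```

With a 60 s timeout and no scheduled retry, a resolver that has just resolved something would sleep for 60000 ms; with the timeout disabled it falls back to the 3 minute nap.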

v31-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch (application/x-patch)
From 1d17b419b249c07cd280ac56e5d828d2508f5af4 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sat, 29 Aug 2020 00:14:36 +0900
Subject: [PATCH v31 02/11] postgres_fdw supports commit and rollback APIs.

This commit implements both CommitForeignTransaction and
RollbackForeignTransaction APIs in postgres_fdw. Note that since
PREPARE TRANSACTION is still not supported, this commit doesn't add
anything new that users can do.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 470 +++++++++---------
 .../postgres_fdw/expected/postgres_fdw.out    |   2 +-
 contrib/postgres_fdw/postgres_fdw.c           |   4 +
 contrib/postgres_fdw/postgres_fdw.h           |   3 +
 4 files changed, 237 insertions(+), 242 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index d841cec39b..be263d8e53 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -16,6 +16,7 @@
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -80,8 +81,7 @@ static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, UserMapping *user);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -94,6 +94,8 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -108,53 +110,11 @@ static bool UserMappingPasswordRequired(UserMapping *user);
 PGconn *
 GetConnection(UserMapping *user, bool will_prep_stmt)
 {
-	bool		found;
 	bool		retry = false;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
 	MemoryContext ccxt = CurrentMemoryContext;
 
-	/* First time through, initialize connection cache hashtable */
-	if (ConnectionHash == NULL)
-	{
-		HASHCTL		ctl;
-
-		ctl.keysize = sizeof(ConnCacheKey);
-		ctl.entrysize = sizeof(ConnCacheEntry);
-		ConnectionHash = hash_create("postgres_fdw connections", 8,
-									 &ctl,
-									 HASH_ELEM | HASH_BLOBS);
-
-		/*
-		 * Register some callback functions that manage connection cleanup.
-		 * This should be done just once in each backend.
-		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
-		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
-		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
-									  pgfdw_inval_callback, (Datum) 0);
-		CacheRegisterSyscacheCallback(USERMAPPINGOID,
-									  pgfdw_inval_callback, (Datum) 0);
-	}
-
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
-	/*
-	 * Find or create cached entry for requested connection.
-	 */
-	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
-	if (!found)
-	{
-		/*
-		 * We need only clear "conn" here; remaining fields will be filled
-		 * later when "conn" is set.
-		 */
-		entry->conn = NULL;
-	}
+	entry = GetConnectionCacheEntry(user->umid);
 
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
@@ -186,7 +146,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	PG_TRY();
 	{
 		/* Start a new transaction or subtransaction if needed. */
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 	PG_CATCH();
 	{
@@ -247,7 +207,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		if (entry->conn == NULL)
 			make_new_connection(entry, user);
 
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 
 	/* Remember if caller will prepare statements */
@@ -256,6 +216,56 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	return entry->conn;
 }
 
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	bool		found;
+	ConnCacheEntry *entry;
+	ConnCacheKey key;
+
+	/* First time through, initialize connection cache hashtable */
+	if (ConnectionHash == NULL)
+	{
+		HASHCTL		ctl;
+
+		ctl.keysize = sizeof(ConnCacheKey);
+		ctl.entrysize = sizeof(ConnCacheEntry);
+		ConnectionHash = hash_create("postgres_fdw connections", 8,
+									 &ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+		/*
+		 * Register some callback functions that manage connection cleanup.
+		 * This should be done just once in each backend.
+		 */
+		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
+		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
+									  pgfdw_inval_callback, (Datum) 0);
+		CacheRegisterSyscacheCallback(USERMAPPINGOID,
+									  pgfdw_inval_callback, (Datum) 0);
+	}
+
+	/* Set flag that we did GetConnection during the current transaction */
+	xact_got_connection = true;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
+
+	/*
+	 * Find or create cached entry for requested connection.
+	 */
+	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+	if (!found)
+	{
+		/*
+		 * We need only clear "conn" here; remaining fields will be filled
+		 * later when "conn" is set.
+		 */
+		entry->conn = NULL;
+	}
+	return entry;
+}
+
 /*
  * Reset all transient state fields in the cached connection entry and
  * establish new connection to the remote server.
@@ -545,7 +555,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -557,6 +567,9 @@ begin_remote_xact(ConnCacheEntry *entry)
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
+		/* Register the foreign server to the transaction */
+		FdwXactRegisterXact(user->serverid, user->userid);
+
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
 		else
@@ -772,199 +785,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- *
- * This runs just late enough that it must not enter user-defined code
- * locally.  (Entering such code on the remote side is fine.  Its remote
- * COMMIT TRANSACTION may run deferred triggers.)
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state or it is marked as
-		 * invalid, then discard it to recover. Next GetConnection will open a
-		 * new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state ||
-			entry->invalidated)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -1341,3 +1161,171 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+void
+postgresCommitForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry;
+	PGresult   *res;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	Assert(entry->conn);
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   frstate->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	/*
+	 * In the simple rollback case, we must have a connection to the foreign
+	 * server because the foreign transaction is not closed yet. We get the
+	 * connection entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * If the transaction failed before a connection was established, only
+	 * reset the transaction state of the cache entry.
+	 */
+	if (!entry->conn)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction yet or is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c11092f8cc..3724fdab3d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,7 +8984,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
+ERROR:  cannot PREPARE a transaction that has operated on foreign tables
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index b6c72e1d1e..7ac0c85dd3 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -560,6 +560,10 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..e3b2897495 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -137,6 +138,8 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
-- 
2.27.0
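
The cleanup order in postgresRollbackForeignTransaction() above (cancel a running query, then ABORT TRANSACTION, then DEALLOCATE ALL) can be sketched as a standalone model. This is only an illustration under stated assumptions: SketchEntry and sketch_rollback() are hypothetical names, and the cancel_ok/abort_ok/dealloc_ok flags simulate the results of the remote commands instead of calling libpq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ConnCacheEntry fields used by the rollback. */
typedef struct SketchEntry
{
	bool		query_active;	/* is a remote command still running? */
	bool		have_prep_stmt;
	bool		have_error;
	bool		changing_xact_state;
	/* Simulated outcomes of the three remote steps (assumptions). */
	bool		cancel_ok;
	bool		abort_ok;
	bool		dealloc_ok;
} SketchEntry;

/*
 * Model of the rollback cascade.  Returns true if the connection may be
 * reused afterwards, false if it has to be discarded.
 */
static bool
sketch_rollback(SketchEntry *entry)
{
	bool		failed = false;

	assert(entry != NULL);

	/* An earlier state change never completed: don't touch the connection. */
	if (entry->changing_xact_state)
		return false;

	entry->changing_xact_state = true;
	/* Assume we might have lost track of prepared statements. */
	entry->have_error = true;

	if (entry->query_active && !entry->cancel_ok)
		failed = true;			/* unable to cancel the running query */
	else if (!entry->abort_ok)
		failed = true;			/* unable to abort the remote transaction */
	else if (entry->have_prep_stmt && entry->have_error && !entry->dealloc_ok)
		failed = true;			/* trouble clearing prepared statements */

	/* Disarm the flag only if every step worked. */
	entry->changing_xact_state = failed;
	return !failed;
}
```

Each step runs only if the previous one succeeded, and a failure at any step leaves changing_xact_state set, which is what makes pgfdw_cleanup_after_transaction() discard the connection.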

v31-0003-Recreate-RemoveForeignServerById.patch (application/x-patch)
From ceeaaec7cc664bd2b053482d1d272cb2ca4c672d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 12 Jun 2020 11:49:02 +0900
Subject: [PATCH v31 03/11] Recreate RemoveForeignServerById()

This commit recreates RemoveForeignServerById(), which was removed by
b1d32d3e3. It is needed by a follow-up commit that checks, when a
foreign server is dropped, whether it still has prepared transactions.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/catalog/dependency.c   |  5 ++++-
 src/backend/commands/foreigncmds.c | 22 ++++++++++++++++++++++
 src/include/commands/defrem.h      |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 119006159b..e97870ce8c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1549,6 +1549,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_FOREIGN_SERVER:
+			RemoveForeignServerById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1563,7 +1567,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSDICT:
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
-		case OCLASS_FOREIGN_SERVER:
 		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index de31ddd1f3..c002a61794 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -1060,6 +1060,28 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop foreign server by OID
+ */
+void
+RemoveForeignServerById(Oid srvId)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(ForeignServerRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(srvId));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Common routine to check permission for user-mapping-related DDL
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 1133ae1143..02449ef7ed 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -129,6 +129,7 @@ extern ObjectAddress CreateForeignDataWrapper(CreateFdwStmt *stmt);
 extern ObjectAddress AlterForeignDataWrapper(AlterFdwStmt *stmt);
 extern ObjectAddress CreateForeignServer(CreateForeignServerStmt *stmt);
 extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
+extern void RemoveForeignServerById(Oid srvId);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
-- 
2.27.0

v31-0004-Add-PrepareForeignTransaction-API.patch (application/x-patch)
From abf47a46731b61e17dea6ac0c6d00718006cfb8b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 20 Sep 2020 16:49:20 +0900
Subject: [PATCH v31 04/11] Add PrepareForeignTransaction API.

This commit adds a new FDW API, PrepareForeignTransaction. Using this
API, the transactions initiated on the foreign server are prepared at
PREPARE TRANSACTION time.  The information about prepared foreign
transactions involved in the distributed transaction is crash-safe.
However, these transactions are neither committed nor aborted at
COMMIT/ROLLBACK PREPARED time.  To resolve these transactions, this
commit also adds the pg_resolve_foreign_xact() SQL function.
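
The crash-safety claim rests on ordering: as the comments in FdwXactPrepareForeignTransactions() put it, persist first, then prepare. A minimal standalone sketch of that ordering follows. All names in it (SketchXact, sketch_prepare_participant(), the callbacks) are hypothetical, the "logged" flag merely stands in for the flushed WAL record, and failure is modelled as a false return where the patch would raise an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum SketchStatus
{
	SKETCH_INVALID,
	SKETCH_PREPARING,			/* entry logged, remote PREPARE unconfirmed */
	SKETCH_PREPARED				/* remote PREPARE acknowledged */
} SketchStatus;

typedef struct SketchXact
{
	SketchStatus status;
	bool		logged;			/* stands in for the flushed WAL record */
} SketchXact;

/* Hypothetical callback: returns true when the remote PREPARE succeeded. */
typedef bool (*sketch_prepare_fn) (void);

static bool
sketch_prepare_participant(SketchXact *fx, sketch_prepare_fn remote_prepare)
{
	assert(fx != NULL && remote_prepare != NULL);

	/* Step 1: record the intent durably before touching the remote side. */
	fx->status = SKETCH_PREPARING;
	fx->logged = true;

	/*
	 * Step 2: prepare remotely.  If this fails or we crash here, the logged
	 * PREPARING entry still tells a later resolver that a prepared
	 * transaction may exist on the foreign server.
	 */
	if (!remote_prepare())
		return false;

	/* Step 3: the foreign server acknowledged, so update the status. */
	fx->status = SKETCH_PREPARED;
	return true;
}

/* Simulated foreign servers for trying out the sketch. */
static bool
sketch_remote_ok(void)
{
	return true;
}

static bool
sketch_remote_fail(void)
{
	return false;
}
```

With the steps swapped, a crash between them would leave a prepared transaction on the foreign server that nothing on the local side remembers.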

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +-
 src/backend/access/fdwxact/fdwxact.c          | 1755 ++++++++++++++++-
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   28 +
 src/backend/access/transam/xact.c             |    1 +
 src/backend/access/transam/xlog.c             |   41 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/foreigncmds.c            |   22 +
 src/backend/foreign/foreign.c                 |    6 +
 src/backend/postmaster/pgstat.c               |    9 +
 src/backend/postmaster/postmaster.c           |    1 +
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    3 +
 src/backend/storage/ipc/procarray.c           |   56 +-
 src/backend/storage/lmgr/lwlocknames.txt      |    1 +
 src/backend/utils/misc/guc.c                  |   11 +
 src/backend/utils/misc/postgresql.conf.sample |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |   88 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   18 +
 src/include/foreign/fdwapi.h                  |    2 +
 src/include/pgstat.h                          |    3 +
 src/include/storage/procarray.h               |    2 +
 src/test/regress/expected/rules.out           |    7 +
 35 files changed, 2164 insertions(+), 28 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact_xlog.h

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 3724fdab3d..707f1d7cd4 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,7 +8984,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on foreign tables
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 00da860b31..3caf904370 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -9,8 +9,59 @@
  * FDW who implements both commit and rollback APIs can request to register the
  * foreign transaction by FdwXactRegisterXact() to participate it to a
  * group of distributed tranasction.  The registered foreign transactions are
- * identified by OIDs of server and user.  On commit and rollback, the global
- * transaction manager calls corresponding FDW API to end the tranasctions.
+ * identified by OIDs of server and user.  On commit, rollback and prepare, the
+ * global transaction manager calls the corresponding FDW API to end the transactions.
+ *
+ * To commit atomically across all foreign servers, the global transaction
+ * manager supports the two-phase commit protocol, a type of atomic commitment
+ * protocol (ACP).  The two-phase commit protocol is crash-safe: we WAL-log the
+ * foreign transaction information.
+ *
+ * FOREIGN TRANSACTION RESOLUTION
+ *
+ * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by calling
+ * the PrepareForeignTransaction() API, regardless of whether data on the foreign
+ * server has been modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or
+ * roll back only the local transaction and do nothing for the involved foreign
+ * transactions.  To resolve these foreign transactions, the user needs to call the
+ * pg_resolve_foreign_xact() SQL function, which resolves a foreign transaction
+ * according to the outcome of the corresponding local transaction.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable amount of time, the in-memory data of a
+ * foreign transaction follows a locking model based on the following concepts:
+ *
+ * * All FdwXact fields except for status are protected by FdwXactLock. The
+ *	 status is protected by its mutex.
+ * * A process that is going to process a foreign transaction needs to set
+ *	 locking_backend of the FdwXact entry to lock the entry, which prevents the
+ *	 entry from being updated or removed by concurrent processes.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *	 with entries marked with fdwxact->inredo and fdwxact->ondisk.	FdwXact file
+ *	 data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *	 We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *	 have fdwxact->inredo set and are behind the redo_horizon.	We save
+ *	 them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *	 fdwxact->ondisk is true, the corresponding entry from the disk is
+ *	 additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *	 fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
  *
  * Portions Copyright (c) 2020, PostgreSQL Global Development Group
  *
@@ -20,15 +71,53 @@
  */
 #include "postgres.h"
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include "access/fdwxact.h"
+#include "access/twophase.h"
+#include "access/xact.h"
 #include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "foreign/fdwapi.h"
 #include "foreign/foreign.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/procarray.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 
 /* Check the FdwXactParticipant is capable of two-phase commit  */
 #define ServerSupportTransactionCallback(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define ServerSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * OID, xid, foreign server OID and user OID, each printed as 8 hex digits
+ * and separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction, and the xid of an unresolved distributed
+ * transaction is never reused, the name is enough to ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
 
 /*
  * Structure to bundle the foreign transaction participant.	 This struct
@@ -37,13 +126,23 @@
  */
 typedef struct FdwXactParticipant
 {
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
 	/* Foreign server and user mapping info, passed to callback routines */
 	ForeignServer *server;
 	UserMapping *usermapping;
 
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
 } FdwXactParticipant;
 
 /*
@@ -52,11 +151,103 @@ typedef struct FdwXactParticipant
  */
 static List *FdwXactParticipants = NIL;
 
+/* Track whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+/* GUC parameter */
+int			max_prepared_foreign_xacts = 0;
+
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void FdwXactPrepareForeignTransactions(TransactionId xid);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
+static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid,
+										 FdwXactParticipant *fdw_part);
+static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
+static void FdwXactComputeRequiredXmin(void);
+static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
+static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool givewarning);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool fromdisk);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  Oid umid, char *fdwxact_id);
+static void remove_fdwxact(FdwXact fdwxact);
 static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
 													  FdwRoutine *routine);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
 
 /*
  * Register the given foreign transaction identified by the given arguments
@@ -82,6 +273,13 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 		}
 	}
 
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
 	routine = GetFdwRoutineByServerId(serverid);
 
 	/*
@@ -142,14 +340,336 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 
 	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
 
+	fdw_part->fdwxact = NULL;
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
 
 	return fdw_part;
 }
 
+/*
+ * Insert FdwXact entries and prepare foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(TransactionId xid)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants != NIL);
+	Assert(TransactionIdIsValid(xid));
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState state;
+		FdwXact		fdwxact;
+
+		Assert(ServerSupportTwophaseCommit(fdw_part));
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to disk and to WAL (thereby relaying it to standbys).
+		 * Thus if we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed between these two
+		 * steps, we would lose track of the prepared transaction on the
+		 * foreign server and could not resolve it after crash recovery.
+		 * Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertFdwXactEntry() call and the
+		 * acknowledgement from the foreign server, the backend may abort the
+		 * local transaction (say, because of a signal).
+		 */
+		state.server = fdw_part->server;
+		state.usermapping = fdw_part->usermapping;
+		state.fdwxact_id = fdw_part->fdwxact_id;
+		fdw_part->prepare_foreign_xact_fn(&state);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier.  We generate a
+ * unique identifier of the form
+ * "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like a UUID, which gives unique ids with high probability, but that may be
+ * expensive here, and the UUID extension, which provides the function to
+ * generate UUIDs, is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+			 Abs(random()), xid, fdw_part->server->serverid,
+			 fdw_part->usermapping->userid);
+
+	return pstrdup(buf);
+}
+
+/*
+ * This function is used to create a new foreign transaction entry before an
+ * FDW prepares or commits/rolls back.  It writes the entry to WAL; the entry
+ * is persisted to disk under the pg_fdwxact directory at the next checkpoint.
+ */
+static FdwXact
+FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location so it can be read back at checkpoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* WAL record is flushed, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there
+	 * are none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions (currently %d).",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
 /*
  * The routine for committing or rolling back the given transaction participant.
  */
@@ -162,6 +682,7 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 
 	state.server = fdw_part->server;
 	state.usermapping = fdw_part->usermapping;
+	state.fdwxact_id = NULL;
 	state.flags = FDWXACT_FLAG_ONEPHASE;
 
 	if (commit)
@@ -181,14 +702,46 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 }
 
 /*
- * Clear the FdwXactParticipants list.
+ * Unlock foreign transaction participants and clear the FdwXactParticipants
+ * list.  If any foreign transactions are left unresolved, update the oldest
+ * xmin of unresolved transactions so that the local transaction ids of those
+ * foreign transactions are not truncated.
  */
 static void
 ForgetAllFdwXactParticipants(void)
 {
+	ListCell   *cell;
+	int			nlefts = 0;
+
 	if (FdwXactParticipants == NIL)
 		return;
 
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if no FdwXact entry has been registered yet */
+		if (!fdwxact)
+			continue;
+
+		/* Unlock the foreign transaction entry */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+		nlefts++;
+	}
+
+	/*
+	 * If we leave any FdwXact entries behind, update the oldest local
+	 * transaction id of unresolved distributed transactions.
+	 */
+	if (nlefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions", nlefts);
+		FdwXactComputeRequiredXmin();
+	}
+
 	list_free_deep(FdwXactParticipants);
 	FdwXactParticipants = NIL;
 }
@@ -211,23 +764,1203 @@ AtEOXact_FdwXact(bool is_commit)
 	foreach(lc, FdwXactParticipants)
 	{
 		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		if (!fdwxact)
+		{
+			/* Commit or rollback the foreign transaction in one-phase */
+			Assert(ServerSupportTransactionCallback(fdw_part));
+			FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			continue;
+		}
+
+		/*
+		 * This foreign transaction might have been prepared.  In the commit
+		 * case, we don't need to do anything for this participant because all
+		 * foreign transactions should have already been prepared and therefore
+		 * the transaction is already closed. These will be resolved manually.
+		 * In the abort case, on the other hand, we need to close the
+		 * transaction if preparing might still be in progress, since an error
+		 * might have occurred while preparing a foreign transaction.
+		 */
+		if (!is_commit)
+		{
+			int					   status;
 
-		Assert(ServerSupportTransactionCallback(fdw_part));
-		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			SpinLockAcquire(&(fdwxact->mutex));
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&(fdwxact->mutex));
+
+			if (status == FDWXACT_STATUS_PREPARING)
+				FdwXactParticipantEndTransaction(fdw_part, false);
+		}
 	}
 
 	ForgetAllFdwXactParticipants();
 }
 
 /*
- * Check if the local transaction has any foreign transaction.
+ * Prepare foreign transactions for the PREPARE TRANSACTION command.
+ *
+ * Note that the transaction may abort after we have prepared some of the
+ * participants. In that case we switch to rollback and roll back all foreign
+ * transactions.
  */
 void
 PrePrepare_FdwXact(void)
 {
-	/* We don't support to prepare foreign transactions */
-	if (FdwXactParticipants != NIL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+	ListCell   *lc;
+	TransactionId xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit as we're going to
+	 * prepare all of them.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
+	}
+
+	/*
+	 * Assign a transaction id if one is not assigned yet, because the local
+	 * transaction id is used to determine the result of the distributed
+	 * transaction. Then prepare all foreign transactions.
+	 */
+	xid = GetTopTransactionId();
+	FdwXactPrepareForeignTransactions(xid);
+
+	/*
+	 * We keep FdwXactParticipants until the end of the transaction so that we
+	 * can change the involved foreign transactions to ABORTING on failure.
+	 */
+}
+
+/*
+ * Resolve foreign transactions at the given indexes.
+ *
+ * The caller must hold the given foreign transactions in advance to prevent
+ * concurrent updates.
+ */
+static void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		FdwXactResolveOneFdwXact(fdwxact);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
+
+/*
+ * Return the index of the first FdwXact entry that matches the given
+ * arguments, or -1 if there is none.  Only arguments with valid values for
+ * their respective datatypes are used as search conditions.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	bool		found = false;
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+		found = true;
+		break;
+	}
+
+	return found ? i : -1;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ *
+ * XXX: we can exclude FdwXact entries whose status is already committing
+ * or aborting.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Return whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactGetTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on the
+	 * foreign server. So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted.  Raise an error anyway since we cannot
+	 * determine the fate of this foreign transaction from a local transaction
+	 * whose fate is also not determined.
+	 */
+	else
+		elog(ERROR,
+			 "cannot resolve the foreign transaction associated with in-progress transaction");
+
+	pg_unreachable();
+}
+
+/* Commit or rollback one prepared foreign transaction */
+static void
+FdwXactResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactRslvState state;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	/* The FdwXact entry must be held by me */
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->locking_backend == MyBackendId);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactGetTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare the resolution state to pass to API */
+	state.server = server;
+	state.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	state.fdwxact_id = fdwxact->fdwxact_id;
+	state.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&state);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&state);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete the FdwXact entry and its file, if they exist */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. insert_fdwxact()
+	 * initializes the status to PREPARING; it is overridden below. The
+	 * resolver will set the final status later based on the status of the
+	 * local transaction which prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that
+	 * prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXact file if the foreign transaction was saved by an earlier checkpoint.
+ * The FdwXact entry may not be found when crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+static void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	elog(DEBUG2, "removing fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts <= 0)
+		return;					/* nothing to do */
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, if somebody
+	 * spots that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid, read the foreign transaction
+ * state either from disk or directly from WAL using the record pointer kept
+ * in shared memory and passed as "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a plausible size, a valid CRC, and the expected
+ * identifiers), return the palloc'd contents of the file; otherwise raise an
+ * error for the corrupted data. This state can be reached during recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match what the caller expects */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * so a new backup should be rolled in instead.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextXid = ShmemVariableCache->nextXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextXid);
+	TransactionId result = TransactionIdIsValid(oldestActiveXid) ? oldestActiveXid : origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+RestoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built-in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = CStringGetTextDatum(fdwxact->fdwxact_id);
+
+		if (fdwxact->locking_backend != InvalidBackendId)
+		{
+			PGPROC *locker = BackendIdGetProc(fdwxact->locking_backend);
+			values[5] = Int32GetDatum(locker->pid);
+		}
+		else
+			nulls[5] = true;
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist")));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR,
+				 (errmsg("permission denied to resolve prepared foreign transaction"),
+				  errhint("Must be superuser or the user that prepared the transaction.")));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being processed by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction identifier \"%s\" is busy",
+						fdwxact->fdwxact_id)));
+	}
+
+	if (TwoPhaseExists(fdwxact->local_xid))
+	{
+		/*
+		 * the entry's local transaction is prepared. Since we cannot know the
+		 * fate of the local transaction, we cannot resolve this foreign
+		 * transaction.
+		 */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction with identifier \"%s\" whose local transaction is in-progress",
+						fdwxact->fdwxact_id),
+				 errhint("Use COMMIT PREPARED or ROLLBACK PREPARED to finish the local transaction first.")));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	LWLockRelease(FdwXactLock);
+
+	PG_TRY();
+	{
+		FdwXactResolveFdwXacts(&idx, 1);
+	}
+	PG_CATCH();
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactCtl->fdwxacts[idx]->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it.  This gives a way to forget about such a transaction when,
+ * for example, the foreign server where it was prepared is no longer
+ * available or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction on server %u does not exist",
+						serverid)));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("permission denied to remove prepared foreign transaction"),
+				  errhint("Must be superuser or the user that prepared the transaction."))));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being held by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	PG_TRY();
+	{
+		/* Clean up entry and any files we may have left */
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdwxact(fdwxact);
+	}
+	PG_CATCH();
+	{
+		if (fdwxact->valid)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+			fdwxact->locking_backend = InvalidBackendId;
+		}
+		LWLockRelease(FdwXactLock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
 }
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..4b3e67eb49 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..0a3f4b383f 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 873bf9bad9..265b03ba5a 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -845,6 +845,34 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+		if (gxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index bc96512d35..0e1bf63b52 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2568,6 +2568,7 @@ PrepareTransaction(void)
 	PostPrepare_Twophase();
 
+	AtEOXact_FdwXact(true);
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
 	AtEOXact_Enum();
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9867e1b403..634c708661 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4613,6 +4614,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6298,6 +6300,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6845,14 +6850,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit of
+	 * discarding any state files that are newer than the first record to
+	 * replay, which avoids conflicts at replay.  It also avoids any subsequent
+	 * scans when recovering the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	RestoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7054,7 +7060,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7566,11 +7575,13 @@ StartupXLOG(void)
 	}
 
 	/*
-	 * Pre-scan prepared transactions to find out the range of XIDs present.
-	 * This information is not quite needed yet, but it is positioned here so
-	 * as potential problems are detected before any on-disk change is done.
+	 * Pre-scan prepared transactions and foreign prepared transactions to find
+	 * out the range of XIDs present.  This information is not quite needed yet,
+	 * but it is positioned here so that potential problems are detected before
+	 * any on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7898,8 +7909,12 @@ StartupXLOG(void)
 	TrimCLOG();
 	TrimMultiXact();
 
-	/* Reload shared-memory state for prepared transactions */
+	/*
+	 * Reload shared-memory state for prepared transactions and foreign
+	 * prepared transactions.
+	 */
 	RecoverPreparedTransactions();
+	RecoverFdwXacts();
 
 	/*
 	 * Shutdown the recovery environment. This must occur after
@@ -9265,6 +9280,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9813,6 +9829,7 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
@@ -9832,6 +9849,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9850,6 +9868,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -10057,6 +10076,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10260,6 +10280,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b140c210bc..15e567dc3c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index c002a61794..c290b9ea94 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1076,6 +1077,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * Warn if there is an unresolved foreign prepared transaction on this
+	 * foreign server; the drop itself is not blocked.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1396,6 +1409,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * Warn if there is an unresolved foreign prepared transaction that uses
+	 * this user mapping; the drop itself is not blocked.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 9c8b1c7fc2..7b1ce752f8 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -332,6 +332,12 @@ GetFdwRoutine(Oid fdwhandler)
 	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
 		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
 
+	/* FDWs supporting the prepare API must also support the commit and rollback APIs */
+	Assert((routine->PrepareForeignTransaction &&
+			routine->CommitForeignTransaction &&
+			routine->RollbackForeignTransaction) ||
+		   !routine->PrepareForeignTransaction);
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 123369f4fa..4e53d40b13 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4158,6 +4158,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_DSM_FILL_ZERO_WRITE:
 			event_name = "DSMFillZeroWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ:
 			event_name = "LockFileAddToDataDirRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b3ccd18cda..4fdf015c9b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,7 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee99b8..23ae805218 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -167,6 +167,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..2d7191d3cd 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -267,6 +269,7 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index ee912b9d5e..551e212f4d 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -96,6 +96,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allProcs[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -187,11 +189,13 @@ typedef struct ComputeXidHorizonsResult
 	FullTransactionId latest_completed;
 
 	/*
-	 * The same for procArray->replication_slot_xmin and.
-	 * procArray->replication_slot_catalog_xmin.
+	 * The same for procArray->replication_slot_xmin,
+	 * procArray->replication_slot_catalog_xmin, and
+	 * procArray->fdwxact_unresolved_xmin.
 	 */
 	TransactionId slot_xmin;
 	TransactionId slot_catalog_xmin;
+	TransactionId fdwxact_unresolved_xmin;
 
 	/*
 	 * Oldest xid that any backend might still consider running. This needs to
@@ -210,8 +214,9 @@ typedef struct ComputeXidHorizonsResult
 	 * Oldest xid for which deleted tuples need to be retained in shared
 	 * tables.
 	 *
-	 * This includes the effects of replication slots. If that's not desired,
-	 * look at shared_oldest_nonremovable_raw;
+	 * This includes the effects of replication slots and unresolved
+	 * foreign transactions. If that's not desired, look at
+	 * shared_oldest_nonremovable_raw;
 	 */
 	TransactionId shared_oldest_nonremovable;
 
@@ -418,6 +423,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 		ShmemVariableCache->xactCompletionCount = 1;
 	}
 
@@ -1709,6 +1715,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	 */
 	h->slot_xmin = procArray->replication_slot_xmin;
 	h->slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	h->fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	for (int index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1836,6 +1843,12 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	h->data_oldest_nonremovable =
 		TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);
 
+	/*
+	 * Check whether there is an unresolved distributed transaction requiring
+	 * an older xmin.
+	 */
+	h->shared_oldest_nonremovable =
+		TransactionIdOlder(h->shared_oldest_nonremovable, h->fdwxact_unresolved_xmin);
 	/*
 	 * The only difference between catalog / data horizons is that the slot's
 	 * catalog xmin is applied to the catalog one (so catalogs can be accessed
@@ -1893,6 +1906,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	Assert(!TransactionIdIsValid(h->slot_catalog_xmin) ||
 		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
 										 h->slot_catalog_xmin));
+	Assert(!TransactionIdIsValid(h->fdwxact_unresolved_xmin) ||
+		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+										 h->fdwxact_unresolved_xmin));
 
 	/* update approximate horizons with the computed horizons */
 	GlobalVisUpdateApply(h);
@@ -3797,6 +3813,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon so that
+ * vacuum does not remove clog entries of transactions that are still needed
+ * for resolving distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limits.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
+
 /*
  * XidCacheRemoveRunningXids
  *
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..dc29a7ea6f 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+FdwXactLock							48
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 878fcc2236..7ef7eef1b5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -30,6 +30,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -2448,6 +2449,16 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b7fb2ec1fe..13e7027fd4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -127,6 +127,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f994c4216b..41c9544c2e 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -204,6 +204,7 @@ static const char *const subdirs[] = {
 	"pg_wal/archive_status",
 	"pg_commit_ts",
 	"pg_dynshmem",
+	"pg_fdwxact",
 	"pg_notify",
 	"pg_serial",
 	"pg_snapshots",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 3e00ac0f70..53bc3d82d7 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index cb6ef19182..1712b794c3 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 6c8b111ab5..9ba819e9d1 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -10,24 +10,112 @@
 #ifndef FDWXACT_H
 #define FDWXACT_H
 
+#include "access/fdwxact_xlog.h"
 #include "foreign/foreign.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/s_lock.h"
 
 /* Flag passed to FDW transaction management APIs */
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is aborted */
+} FdwXactStatus;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData *FdwXact;
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+
+	/* Information relevant to the foreign transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset of inserting this entry
+									 * start */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been complete and written to
+								 * file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing FdwXact entry needs to hold FdwXactLock in exclusive mode,
+ * and iterating fdwXacts needs that in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
 /* State data for foreign transaction resolution, passed to FDW callbacks */
 typedef struct FdwXactRslvState
 {
 	/* Foreign transaction information */
+	char		   *fdwxact_id;
 	ForeignServer *server;
 	UserMapping *usermapping;
 
 	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
 } FdwXactRslvState;
 
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+
 /* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+extern void RestoreFdwXactData(void);
+extern void RecoverFdwXacts(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
 
 #endif /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 6c15df7e70..986bc73566 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3445..bd027a2861 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..e1b09a70d2 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -236,6 +236,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 06bed90c5e..ed6372d2e6 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 139f4a08bd..71ebb034b2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6028,6 +6028,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,text,int4}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,state,identifier,locker_pid}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 4db7ade9a3..89cec9aa96 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -171,6 +171,7 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
 
@@ -254,6 +255,7 @@ typedef struct FdwRoutine
 	/* Support functions for transaction management */
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
+	PrepareForeignTransaction_function PrepareForeignTransaction;
 } FdwRoutine;
 
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5954068dec..1ed1987fa5 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1010,6 +1010,9 @@ typedef enum
 	WAIT_EVENT_DATA_FILE_TRUNCATE,
 	WAIT_EVENT_DATA_FILE_WRITE,
 	WAIT_EVENT_DSM_FILL_ZERO_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index ea8a876ca4..0124c8c687 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -91,5 +91,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 
 #endif							/* PROCARRAY_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6293ab57bc..c28b63b431 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1352,6 +1352,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.state,
+    f.identifier,
+    f.locker_pid
+   FROM pg_foreign_xacts() f(xid, serverid, userid, state, identifier, locker_pid);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.27.0

v31-0001-Introduce-transaction-manager-for-foreign-transa.patch (application/x-patch)
From 1398b90f3c7687e5866a66d202408c3ab34482cd Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 28 Aug 2020 22:25:38 +0900
Subject: [PATCH v31 01/11] Introduce transaction manager for foreign
 transactions.

The global transaction manager manages the transactions initiated on
foreign servers. This commit also adds the CommitForeignTransaction
and RollbackForeignTransaction FDW APIs, supporting only one-phase
commit. An FDW that implements these APIs can be managed by the global
transaction manager, so the FDW can control its transactions through
the foreign transaction manager instead of XactCallback.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/Makefile          |   4 +-
 src/backend/access/fdwxact/Makefile  |  17 ++
 src/backend/access/fdwxact/fdwxact.c | 233 +++++++++++++++++++++++++++
 src/backend/access/transam/xact.c    |  10 ++
 src/backend/foreign/foreign.c        |   4 +
 src/include/access/fdwxact.h         |  33 ++++
 src/include/foreign/fdwapi.h         |  12 ++
 7 files changed, 311 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/include/access/fdwxact.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..2372a1a690 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,7 +8,7 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+SUBDIRS	    = brin common fdwxact gin gist hash heap index nbtree rmgrdesc \
+			  spgist table tablesample transam
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..aacab1d729
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..00da860b31
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,233 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module contains the code for managing transactions started on foreign
+ * servers.
+ *
+ * An FDW that implements both the commit and rollback APIs can register a
+ * foreign transaction with FdwXactRegisterXact() to make it participate in
+ * the distributed transaction.  The registered foreign transactions are
+ * identified by OIDs of server and user.  On commit and rollback, the global
+ * transaction manager calls the corresponding FDW API to end the transactions.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xlog.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "utils/memutils.h"
+
+/* Check whether the FdwXactParticipant supports the transaction callbacks */
+#define ServerSupportTransactionCallback(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.	 This struct
+ * needs to live until the end of transaction where we cannot look at
+ * syscaches. Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Callbacks for foreign transaction */
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A member of
+ * participants must support both commit and rollback APIs.
+ */
+static List *FdwXactParticipants = NIL;
+
+static void ForgetAllFdwXactParticipants(void);
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
+											 bool commit);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+/*
+ * Register the given foreign transaction identified by the given arguments
+ * as a participant of the transaction.
+ */
+void
+FdwXactRegisterXact(Oid serverid, Oid userid)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Already registered */
+			return;
+		}
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Foreign server managed by the transaction manager must implement
+	 * transaction callbacks.
+	 */
+	if (!routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("cannot register foreign server not supporting transaction callback")));
+
+	/*
+	 * Participant's information is also used at the end of a transaction,
+	 * where system caches are not available. Save it in TopTransactionContext
+	 * so that it can live until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Revert back the context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Remove the given foreign server from FdwXactParticipants */
+void
+FdwXactUnregisterXact(Oid serverid, Oid userid)
+{
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Remove the entry */
+			FdwXactParticipants =
+				foreach_delete_current(FdwXactParticipants, lc);
+			break;
+		}
+	}
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+
+	return fdw_part;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
+{
+	FdwXactRslvState state;
+
+	Assert(ServerSupportTransactionCallback(fdw_part));
+
+	state.server = fdw_part->server;
+	state.usermapping = fdw_part->usermapping;
+	state.flags = FDWXACT_FLAG_ONEPHASE;
+
+	if (commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Clear the FdwXactParticipants list.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	list_free_deep(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Commit or rollback all foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool is_commit)
+{
+	ListCell   *lc;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/* Commit or rollback foreign transactions in the participant list */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(ServerSupportTransactionCallback(fdw_part));
+		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Check if the local transaction has any foreign transaction.
+ */
+void
+PrePrepare_FdwXact(void)
+{
+	/* We don't support preparing foreign transactions */
+	if (FdwXactParticipants != NIL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9cd0b7c11b..bc96512d35 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -2230,6 +2231,9 @@ CommitTransaction(void)
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT
 					  : XACT_EVENT_COMMIT);
 
+	/* Commit foreign transaction if any */
+	AtEOXact_FdwXact(true);
+
 	ResourceOwnerRelease(TopTransactionResourceOwner,
 						 RESOURCE_RELEASE_BEFORE_LOCKS,
 						 true, true);
@@ -2369,6 +2373,9 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Prepare foreign transactions */
+	PrePrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2756,6 +2763,9 @@ AbortTransaction(void)
 		else
 			CallXactCallbacks(XACT_EVENT_ABORT);
 
+		/* Rollback foreign transactions if any */
+		AtEOXact_FdwXact(false);
+
 		ResourceOwnerRelease(TopTransactionResourceOwner,
 							 RESOURCE_RELEASE_BEFORE_LOCKS,
 							 false, true);
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 3e79c852c1..9c8b1c7fc2 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -328,6 +328,10 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* The FDW must support either both APIs or neither */
+	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
+		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
+
 	return routine;
 }
 
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..6c8b111ab5
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,33 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "foreign/foreign.h"
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	/* Foreign transaction information */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* Function declarations */
+extern void AtEOXact_FdwXact(bool is_commit);
+extern void PrePrepare_FdwXact(void);
+
+#endif /* FDWXACT_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..4db7ade9a3 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "access/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
 
@@ -170,6 +171,9 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
  * function.  It provides pointers to the callback functions needed by the
@@ -246,6 +250,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for path reparameterization. */
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
+
+	/* Support functions for transaction management */
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
 } FdwRoutine;
 
 
@@ -259,4 +267,8 @@ extern bool IsImportableForeignTable(const char *tablename,
 									 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in fdwxact/fdwxact.c */
+extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
+
 #endif							/* FDWAPI_H */
-- 
2.27.0

#207Zhihong Yu
zyu@yugabyte.com
In reply to: Masahiko Sawada (#206)
Re: Transactions involving multiple postgres foreign servers, take 2

Hi,
For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch :

However these functions are not neither committed nor aborted at

I think the double negation was not intentional. Should be 'are neither ...'

For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the
return statement ?
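
For reference, the pattern behind this question can be sketched as follows. This is a standalone illustration, not the patch's FdwXactShmemSize(): it assumes a layout like FdwXactCtlData (a header, an array of entry pointers, then the entries themselves). CtlSketch, EntrySketch, SKETCH_MAXALIGN and sketch_shmem_size() are made-up names, and 8 is only a stand-in for MAXIMUM_ALIGNOF.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for MAXIMUM_ALIGNOF; the real value is platform-specific. */
#define SKETCH_ALIGNOF 8

/* Round len up to the next multiple of SKETCH_ALIGNOF. */
#define SKETCH_MAXALIGN(len) \
	(((size_t) (len) + (SKETCH_ALIGNOF - 1)) & ~((size_t) (SKETCH_ALIGNOF - 1)))

/* Hypothetical stand-in for one shared-memory entry. */
typedef struct EntrySketch
{
	char		payload[220];
} EntrySketch;

/* Hypothetical stand-in for the control struct: header plus pointer array. */
typedef struct CtlSketch
{
	EntrySketch *free_list;
	int			num_entries;
	EntrySketch *slots[];		/* flexible array member */
} CtlSketch;

/* Size of the header, the pointer array, and the entries that follow it. */
size_t
sketch_shmem_size(int max_entries)
{
	size_t		size;

	assert(max_entries >= 0);

	size = offsetof(CtlSketch, slots);
	size += (size_t) max_entries * sizeof(EntrySketch *);
	size = SKETCH_MAXALIGN(size);	/* entries start on an aligned boundary */
	size += (size_t) max_entries * sizeof(EntrySketch);

	return size;				/* no final round-up here */
}
```

In this sketch the result is not a multiple of the alignment for one entry, because EntrySketch is a plain 220-byte array. Whether that can happen in the patch depends on sizeof(FdwXactData), which this sketch does not model.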

+ fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);

For the function name, Fdw and Xact appear twice, each. Maybe one of them
can be dropped ?

+ * we don't need to anything for this participant because all
foreign

'need to' -> 'need to do'

+   else if (TransactionIdDidAbort(xid))
+       return FDWXACT_STATUS_ABORTING;
+
the 'else' can be omitted since the preceding if would return.

+ if (max_prepared_foreign_xacts <= 0)

I wonder when the value for max_prepared_foreign_xacts would be negative
(and whether that should be considered an error).

Cheers

On Wed, Jan 6, 2021 at 5:45 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Mon, Dec 28, 2020 at 11:24 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Wed, Nov 25, 2020 at 9:50 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

Since the previous version conflicts with the current HEAD I've
attached the rebased version patch set.

Rebased the patch set again to the current HEAD.

The discussion of this patch is very long so here is a short summary
of the current state:

It’s still under discussion which approach is the best for the
distributed transaction commit as a building block of built-in sharding
using foreign data wrappers.

Since we’re considering that we use this feature for built-in
sharding, the design depends on the architecture of built-in sharding.
For example, with the current patch, the PostgreSQL node that received
a COMMIT from the client works as a coordinator and it commits the
transactions using 2PC on all foreign servers involved with the
transaction. This approach would be good with the de-centralized
sharding architecture but not with centralized architecture like the
GTM node of Postgres-XC and Postgres-XL that is a dedicated component
that is responsible for transaction management. Since we haven't reached a
consensus on the built-in sharding architecture yet, it's still an
open question whether this patch's approach is really good as a building
block of the built-in sharding.

On the other hand, this feature is not necessarily dedicated to the
built-in sharding. For example, the distributed transaction commit
through FDW is important also when atomically moving data between two
servers via FDWs. Using a dedicated process or server like GTM could
be overkill. Having the node that received a COMMIT work as a
coordinator would be better and more straightforward.

There is no noticeable TODO in the functionality so far covered by
this patch set. This patch set adds new FDW APIs to support 2PC,
introduces the global transaction manager, and implements those FDW
APIs in postgres_fdw. Also, it has regression tests and documentation.
Transactions on foreign servers involved with the distributed
transaction are committed using 2PC. Committing using 2PC is performed
asynchronously and transparently to the user. Therefore, it doesn’t
guarantee that transactions on the foreign server are also committed
when the client gets an acknowledgment of COMMIT. Synchronous foreign
transaction commit via 2PC is not covered by this patch as we still
need a discussion on the design.
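
To make the flow described above concrete, here is a minimal standalone
sketch of the commit sequence (illustrative only, not code from the patch
set). The Participant type, the can_prepare flag used to simulate a prepare
failure, and commit_distributed() are hypothetical names invented for this
example. In the patch the last phase is performed asynchronously by a
resolver process; the sketch runs it inline for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state of one foreign transaction in this sketch */
typedef enum
{
	P_ACTIVE,
	P_PREPARED,
	P_COMMITTED,
	P_ABORTED
} PartState;

typedef struct Participant
{
	PartState	state;
	bool		can_prepare;	/* false simulates a PREPARE failure */
} Participant;

/* Phase 1 for a single participant: returns false if PREPARE fails */
static bool
prepare_participant(Participant *p)
{
	if (!p->can_prepare)
		return false;
	p->state = P_PREPARED;
	return true;
}

/*
 * Coordinator logic: prepare everywhere, then commit locally, then
 * resolve the prepared foreign transactions.  Returns true on commit.
 */
static bool
commit_distributed(Participant *parts, size_t n)
{
	size_t		i;
	size_t		j;

	/* Phase 1: prepare all foreign transactions */
	for (i = 0; i < n; i++)
	{
		if (!prepare_participant(&parts[i]))
		{
			/* Any failure: roll back every participant, prepared or not */
			for (j = 0; j < n; j++)
				parts[j].state = P_ABORTED;
			return false;
		}
	}

	/* The local commit would happen here; the client is acknowledged now. */

	/* Phase 2: resolve (commit) every prepared foreign transaction */
	for (i = 0; i < n; i++)
	{
		assert(parts[i].state == P_PREPARED);
		parts[i].state = P_COMMITTED;
	}
	return true;
}
```

The point of the sketch is the window between the local commit and the end
of phase 2: the client already has its acknowledgment while some
participants are still only prepared.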

I've attached the rebased patches to make cfbot happy.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#208Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhihong Yu (#207)
11 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, Jan 7, 2021 at 11:44 AM Zhihong Yu <zyu@yugabyte.com> wrote:

Hi,

Thank you for reviewing the patch!

For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch :

However these functions are not neither committed nor aborted at

I think the double negation was not intentional. Should be 'are neither ...'

Fixed.

For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the return statement ?

Hmm, you mean that we need MAXALIGN(size) after adding the size of
FdwXactData structs?

Size
FdwXactShmemSize(void)
{
    Size        size;

    /* Size for foreign transaction information array */
    size = offsetof(FdwXactCtlData, fdwxacts);
    size = add_size(size, mul_size(max_prepared_foreign_xacts,
                                   sizeof(FdwXact)));
    size = MAXALIGN(size);
    size = add_size(size, mul_size(max_prepared_foreign_xacts,
                                   sizeof(FdwXactData)));

    return size;
}

I don't think we need to do that. Other similar code such as
TwoPhaseShmemSize() doesn't do that. Why do you think we need that?
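
To illustrate what the intermediate MAXALIGN is for, here is a small
standalone sketch (not the patch code). SketchCtl and SketchEntry are
hypothetical stand-ins for FdwXactCtlData and FdwXactData, the 8-byte
alignment is an assumption, and a plain assert stands in for the overflow
checks that add_size()/mul_size() perform. The intermediate alignment makes
the array of entry structs start on an aligned boundary; the final total is
left unaligned, on the assumption that the shared memory allocator applies
its own alignment to each request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed maximum alignment for this sketch (stand-in for MAXALIGN) */
#define SKETCH_ALIGNOF 8
#define SKETCH_MAXALIGN(len) \
	(((size_t) (len) + (SKETCH_ALIGNOF - 1)) & ~((size_t) (SKETCH_ALIGNOF - 1)))

/* Hypothetical entry struct; 28 bytes on typical platforms */
typedef struct SketchEntry
{
	uint32_t	xid;
	uint32_t	serverid;
	char		id[20];
} SketchEntry;

/* Hypothetical control struct: header plus an array of entry pointers */
typedef struct SketchCtl
{
	int			num_entries;
	SketchEntry *entries[];		/* flexible array member */
} SketchCtl;

static size_t
sketch_shmem_size(size_t max_entries)
{
	size_t		size;

	/* crude overflow guard for the multiplications below */
	assert(max_entries <= SIZE_MAX / sizeof(SketchEntry));

	/* fixed header plus the pointer array */
	size = offsetof(SketchCtl, entries);
	size += max_entries * sizeof(SketchEntry *);

	/* align so that the entry structs start on an aligned boundary */
	size = SKETCH_MAXALIGN(size);

	/* the entry structs themselves; the total is not re-aligned */
	size += max_entries * sizeof(SketchEntry);

	return size;
}
```

With three entries the returned size is generally not a multiple of the
alignment, which is harmless as long as nothing else is laid out after the
entry array inside the same allocation.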

+ fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);

For the function name, Fdw and Xact appear twice, each. Maybe one of them can be dropped ?

Agreed. Changed to FdwXactInsertEntry().

+ * we don't need to anything for this participant because all foreign

'need to' -> 'need to do'

Fixed.

+   else if (TransactionIdDidAbort(xid))
+       return FDWXACT_STATUS_ABORTING;
+
the 'else' can be omitted since the preceding if would return.

Fixed.

+ if (max_prepared_foreign_xacts <= 0)

I wonder when the value for max_prepared_foreign_xacts would be negative (and whether that should be considered an error).

Fixed to (max_prepared_foreign_xacts == 0)

Attached the updated version patch set.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

Attachments:

v32-0010-Documentation-update.patch (application/octet-stream)
From 9f26bbdc7484377c287599c6c9f61614eafa76ab Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v32 10/11] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 ++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 158 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 254 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    | 147 +++++++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 9 files changed, 888 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a2266526c..0f73bf19f4 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9285,6 +9285,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -11138,6 +11143,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction has been
+          prepared for commit or is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction has been
+          prepared for abort or is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>locker_pid</structfield></entry>
+      <entry><type>int</type></entry>
+      <entry></entry>
+      <entry>
+       Process ID of the process currently processing this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 82864bbb24..032801658c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9336,6 +9336,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether distributed transaction commit ensures that all
+         involved changes on foreign servers are committed. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used to
+         commit or roll back distributed transactions. When set to
+         <literal>required</literal>, distributed transactions strictly require
+         that all written servers can use the two-phase commit protocol.  That is,
+         the distributed transaction cannot commit if even one server does not
+         support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, distributed transaction commit
+         waits for all involved foreign transactions to be committed before the
+         command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When set to <literal>disabled</literal>, there is a risk of database
+          inconsistency if one or more foreign servers crash while committing
+          the distributed transaction.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver waits after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning the resolver stays connected to one
+         database until it is stopped manually by <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..bae3ee0f2a
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the other transactions were rolled back. This could leave
+   the data in an inconsistent state in terms of a federated database.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all the changes on foreign servers are either committed or rolled back using
+   the transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers atomically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit of a distributed transaction proceeds
+    in the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers including the local server itself and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the prepare on all foreign servers is
+       successful, the commit proceeds to the next step.  If there is any failure in the
+       prepare phase, the server rolls back all the transactions on both
+       local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit the local transaction. The server commits the transaction locally.
+       If any failure happens in this step, the server switches to rollback and
+       rolls back all transactions on both local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    The above sequence is executed transparently to the users at transaction commit.
+    The transaction returns acknowledgement of the successful commit of the
+    distributed transaction to the client after step 2.  After that, all
+    prepared transactions are resolved asynchronously by a foreign transaction
+    resolver process.
+   </para>
+
+   <para>
+    When the user executes <command>PREPARE TRANSACTION</command>, the transaction
+    prepares the local transaction as well as all involved transactions on the
+    foreign servers. Likewise, when <command>COMMIT PREPARED</command> or
+    <command>ROLLBACK PREPARED</command> is executed, all prepared transactions are
+    resolved asynchronously after committing or rolling back the local transaction.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    A distributed transaction can be in the <firstterm>in-doubt</firstterm> state
+    from the time all involved transactions are prepared until all of them
+    are resolved.  During that period, a transaction might see different
+    results when reading from the foreign servers.  If the local node
+    crashes while preparing transactions, the distributed transaction becomes
+    in-doubt.  The information about the involved foreign transactions is
+    recovered during crash recovery and they are resolved in the background.
+   </para>
+
+   <para>
+    The foreign transaction resolver processes automatically resolve the
+    transactions associated with the in-doubt distributed transaction. Alternatively, you can
+    use the <function>pg_resolve_foreign_xact</function> function to resolve them
+    manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving in-doubt distributed transactions. They commit or
+    rollback prepared transactions on all foreign servers involved with the
+    distributed transaction according to the result of the corresponding local
+    transaction.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolutions
+    on the database to which it is connected. On failure during resolution, it
+    retries the resolution at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped without an immediate shutdown. You can call the
+     <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be enabled.  Additionally,
+    <varname>max_worker_processes</varname> may need to be adjusted
+    to accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..0fbb9c4123 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1427,6 +1427,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare the foreign transaction. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+void
+CommitForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase, or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flag</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     Note that in all cases except when called via the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+void
+RollbackForeignTransaction(FdwXactRslvState *frstate);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction after it is rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if any error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flag</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign servers. This can happen when, for instance,
+     there is a failure while preparing the foreign transaction. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except when called via the <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    together with its length in <varname>*prep_id_len</varname>.
+    This optional function is called during executor startup, once per
+    foreign server. Note that the transaction identifier must be a string literal,
+    less than <symbol>NAMEDATALEN</symbol> bytes long, and must not be the same
+    as any other concurrent prepared transaction ID. If this callback routine
+    is not supported, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. So please note that
+      you cannot use functions that use the system catalog cache
+      such as Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1906,4 +2017,147 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of the
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactRslvState</literal> can be used to get
+    information about the foreign server being processed, such as the server name and the OIDs of
+    the server, user and user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-registration">
+    <title>Foreign Transaction Registration and Unregistration</title>
+    <para>
+     Foreign transactions need to be registered with the
+     <productname>PostgreSQL</productname> global transaction manager.
+     Registration and unregistration are done by calling
+     <function>FdwXactRegisterXact</function> and
+     <function>FdwXactUnregisterXact</function> respectively.
+     The FDW can pass a boolean <literal>modified</literal> along with the
+     OIDs of the server and user to <function>FdwXactRegisterXact</function>,
+     indicating that writes are going to happen on the foreign server.  Such foreign
+     servers are taken into account when deciding whether the two-phase commit
+     protocol is required.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback of a Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <function>CommitForeignTransaction</function>
+     and <function>RollbackForeignTransaction</function> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls <function>CommitForeignTransaction</function>
+     in the pre-commit phase; during rollback, it calls
+     <function>RollbackForeignTransaction</function> in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback of Distributed Transactions</title>
+    <para>
+     In addition to simply committing and rolling back foreign transactions as
+     described in <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit or roll back atomically across all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to provide the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    Here ft1 and ft2 are foreign tables on different foreign servers, which may use
+    different Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose
+     FDW supports the transaction management callback routines is registered as a
+     participant. During registration, <function>GetPrepareId</function> is called,
+     if provided, to generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction
+     manager persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     When switching to rollback due to a failure, it calls
+     <function>RollbackForeignTransaction</function> with
+     <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions that are not
+     closed yet, and calls <function>RollbackForeignTransaction</function> without
+     that flag for foreign transactions that are already prepared.  For foreign
+     transactions that are being prepared, it does both because it cannot be sure
+     whether the preparation has completed on the foreign server. Therefore,
+     <function>RollbackForeignTransaction</function> needs to tolerate an undefined
+     object error.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, both the <literal>CommitForeignTransaction</literal> function and the
+     <literal>RollbackForeignTransaction</literal> function should commit or
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server was not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve, the resolver process exits with an error message. The foreign
+     transaction launcher launches the resolver process again at
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a5161bb22b 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fd0370a1b4..2d935fbe59 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26865,6 +26865,153 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-data-sanity">
+   <title>Data Sanity Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-data-sanity-table"/>
+    provide ways to check the sanity of data files in the cluster.
+   </para>
+
+   <table id="functions-data-sanity-table">
+    <title>Data Sanity Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_relation_check_pages</primary>
+        </indexterm>
+        <function>pg_relation_check_pages</function> ( <parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type> ] )
+        <returnvalue>setof record</returnvalue>
+        ( <parameter>path</parameter> <type>text</type>,
+        <parameter>failed_block_num</parameter> <type>bigint</type> )
+       </para>
+       <para>
+        Checks the pages of the specified relation to see if they are valid
+        enough to safely be loaded into the server's shared buffers.  If
+        given, <parameter>fork</parameter> specifies that only the pages of
+        the given fork are to be verified.  <parameter>fork</parameter> can
+        be <literal>main</literal> for the main data
+        fork, <literal>fsm</literal> for the free space
+        map, <literal>vm</literal> for the visibility map,
+        or <literal>init</literal> for the initialization fork.  The
+        default of <literal>NULL</literal> means that all forks of the
+        relation should be checked.  The function returns a list of block
+        numbers that appear corrupted along with the path names of their
+        files.  Use of this function is restricted to superusers by
+        default, but access may be granted to others
+        using <command>GRANT</command>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction that is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+        This function is useful for removing a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before dropping
+    a database to which a foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3cdb1aff3c..f7497debf2 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1072,6 +1072,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1301,6 +1313,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1594,6 +1618,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1911,6 +1940,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 730d5fdc34..a5c5619072 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -171,6 +171,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3234adb639..83f30c5045 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
-- 
2.27.0
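
The documentation above asks CommitForeignTransaction and
RollbackForeignTransaction to behave differently depending on
FDWXACT_FLAG_ONEPHASE. As a reading aid, here is a minimal sketch, not taken
from the patch, of how an FDW might pick the command it sends to the foreign
server. DemoRslvState, DEMO_FLAG_ONEPHASE and demo_build_end_command are
hypothetical stand-ins for FdwXactRslvState, FDWXACT_FLAG_ONEPHASE and the
FDW's own code. The sketch also assumes that the remote side is a PostgreSQL
server and that identifiers containing a single quote are simply rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for FDWXACT_FLAG_ONEPHASE; the real value may differ. */
#define DEMO_FLAG_ONEPHASE 0x01

/* Hypothetical, reduced stand-in for FdwXactRslvState. */
typedef struct DemoRslvState
{
    int         flags;
    const char *fdwxact_id;     /* prepared transaction identifier */
} DemoRslvState;

/*
 * Build the SQL command that ends the foreign transaction.
 * Returns the command length as reported by snprintf(), or -1 if the
 * identifier cannot be quoted safely by this sketch.
 */
int
demo_build_end_command(const DemoRslvState *state, bool commit,
                       char *buf, size_t buflen)
{
    const char *verb = commit ? "COMMIT" : "ROLLBACK";

    assert(state != NULL && buf != NULL && buflen > 0);

    /* One-phase: end the still-open remote transaction directly. */
    if (state->flags & DEMO_FLAG_ONEPHASE)
        return snprintf(buf, buflen, "%s TRANSACTION", verb);

    /* Two-phase: resolve the transaction prepared under fdwxact_id. */
    if (state->fdwxact_id == NULL || strchr(state->fdwxact_id, '\'') != NULL)
        return -1;

    return snprintf(buf, buflen, "%s PREPARED '%s'", verb, state->fdwxact_id);
}
```

In the two-phase rollback case the remote server may report that no such
prepared transaction exists, because the preparation may never have completed
there. That is the undefined object error the documentation says
RollbackForeignTransaction has to tolerate.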

Attachment: v32-0004-Add-PrepareForeignTransaction-API.patch (application/octet-stream)
From 5be5729cdf9481051e9e95db10d9321735139b32 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 20 Sep 2020 16:49:20 +0900
Subject: [PATCH v32 04/11] Add PrepareForeignTransaction API.

This commit adds a new FDW API, PrepareForeignTransaction. Using this
API, the transactions initiated on the foreign server are prepared at
PREPARE TRANSACTION time.  The information of prepared foreign
transactions involved with the distributed transaction is crash-safe.
However, these transactions are neither committed nor aborted at
COMMIT/ROLLBACK PREPARED time.  To resolve these transactions, this
commit also adds the pg_resolve_foreign_xact() SQL function.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +-
 src/backend/access/fdwxact/fdwxact.c          | 1754 ++++++++++++++++-
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   28 +
 src/backend/access/transam/xact.c             |    1 +
 src/backend/access/transam/xlog.c             |   41 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/foreigncmds.c            |   22 +
 src/backend/foreign/foreign.c                 |    6 +
 src/backend/postmaster/pgstat.c               |    9 +
 src/backend/postmaster/postmaster.c           |    1 +
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    3 +
 src/backend/storage/ipc/procarray.c           |   56 +-
 src/backend/storage/lmgr/lwlocknames.txt      |    1 +
 src/backend/utils/misc/guc.c                  |   11 +
 src/backend/utils/misc/postgresql.conf.sample |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |   88 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   18 +
 src/include/foreign/fdwapi.h                  |    2 +
 src/include/pgstat.h                          |    3 +
 src/include/storage/procarray.h               |    2 +
 src/test/regress/expected/rules.out           |    7 +
 35 files changed, 2163 insertions(+), 28 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact_xlog.h
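
As a reading aid for the state-file naming introduced in fdwxact.c below, the
following sketch reproduces the format string of the FdwXactFilePath macro in
a standalone function. The names demo_fdwxact_file_path, DEMO_MAXPATH and
demo_has_fdwxact_name_len are hypothetical. Using unsigned int in place of Oid
and TransactionId is an assumption that both are 32 bits wide, and the length
check only illustrates what FDWXACT_FILE_NAME_LEN could be used for.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define DEMO_FDWXACTS_DIR "pg_fdwxact"
#define DEMO_MAXPATH 1024
/* Four 8-digit hexadecimal fields joined by three '_' separators. */
#define DEMO_FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)

/*
 * Write "<dir>/<dbid>_<xid>_<serverid>_<userid>" into path, each field as
 * 8 upper-case hexadecimal digits.  Returns what snprintf() returns.
 */
int
demo_fdwxact_file_path(char *path, size_t pathlen,
                       unsigned int dbid, unsigned int xid,
                       unsigned int serverid, unsigned int userid)
{
    return snprintf(path, pathlen,
                    DEMO_FDWXACTS_DIR "/%08X_%08X_%08X_%08X",
                    dbid, xid, serverid, userid);
}

/* Hypothetical length check for skipping unrelated directory entries. */
bool
demo_has_fdwxact_name_len(const char *name)
{
    return strlen(name) == (size_t) DEMO_FDWXACT_FILE_NAME_LEN;
}
```

For example, database 16384, xid 742, server 16390 and user 10 would map to
pg_fdwxact/00004000_000002E6_00004006_0000000A.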

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 3724fdab3d..707f1d7cd4 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,7 +8984,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on foreign tables
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 00da860b31..cbbd53dc7d 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -9,8 +9,59 @@
  * FDW who implements both commit and rollback APIs can request to register the
  * foreign transaction by FdwXactRegisterXact() to participate it to a
  * group of distributed tranasction.  The registered foreign transactions are
- * identified by OIDs of server and user.  On commit and rollback, the global
- * transaction manager calls corresponding FDW API to end the tranasctions.
+ * identified by OIDs of server and user.  On commit, rollback and prepare, the
+ * global transaction manager calls the corresponding FDW API to end the transactions.
+ *
+ * To achieve commit among all foreign servers atomically, the global transaction
+ * manager supports two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). Two-phase commit protocol is crash-safe.  We WAL-log the foreign
+ * transaction information.
+ *
+ * FOREIGN TRANSACTION RESOLUTION
+ *
+ * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
+ * PrepareForeignTransaction() API regardless of data on the foreign server having been
+ * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or roll back only the
+ * local transaction but do nothing for the involved foreign transactions.  To resolve
+ * these foreign transactions the user needs to use the pg_resolve_foreign_xact() SQL
+ * function, which resolves a foreign transaction according to the result of the
+ * corresponding local transaction.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of foreign
+ * transactions follows a locking model based on the following linked concepts:
+ *
+ * * All FdwXact fields except for status are protected by FdwXactLock. The
+ *	 status is protected by its mutex.
+ * * A process that is going to process a foreign transaction needs to set
+ *   locking_backend of the FdwXact entry to lock the entry, which prevents the entry from
+ *	 being updated and removed by concurrent processes.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *	 with entries marked with fdwxact->inredo and fdwxact->ondisk.	FdwXact file
+ *	 data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *	 We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *	 have fdwxact->inredo set and are behind the redo_horizon.	We save
+ *	 them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *	 fdwxact->ondisk is true, the corresponding entry from the disk is
+ *	 additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *	 fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
  *
  * Portions Copyright (c) 2020, PostgreSQL Global Development Group
  *
@@ -20,15 +71,53 @@
  */
 #include "postgres.h"
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include "access/fdwxact.h"
+#include "access/twophase.h"
+#include "access/xact.h"
 #include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "foreign/fdwapi.h"
 #include "foreign/foreign.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/procarray.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 
 /* Check the FdwXactParticipant is capable of two-phase commit  */
 #define ServerSupportTransactionCallback(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define ServerSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * OID, xid, foreign server OID and user OID, each written as 8 hexadecimal
+ * digits and separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, the name is sufficient to ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
 
 /*
  * Structure to bundle the foreign transaction participant.	 This struct
@@ -37,13 +126,23 @@
  */
 typedef struct FdwXactParticipant
 {
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
 	/* Foreign server and user mapping info, passed to callback routines */
 	ForeignServer *server;
 	UserMapping *usermapping;
 
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
 } FdwXactParticipant;
 
 /*
@@ -52,11 +151,103 @@ typedef struct FdwXactParticipant
  */
 static List *FdwXactParticipants = NIL;
 
+/* Keep track of whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+/* GUC parameter */
+int			max_prepared_foreign_xacts = 0;
+
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void FdwXactPrepareForeignTransactions(TransactionId xid);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
+static FdwXact FdwXactInsertEntry(TransactionId xid,
+								  FdwXactParticipant *fdw_part);
+static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
+static void FdwXactComputeRequiredXmin(void);
+static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
+static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool givewarning);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool fromdisk);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  Oid umid, char *fdwxact_id);
+static void remove_fdwxact(FdwXact fdwxact);
 static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
 													  FdwRoutine *routine);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in the definition of the
+ * FdwXactCtlData structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
 
 /*
  * Register the given foreign transaction identified by the given arguments
@@ -82,6 +273,13 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 		}
 	}
 
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
 	routine = GetFdwRoutineByServerId(serverid);
 
 	/*
@@ -142,14 +340,336 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 
 	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
 
+	fdw_part->fdwxact = NULL;
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
 
 	return fdw_part;
 }
 
+/*
+ * Insert FdwXact entries and prepare foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(TransactionId xid)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants != NIL);
+	Assert(TransactionIdIsValid(xid));
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactRslvState state;
+		FdwXact		fdwxact;
+
+		Assert(ServerSupportTwophaseCommit(fdw_part));
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to the disk and logs (that way relaying it to the
+		 * standby). Thus in case we lose connectivity to the foreign server or
+		 * crash ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepare the transaction on the foreign server before
+		 * persisting the information to the disk and crash in-between these
+		 * two steps, we will lose track of the prepared transaction on the
+		 * foreign server and will not be able to resolve it after the crash
+		 * recovery.  Hence persist first then prepare.
+		 */
+		fdwxact = FdwXactInsertEntry(xid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertEntry call and the point where this
+		 * backend receives the acknowledgement from the foreign server, the
+		 * backend may abort the local transaction (say, because of a signal).
+		 */
+		state.server = fdw_part->server;
+		state.usermapping = fdw_part->usermapping;
+		state.fdwxact_id = fdw_part->fdwxact_id;
+		fdw_part->prepare_foreign_xact_fn(&state);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier.  We generate a
+ * unique identifier of the form
+ * "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * The returned string is used to identify the foreign transaction. The
+ * identifier must not be the same as that of any other concurrent prepared
+ * transaction.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like a UUID, which gives unique ids with high probability, but that may be
+ * expensive here, and the UUID extension that provides the function to
+ * generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+			 Abs(random()), xid, fdw_part->server->serverid,
+			 fdw_part->usermapping->userid);
+
+	return pstrdup(buf);
+}
+
+/*
+ * Create a new foreign transaction entry before an FDW prepares, commits, or
+ * rolls back. The entry is written to WAL and is persisted to disk under the
+ * pg_fdwxact directory at the next checkpoint.
+ */
+static FdwXact
+FdwXactInsertEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
 /*
  * The routine for committing or rolling back the given transaction participant.
  */
@@ -162,6 +682,7 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 
 	state.server = fdw_part->server;
 	state.usermapping = fdw_part->usermapping;
+	state.fdwxact_id = NULL;
 	state.flags = FDWXACT_FLAG_ONEPHASE;
 
 	if (commit)
@@ -181,14 +702,46 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 }
 
 /*
- * Clear the FdwXactParticipants list.
+ * Unlock foreign transaction participants and clear the FdwXactParticipants
+ * list.  If any foreign transactions are left unresolved, update the oldest
+ * xmin of unresolved transactions so that the local transaction ids of those
+ * foreign transactions are not truncated.
  */
 static void
 ForgetAllFdwXactParticipants(void)
 {
+	ListCell   *cell;
+	int			nlefts = 0;
+
 	if (FdwXactParticipants == NIL)
 		return;
 
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if no FdwXact entry has been registered yet */
+		if (!fdwxact)
+			continue;
+
+		/* Unlock the foreign transaction entry */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+		nlefts++;
+	}
+
+	/*
+	 * If we leave any FdwXact entries behind, update the oldest local
+	 * transaction id among unresolved distributed transactions.
+	 */
+	if (nlefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions", nlefts);
+		FdwXactComputeRequiredXmin();
+	}
+
 	list_free_deep(FdwXactParticipants);
 	FdwXactParticipants = NIL;
 }
@@ -211,23 +764,1202 @@ AtEOXact_FdwXact(bool is_commit)
 	foreach(lc, FdwXactParticipants)
 	{
 		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		if (!fdwxact)
+		{
+			/* Commit or rollback the foreign transaction in one-phase */
+			Assert(ServerSupportTransactionCallback(fdw_part));
+			FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			continue;
+		}
+
+		/*
+		 * This foreign transaction might have been prepared.  In the commit
+		 * case, we don't need to do anything for this participant because all
+		 * foreign transactions should already have been prepared, and therefore
+		 * the transaction is already closed. These will be resolved manually.
+		 * In the abort case, on the other hand, we need to close the
+		 * transaction if preparing might still be in progress, since an error
+		 * might have occurred while preparing a foreign transaction.
+		 */
+		if (!is_commit)
+		{
+			int			status;
 
-		Assert(ServerSupportTransactionCallback(fdw_part));
-		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			SpinLockAcquire(&(fdwxact->mutex));
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&(fdwxact->mutex));
+
+			if (status == FDWXACT_STATUS_PREPARING)
+				FdwXactParticipantEndTransaction(fdw_part, false);
+		}
 	}
 
 	ForgetAllFdwXactParticipants();
 }
 
 /*
- * Check if the local transaction has any foreign transaction.
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * Note that the transaction may abort after we have prepared some of the
+ * participants. In that case we switch to rollback and roll back all foreign
+ * transactions.
  */
 void
 PrePrepare_FdwXact(void)
 {
-	/* We don't support to prepare foreign transactions */
-	if (FdwXactParticipants != NIL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+	ListCell   *lc;
+	TransactionId xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit as we're going to
+	 * prepare all of them.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
+	}
+
+	/*
+	 * Assign a transaction id if one has not been assigned yet, because the
+	 * local transaction id is used to determine the result of the
+	 * distributed transaction. Then prepare all foreign transactions.
+	 */
+	xid = GetTopTransactionId();
+	FdwXactPrepareForeignTransactions(xid);
+
+	/*
+	 * We keep FdwXactParticipants until the transaction ends so that we can
+	 * change the involved foreign transactions to ABORTING in case of failure.
+	 */
+}
+
+/*
+ * Resolve foreign transactions at the given indexes.
+ *
+ * The caller must hold the given foreign transactions in advance to prevent
+ * concurrent update.
+ */
+static void
+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		FdwXactResolveOneFdwXact(fdwxact);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
+
+/*
+ * Return the index of the first FdwXact entry that matches the given
+ * arguments, or -1 if there is none.  Only arguments with valid values for
+ * their respective data types are used as search conditions.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	bool		found = false;
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+		found = true;
+		break;
+	}
+
+	return found ? i : -1;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ *
+ * XXX: we can exclude FdwXact entries whose status is already committing
+ * or aborting.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Return whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactGetTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on
+	 * the foreign server. So let's assume it aborted.
+	 */
+	if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted.  Raise an error since we cannot
+	 * determine the fate of this foreign transaction from a local
+	 * transaction whose fate is itself not yet determined.
+	 */
+	elog(ERROR,
+		 "cannot resolve the foreign transaction associated with in-progress transaction");
+
+	pg_unreachable();
+}
+
+/* Commit or rollback one prepared foreign transaction */
+static void
+FdwXactResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactRslvState state;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	/* The FdwXact entry must be held by me */
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->locking_backend == MyBackendId);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactGetTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare the resolution state to pass to API */
+	state.server = server;
+	state.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	state.fdwxact_id = fdwxact->fdwxact_id;
+	state.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&state);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&state);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file if it exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The status of
+	 * the transaction is set as preparing, since we do not know the exact
+	 * status right now. Resolver will set it later based on the status of
+	 * local transaction which prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that
+	 * prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXact file if the foreign transaction was saved by an earlier checkpoint.
+ * We might not find the FdwXact entry if crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+static void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts == 0)
+		return;					/* nothing to do */
+
+	/*
+	 * We are expecting there to be zero FdwXacts that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.	 FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid, read the foreign transaction
+ * state either from disk or directly from WAL, using the record pointer
+ * "insert_start_lsn" kept in shared memory.
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid magic number and CRC), return the palloc'd
+ * contents of the file, issuing an error when finding corrupted data.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Read the entire file in a single call */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the expected identifiers */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	TransactionId result;
+
+	result = oldestActiveXid;	/* entries found below can only lower this */
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+RestoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built-in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = CStringGetTextDatum(fdwxact->fdwxact_id);
+
+		if (fdwxact->locking_backend != InvalidBackendId)
+		{
+			PGPROC *locker = BackendIdGetProc(fdwxact->locking_backend);
+			values[5] = Int32GetDatum(locker->pid);
+		}
+		else
+			nulls[5] = true;
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist")));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to resolve prepared foreign transaction"),
+				 errhint("Must be superuser or the user that prepared the transaction.")));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being processed by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction identifier \"%s\" is busy",
+						fdwxact->fdwxact_id)));
+	}
+
+	if (TwoPhaseExists(fdwxact->local_xid))
+	{
+		/*
+		 * The entry's local transaction is still prepared.  Since its fate is
+		 * not yet decided, we cannot know whether to commit or roll back this
+		 * foreign transaction.
+		 */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction with identifier \"%s\" whose local transaction is prepared",
+						fdwxact->fdwxact_id),
+				 errhint("Do COMMIT PREPARED or ROLLBACK PREPARED on the local transaction first.")));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	LWLockRelease(FdwXactLock);
+
+	PG_TRY();
+	{
+		FdwXactResolveFdwXacts(&idx, 1);
+	}
+	PG_CATCH();
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactCtl->fdwxacts[idx]->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it.  This gives a way to forget about such a transaction when,
+ * for example, the foreign server where it was prepared is no longer
+ * available or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist on server %u",
+						serverid)));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("permission denied to remove prepared foreign transaction"),
+				  errhint("Must be superuser or the user that prepared the transaction."))));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being held by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	PG_TRY();
+	{
+		/* Clean up entry and any files we may have left */
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdwxact(fdwxact);
+	}
+	PG_CATCH();
+	{
+		if (fdwxact->valid)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+			fdwxact->locking_backend = InvalidBackendId;
+		}
+		LWLockRelease(FdwXactLock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
 }
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..ca761763e5
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..e4ae79e599 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..0a3f4b383f 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index fc18b77832..5c8a55358d 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -845,6 +845,34 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+		if (gxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b8990af8b6..a87f6b5abf 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2568,6 +2568,7 @@ PrepareTransaction(void)
 	PostPrepare_Twophase();
 
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
+	AtEOXact_FdwXact(true);
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
 	AtEOXact_Enum();
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b18257c198..17773d38da 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4629,6 +4630,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6314,6 +6316,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6861,14 +6866,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	RestoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7070,7 +7076,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7582,11 +7591,13 @@ StartupXLOG(void)
 	}
 
 	/*
-	 * Pre-scan prepared transactions to find out the range of XIDs present.
-	 * This information is not quite needed yet, but it is positioned here so
-	 * as potential problems are detected before any on-disk change is done.
+	 * Pre-scan prepared transactions and foreign prepared transactions to find
+	 * out the range of XIDs present.  This information is not quite needed yet,
+	 * but it is positioned here so as potential problems are detected before any
+	 * on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7914,8 +7925,12 @@ StartupXLOG(void)
 	TrimCLOG();
 	TrimMultiXact();
 
-	/* Reload shared-memory state for prepared transactions */
+	/*
+	 * Reload shared-memory state for prepared transactions and foreign
+	 * prepared transactions.
+	 */
 	RecoverPreparedTransactions();
+	RecoverFdwXacts();
 
 	/*
 	 * Shutdown the recovery environment. This must occur after
@@ -9281,6 +9296,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9829,6 +9845,7 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
@@ -9848,6 +9865,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9866,6 +9884,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -10073,6 +10092,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10276,6 +10296,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5d89e77dbe..c134c5a253 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index ec024fa106..492627caa1 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1076,6 +1077,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * Warn if there are unresolved foreign prepared transactions on this
+	 * foreign server; the drop itself is not blocked.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1396,6 +1409,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * Warn if there is an unresolved foreign prepared transaction for this
+	 * user mapping; the drop itself is not blocked.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index d50dc099c6..cfddb5d854 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -332,6 +332,12 @@ GetFdwRoutine(Oid fdwhandler)
 	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
 		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
 
+	/* An FDW supporting the prepare API must also support commit and rollback */
+	Assert((routine->PrepareForeignTransaction &&
+			routine->CommitForeignTransaction &&
+			routine->RollbackForeignTransaction) ||
+		   !routine->PrepareForeignTransaction);
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3f24a33ef1..c34d14bab8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4158,6 +4158,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_DSM_FILL_ZERO_WRITE:
 			event_name = "DSMFillZeroWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ:
 			event_name = "LockFileAddToDataDirRead";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 7de27ee4e0..9e11bf3822 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,7 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..d897f2c5fc 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -178,6 +178,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..6f14a950bf 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -267,6 +269,7 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index cf12eda504..ba6d6c7c2d 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -96,6 +96,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allProcs[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -187,11 +189,13 @@ typedef struct ComputeXidHorizonsResult
 	FullTransactionId latest_completed;
 
 	/*
-	 * The same for procArray->replication_slot_xmin and.
-	 * procArray->replication_slot_catalog_xmin.
+	 * The same for procArray->replication_slot_xmin,
+	 * procArray->replication_slot_catalog_xmin, and
+	 * procArray->fdwxact_unresolved_xmin.
 	 */
 	TransactionId slot_xmin;
 	TransactionId slot_catalog_xmin;
+	TransactionId fdwxact_unresolved_xmin;
 
 	/*
 	 * Oldest xid that any backend might still consider running. This needs to
@@ -210,8 +214,9 @@ typedef struct ComputeXidHorizonsResult
 	 * Oldest xid for which deleted tuples need to be retained in shared
 	 * tables.
 	 *
-	 * This includes the effects of replication slots. If that's not desired,
-	 * look at shared_oldest_nonremovable_raw;
+	 * This includes the effects of replication slots and unresolved
+	 * foreign transactions. If that's not desired, look at
+	 * shared_oldest_nonremovable_raw;
 	 */
 	TransactionId shared_oldest_nonremovable;
 
@@ -418,6 +423,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 		ShmemVariableCache->xactCompletionCount = 1;
 	}
 
@@ -1709,6 +1715,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	 */
 	h->slot_xmin = procArray->replication_slot_xmin;
 	h->slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	h->fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	for (int index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1836,6 +1843,12 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	h->data_oldest_nonremovable =
 		TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);
 
+	/*
+	 * Check whether there are unresolved distributed transactions requiring
+	 * an older xmin.
+	 */
+	h->shared_oldest_nonremovable =
+		TransactionIdOlder(h->data_oldest_nonremovable, h->fdwxact_unresolved_xmin);
 	/*
 	 * The only difference between catalog / data horizons is that the slot's
 	 * catalog xmin is applied to the catalog one (so catalogs can be accessed
@@ -1893,6 +1906,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	Assert(!TransactionIdIsValid(h->slot_catalog_xmin) ||
 		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
 										 h->slot_catalog_xmin));
+	Assert(!TransactionIdIsValid(h->fdwxact_unresolved_xmin) ||
+		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+										 h->fdwxact_unresolved_xmin));
 
 	/* update approximate horizons with the computed horizons */
 	GlobalVisUpdateApply(h);
@@ -3804,6 +3820,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon so that vacuum
+ * does not truncate clog entries of transactions that are still needed to
+ * resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
+/*
+ * ProcArrayGetFdwXactUnresolvedXmin
+ *
+ * Return the current unresolved xmin limit.
+ */
+TransactionId
+ProcArrayGetFdwXactUnresolvedXmin(void)
+{
+	TransactionId xmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	xmin = procArray->fdwxact_unresolved_xmin;
+	LWLockRelease(ProcArrayLock);
+
+	return xmin;
+}
+
 /*
  * XidCacheRemoveRunningXids
  *
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..dc29a7ea6f 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+FdwXactLock							48
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca..9c78b2a90a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -30,6 +30,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -2470,6 +2471,16 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 8930a94fff..68548b4633 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -127,6 +127,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index c854221a30..db9fb14623 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -204,6 +204,7 @@ static const char *const subdirs[] = {
 	"pg_wal/archive_status",
 	"pg_commit_ts",
 	"pg_dynshmem",
+	"pg_fdwxact",
 	"pg_notify",
 	"pg_serial",
 	"pg_snapshots",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 3e00ac0f70..53bc3d82d7 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 805dafef07..dd70a0f8a2 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 6c8b111ab5..9ba819e9d1 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -10,24 +10,112 @@
 #ifndef FDWXACT_H
 #define FDWXACT_H
 
+#include "access/fdwxact_xlog.h"
 #include "foreign/foreign.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/s_lock.h"
 
 /* Flag passed to FDW transaction management APIs */
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum to track the status of a foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being aborted */
+} FdwXactStatus;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData *FdwXact;
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+
+	/* Information about the foreign transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset of inserting this entry
+									 * start */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset of inserting this entry end */
+
+	bool		valid;			/* has the entry been completed and written to
+								 * file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+}			FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing FdwXact entry needs to hold FdwXactLock in exclusive mode,
+ * and iterating fdwXacts needs that in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
 /* State data for foreign transaction resolution, passed to FDW callbacks */
 typedef struct FdwXactRslvState
 {
 	/* Foreign transaction information */
+	char		   *fdwxact_id;
 	ForeignServer *server;
 	UserMapping *usermapping;
 
 	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
 } FdwXactRslvState;
 
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+
 /* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+extern void RestoreFdwXactData(void);
+extern void RecoverFdwXacts(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
 
 #endif /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..b4cec76eae
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index f582cf535f..5ab1f57212 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 91786da784..3d35f89ae0 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 224cae0246..0823baf1a1 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -236,6 +236,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..5673ec7299 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d27336adcd..1830364fcc 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6030,6 +6030,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,text,int4}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,state,identifier,locker_pid}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 570e605e1a..eb86b09f7a 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -171,6 +171,7 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
 
@@ -254,6 +255,7 @@ typedef struct FdwRoutine
 	/* Support functions for transaction management */
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
+	PrepareForeignTransaction_function PrepareForeignTransaction;
 } FdwRoutine;
 
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c38b689710..30d3a7eea0 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1010,6 +1010,9 @@ typedef enum
 	WAIT_EVENT_DATA_FILE_TRUNCATE,
 	WAIT_EVENT_DATA_FILE_WRITE,
 	WAIT_EVENT_DSM_FILL_ZERO_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index b01fa52139..1fd53bcd60 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -93,5 +93,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
+extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void);
 
 #endif							/* PROCARRAY_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a687e99d1e..88734ee4e4 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1352,6 +1352,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.state,
+    f.identifier,
+    f.locker_pid
+   FROM pg_foreign_xacts() f(xid, serverid, userid, state, identifier, locker_pid);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.27.0
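
For readers following the procarray.c part of the patch above: the new fdwxact_unresolved_xmin is folded into the horizon the same way the replication slot xmin is, by taking the older of two XIDs while ignoring an unset one. Below is a minimal standalone sketch of that rule. The names xid_precedes() and xid_older() are hypothetical stand-ins written for this illustration, not the backend's own functions, and the comparison is simplified to normal XIDs only.

```c
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/*
 * Hypothetical helper: modulo-2^32 comparison, valid for normal XIDs only.
 * Returns nonzero if a is logically older than b.
 */
static int
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical helper: return the older of two XIDs.  An invalid XID means
 * "no limit installed" and never wins over a valid one.
 */
static TransactionId
xid_older(TransactionId a, TransactionId b)
{
    if (!TransactionIdIsValid(a))
        return b;
    if (!TransactionIdIsValid(b))
        return a;
    return xid_precedes(a, b) ? a : b;
}
```

With a computed horizon of 1000 and an unresolved foreign transaction whose local XID is 900, xid_older(1000, 900) yields 900, so vacuum would have to keep everything from 900 onward. When no foreign transaction is unresolved, the limit is InvalidTransactionId and the horizon is left unchanged.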

Attachment: v32-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch (application/octet-stream)
From f6e875650d479fbee6c01e3fd2c25012a5773232 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 2 Nov 2020 14:32:10 +0900
Subject: [PATCH v32 09/11] postgres_fdw marks foreign transaction as modified
 on modification.

This commit enables postgres_fdw to execute the two-phase commit protocol
on transaction commit (without explicitly executing PREPARE TRANSACTION).

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c   | 19 ++++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c |  2 ++
 contrib/postgres_fdw/postgres_fdw.h |  1 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 3c22060f27..a17c934006 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -58,6 +58,7 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		modified;		/* true if data on the foreign server is modified */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -285,6 +286,7 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 	entry->have_error = false;
 	entry->changing_xact_state = false;
 	entry->invalidated = false;
+	entry->modified = false;
 	entry->server_hashvalue =
 		GetSysCacheHashValue1(FOREIGNSERVEROID,
 							  ObjectIdGetDatum(server->serverid));
@@ -299,6 +301,20 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 		 entry->conn, server->servername, user->umid, user->userid);
 }
 
+void
+MarkConnectionModified(UserMapping *user)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
+	if (entry && !entry->modified)
+	{
+		FdwXactRegisterXact(user->serverid, user->userid, true);
+		entry->modified = true;
+	}
+}
+
 /*
  * Connect to remote server using specified server and user mapping properties.
  */
@@ -570,7 +586,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 			 entry->conn);
 
 		/* Register the foreign server to the transaction */
-		FdwXactRegisterXact(user->serverid, user->userid);
+		FdwXactRegisterXact(user->serverid, user->userid, false);
 
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
@@ -579,6 +595,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 		entry->changing_xact_state = true;
 		do_sql_command(entry->conn, sql);
 		entry->xact_depth = 1;
+		entry->modified = false;
 		entry->changing_xact_state = false;
 	}
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 8162e0ace7..71f3d91695 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2380,6 +2380,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * establish new connection if necessary.
 	 */
 	dmstate->conn = GetConnection(user, false);
+	MarkConnectionModified(user);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -3565,6 +3566,7 @@ create_foreign_modify(EState *estate,
 
 	/* Open connection; report that we'll create a prepared statement. */
 	fmstate->conn = GetConnection(user, true);
+	MarkConnectionModified(user);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 788605cfc2..144fe5cd16 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -132,6 +132,7 @@ extern void reset_transmission_modes(int nestlevel);
 /* in connection.c */
 extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
 extern void ReleaseConnection(PGconn *conn);
+extern void MarkConnectionModified(UserMapping *user);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
-- 
2.27.0
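
To illustrate the registration behavior in the patch above: begin_remote_xact() registers the server with modified = false, and MarkConnectionModified() registers it again with modified = true; re-registering only ORs in the flag. The following is a rough standalone sketch of that idea. The fixed-size table, MAX_PARTICIPANTS, register_participant() and count_modified() are all hypothetical names made up for this illustration; the patch itself keeps a List of FdwXactParticipant in TopTransactionContext.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

#define MAX_PARTICIPANTS 8      /* arbitrary limit for this sketch */

typedef struct Participant
{
    Oid         serverid;
    Oid         userid;
    bool        modified;       /* true once any write was reported */
} Participant;

static Participant participants[MAX_PARTICIPANTS];
static size_t nparticipants = 0;

/*
 * Register (serverid, userid) at most once.  Registering an existing pair
 * again only ORs in the modified flag, so a later read-only registration
 * never clears an earlier write.  Returns false if the table is full.
 */
static bool
register_participant(Oid serverid, Oid userid, bool modified)
{
    for (size_t i = 0; i < nparticipants; i++)
    {
        if (participants[i].serverid == serverid &&
            participants[i].userid == userid)
        {
            participants[i].modified |= modified;
            return true;
        }
    }

    if (nparticipants == MAX_PARTICIPANTS)
        return false;

    participants[nparticipants].serverid = serverid;
    participants[nparticipants].userid = userid;
    participants[nparticipants].modified = modified;
    nparticipants++;
    return true;
}

/* Count the participants that reported a write. */
static size_t
count_modified(void)
{
    size_t      n = 0;

    for (size_t i = 0; i < nparticipants; i++)
    {
        if (participants[i].modified)
            n++;
    }
    return n;
}
```

For example, after registering servers 10 and 20 as read-only and then reporting a write on server 10, the table still has two entries and count_modified() returns 1. How such a count feeds into the decision to use two-phase commit is up to checkForeignTwophaseCommitRequired(), whose body is not reproduced here.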

Attachment: v32-0008-Prepare-foreign-transactions-at-commit-time.patch (application/octet-stream)
From 7c9a8d6b41a524e211a414e147a3918e95a7b544 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 25 Nov 2020 21:02:29 +0900
Subject: [PATCH v32 08/11] Prepare foreign transactions at commit time

With this commit, a foreign server whose data is modified within the
transaction is marked as 'modified'. On the 'modified' servers, foreign
transactions are prepared automatically if foreign_twophase_commit is
'required'. Previously, users needed to run PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED to use the two-phase commit protocol. This commit
lets users use the two-phase commit protocol transparently. Prepared
foreign transactions are resolved asynchronously by the foreign
transaction resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/fdwxact.c          | 191 +++++++++++++++++-
 src/backend/access/transam/xact.c             |   7 +
 src/backend/utils/misc/guc.c                  |  28 +++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/access/fdwxact.h                  |  10 +
 src/include/foreign/fdwapi.h                  |   2 +-
 6 files changed, 229 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index 7fc199cc55..adc81499e9 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -19,13 +19,27 @@
  *
  * FOREIGN TRANSACTION RESOLUTION
  *
+ * A transaction involving multiple foreign transactions uses the two-phase
+ * commit protocol to commit the distributed transaction, if enabled.  The basic
+ * strategy is to prepare all of the remote transactions before committing
+ * locally and to commit them after committing locally.
+ *
+ * At pre-commit of the local transaction, we prepare the transactions on all
+ * foreign servers after logging the information about each foreign transaction.
+ * The outcome of the distributed transaction is determined by the outcome of
+ * the corresponding local transaction.  Once the local transaction has
+ * committed successfully, all transactions on foreign servers must be committed.
+ * If an error occurs before the local transaction commits, all of them must be
+ * aborted.  After committing or rolling back locally, we leave the foreign
+ * transactions as in-doubt transactions and notify the resolver process, which
+ * resolves them asynchronously according to the outcome of the local
+ * transaction.  The user can also resolve a foreign transaction manually with
+ * the pg_resolve_foreign_xact() SQL function.
+ *
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  The preapred
- * foreign transactions are resolved by a resolver process asynchronously.  Also, the
- * user can use pg_resolve_foreign_xact() SQL function to resolve a foreign transaction
- * manually.
+ * local transaction but do nothing for the involved foreign transactions.
  *
  * LOCKING
  *
@@ -92,8 +106,10 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
+#include "storage/pmsignal.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
+#include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -105,6 +121,10 @@
 #define ServerSupportTwophaseCommit(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
 
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
 /* Directory where the foreign prepared transaction files will reside */
 #define FDWXACTS_DIR "pg_fdwxact"
 
@@ -142,6 +162,9 @@ typedef struct FdwXactParticipant
 	/* Transaction identifier used for PREPARE */
 	char	   *fdwxact_id;
 
+	/* true if data on the server was modified */
+	bool		modified;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
@@ -152,18 +175,24 @@ typedef struct FdwXactParticipant
 /*
  * List of foreign transactions involved in the transaction.  A member of
  * participants must support both commit and rollback APIs.
+ *
+ * ForeignTwophaseCommitIsRequired is true if the current transaction needs to
+ * be committed using two-phase commit protocol.
  */
 static List *FdwXactParticipants = NIL;
+static bool ForeignTwophaseCommitIsRequired = false;
 
 /* Keep track of registering process exit call back. */
 static bool fdwXactExitRegistered = false;
 
+
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
 int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
-static void FdwXactPrepareForeignTransactions(TransactionId xid);
+static void FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
@@ -182,6 +211,7 @@ static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
 static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
 static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  bool giveWarning);
+static bool checkForeignTwophaseCommitRequired(bool local_modified);
 static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  Oid umid, char *fdwxact_id);
 static void remove_fdwxact(FdwXact fdwxact);
@@ -258,7 +288,7 @@ FdwXactShmemInit(void)
  * as a participant of the transaction.
  */
 void
-FdwXactRegisterXact(Oid serverid, Oid userid)
+FdwXactRegisterXact(Oid serverid, Oid userid, bool modified)
 {
 	FdwXactParticipant *fdw_part;
 	MemoryContext old_ctx;
@@ -273,6 +303,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 			fdw_part->usermapping->userid == userid)
 		{
 			/* Already registered */
+			fdw_part->modified |= modified;
 			return;
 		}
 	}
@@ -302,6 +333,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
 
 	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
 
 	/* Add to the participants list */
 	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
@@ -348,6 +380,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
 	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
@@ -356,11 +389,139 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	return fdw_part;
 }
 
+/*
+ * Prepare all foreign transactions if foreign twophase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit: when it is
+ * 'required', we strictly require the FDWs of all foreign servers to support
+ * the two-phase commit protocol and ask them to prepare their foreign
+ * transactions; when it is 'disabled', we use one-phase commit, so the
+ * foreign transactions are committed at the end of the transaction.  If we
+ * fail to prepare any of them, we switch to aborting.
+ */
+void
+PreCommit_FdwXact(void)
+{
+	TransactionId xid;
+	bool		local_modified;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/*
+	 * Check if the current transaction did writes.  We need to include the
+	 * local node as a participant of the distributed transaction and regard
+	 * it as modified if the current transaction has performed WAL logging
+	 * and has been assigned an xid.  The transaction can end up not writing
+	 * any WAL, even if it has an xid, if it only wrote to temporary and/or
+	 * unlogged tables.  It can end up having written WAL without an xid if
+	 * it did HOT pruning.
+	 */
+	xid = GetTopTransactionIdIfAny();
+	local_modified = (TransactionIdIsValid(xid) && (XactLastRecEnd != 0));
+
+	/*
+	 * Check if we need to use foreign twophase commit. Note that we don't
+	 * support foreign twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired(local_modified))
+	{
+		/*
+		 * Two-phase commit is required.  Assign a transaction id to the
+		 * current transaction if not yet because the local transaction is
+		 * necessary to determine the result of the distributed transaction.
+		 * Then we prepare foreign transactions on foreign servers that support
+		 * two-phase commit.  Note that we keep FdwXactParticipants until the
+		 * end of the transaction.
+		 */
+		if (!TransactionIdIsValid(xid))
+			xid = GetTopTransactionId();
+		FdwXactPrepareForeignTransactions(xid, false);
+		ForeignTwophaseCommitIsRequired = true;
+	}
+}
+
+/* Return true if the current transaction needs to use two-phase commit */
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+/*
+ * Return true if the current transaction modified data on two or more servers,
+ * counting the servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+{
+	ListCell   *lc;
+	bool		have_notwophase = false;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			have_notwophase = true;
+
+		nserverswritten++;
+	}
+
+	/* Did we modify the local non-temporary data? */
+	if (local_modified)
+		nserverswritten++;
+
+	/*
+	 * Two-phase commit is not required if fewer than two servers performed
+	 * writes.
+	 */
+	if (nserverswritten < 2)
+		return false;
+
+	Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED);
+
+	/* Two-phase commit is required. Check parameters */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	if (have_notwophase)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+				 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+	return true;
+}
+
 /*
- * Insert FdwXact entries and prepare foreign transactions.
+ * Insert FdwXact entries and prepare foreign transactions.  If prepare_all is
+ * true, we prepare all foreign transactions regardless of whether writes
+ * happened on the server.
+ *
+ * We can still switch to rollback here on failure.  If any error occurs, we
+ * roll back the non-prepared foreign transactions.
  */
 static void
-FdwXactPrepareForeignTransactions(TransactionId xid)
+FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all)
 {
 	ListCell   *lc;
 
@@ -378,6 +539,9 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 
 		CHECK_FOR_INTERRUPTS();
 
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
 		/* Get prepared transaction identifier */
 		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
 		Assert(fdw_part->fdwxact_id);
@@ -755,7 +919,10 @@ ForgetAllFdwXactParticipants(void)
 	int			nlefts = 0;
 
 	if (FdwXactParticipants == NIL)
+	{
+		Assert(!ForeignTwophaseCommitIsRequired);
 		return;
+	}
 
 	foreach(cell, FdwXactParticipants)
 	{
@@ -812,7 +979,10 @@ AtEOXact_FdwXact(bool is_commit)
 
 		if (!fdwxact)
 		{
-			/* Commit or rollback the foreign transaction in one-phase */
+			/*
+			 * If this participant doesn't have an FdwXact entry, it's not
+			 * prepared yet. Therefore we can commit or roll it back in one phase.
+			 */
 			Assert(ServerSupportTransactionCallback(fdw_part));
 			FdwXactParticipantEndTransaction(fdw_part, is_commit);
 			continue;
@@ -842,6 +1012,7 @@ AtEOXact_FdwXact(bool is_commit)
 	}
 
 	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
 }
 
 /*
@@ -881,7 +1052,7 @@ PrePrepare_FdwXact(void)
 	 * prepare all foreign transactions.
 	 */
 	xid = GetTopTransactionId();
-	FdwXactPrepareForeignTransactions(xid);
+	FdwXactPrepareForeignTransactions(xid, true);
 
 	/*
 	 * We keep FdwXactParticipants until the transaction end so that we change
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a87f6b5abf..657778d926 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -22,6 +22,7 @@
 
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1456,6 +1457,9 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	if (FdwXactIsForeignTwophaseCommitRequired())
+		FdwXactLaunchOrWakeupResolver();
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2123,6 +2127,9 @@ CommitTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXact();
+
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index add8e598e8..f530cd20dd 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -501,6 +501,24 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -4703,6 +4721,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 58ac54b8c8..6165c6d689 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -746,6 +746,8 @@
 							# retrying to resolve
 							# foreign transactions
 							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
 
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index a3763e52c0..6bf4f5dd7d 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -20,6 +20,14 @@
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
 /* Enum to track the status of foreign transaction */
 typedef enum
 {
@@ -107,10 +115,12 @@ extern int	max_prepared_foreign_xacts;
 extern int	max_foreign_xact_resolvers;
 extern int	foreign_xact_resolution_retry_interval;
 extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
+extern void PreCommit_FdwXact(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
 extern bool FdwXactIsForeignTwophaseCommitRequired(void);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 7885827172..5fd51b408c 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -273,7 +273,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
 /* Functions in fdwxact/fdwxact.c */
-extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactRegisterXact(Oid serverid, Oid userid, bool modified);
 extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
 
 #endif							/* FDWAPI_H */
-- 
2.27.0
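
For anyone skimming the patch above, the decision that checkForeignTwophaseCommitRequired() makes at pre-commit can be restated as a small standalone sketch. This is not part of the patch and does not use the backend's data structures: Participant, CommitMode and decide_commit_mode are names made up here for illustration, and the checks on max_prepared_foreign_transactions and max_foreign_transaction_resolvers are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for FdwXactParticipant; not the patch's struct. */
typedef struct Participant
{
	bool		modified;		/* did the transaction write to this server? */
	bool		supports_2pc;	/* does its FDW provide a prepare callback? */
} Participant;

/* Hypothetical result type; the patch returns a bool or raises an ERROR. */
typedef enum CommitMode
{
	COMMIT_ONE_PHASE,			/* two-phase commit is not needed */
	COMMIT_TWO_PHASE,			/* prepare the modified foreign servers */
	COMMIT_ERROR_NO_2PC			/* needed, but some server cannot prepare */
} CommitMode;

CommitMode
decide_commit_mode(bool twophase_requested, bool local_modified,
				   const Participant *parts, size_t nparts)
{
	bool		have_notwophase = false;
	int			nserverswritten = 0;

	assert(parts != NULL || nparts == 0);

	/* foreign_twophase_commit = disabled: always one-phase */
	if (!twophase_requested)
		return COMMIT_ONE_PHASE;

	/* Only servers that were written to count; read-only ones are skipped */
	for (size_t i = 0; i < nparts; i++)
	{
		if (!parts[i].modified)
			continue;
		if (!parts[i].supports_2pc)
			have_notwophase = true;
		nserverswritten++;
	}

	/* The local node counts as one more written server */
	if (local_modified)
		nserverswritten++;

	/* A single writer can be committed atomically without preparing */
	if (nserverswritten < 2)
		return COMMIT_ONE_PHASE;

	return have_notwophase ? COMMIT_ERROR_NO_2PC : COMMIT_TWO_PHASE;
}
```

Under these assumptions, a write to one 2PC-incapable server plus a local write yields COMMIT_ERROR_NO_2PC, while a write to that same server alone stays one-phase, which is the behavior the 'required' mode cases in the regression tests exercise.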

v32-0011-Add-regression-tests-for-foreign-twophase-commit.patch (application/octet-stream)
From 5cba2022027a0bb70af1579b2bbe73b26804509a Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v32 11/11] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 +
 .../test_fdwxact/expected/test_fdwxact.out    | 200 +++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 185 +++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 110 ++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 524 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 ++++++
 src/test/regress/pg_regress.c                 |  13 +-
 src/tools/msvc/Mkvcbuild.pm                   |   3 +-
 14 files changed, 1294 insertions(+), 6 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 59921b46cf..45ddcdcb0a 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..ca8a90f3e5
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,200 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..40b774e5d0
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,185 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..52e4971aed
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,110 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql, $wait_until) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+	$wait_until = 0 unless defined $wait_until;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	$node->poll_query_until('postgres',
+							"SELECT count(*) FROM pg_foreign_xacts",
+							$wait_until);
+
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the failure case of PREPARE TRANSACTION. We prepare the distributed
+# transaction with the same identifier.  The second attempt will fail when preparing
+# the local transaction, which is performed after preparing the foreign transaction
+# on srv_2pc_1. Therefore the transaction should roll back the prepared foreign
+# transaction.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into the prepare phase on srv_2pc_1. The transaction fails while
+# preparing the foreign transaction on srv_2pc_1. Then, we try to both 'rollback' and
+# 'rollback prepared' the foreign transaction, and roll back the other foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..8e2a57b052
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,524 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only the
+ * commit and rollback APIs, and the third supports all transaction APIs
+ * including prepare.
+ *
+ * Also, this module can inject an error at the prepare callback or the
+ * commit callback using the test_inject_error() SQL function. Information
+ * about the injected error is stored in shared memory so that backend
+ * processes and resolver processes can see it.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xact.h"
+#include "commands/defrem.h"
+#include "access/reloptions.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static void testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo,
+												   List *fdw_private,
+												   int subplan_index,
+												   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactRslvState *state);
+static void testCommitForeignTransaction(FdwXactRslvState *state);
+static void testRollbackForeignTransaction(FdwXactRslvState *state);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+/* Register the foreign transaction */
+static void
+testRegisterFdwXact(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					bool modified)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	RangeTblEntry	*rte;
+	ForeignTable *table;
+	Oid		userid;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
+						mtstate->ps.state);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+	table = GetForeignTable(RelationGetRelid(rel));
+	FdwXactRegisterXact(table->serverid, userid, modified);
+}
+
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static void
+testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo,
+									   List *fdw_private,
+									   int subplan_index,
+									   int eflags)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo,
+						(eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0);
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo, true);
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+
+	if (check_event(state->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 state->fdwxact_id,
+							 state->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactRslvState *state)
+{
+	int elevel;
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (check_event(state->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (log_api_calls)
+	{
+		if (state->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 xid, state->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 state->fdwxact_id,
+								 state->server->servername)));
+	}
+}
+
+/*
+ * Check if an event is set at the phase on the server. If there is, set
+ * elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Currently support only error and panic */
+	if (pg_strcasecmp(fxss->elevel, "error") == 0)
+		*elevel = ERROR;
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index 96442ceb4e..0e5e05e41a 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Set up master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately before checkpoint. Check that WAL replay cleans up its shared
+# memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index b284cc88c4..5ceba8972a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2350,9 +2350,12 @@ regression_main(int argc, char *argv[],
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2367,7 +2370,9 @@ regression_main(int argc, char *argv[],
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 5634b2d40c..3c48fbb2d9 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -50,7 +50,8 @@ my @contrib_excludes = (
 	'pgcrypto',         'sepgsql',
 	'brin',             'test_extensions',
 	'test_misc',        'test_pg_dump',
-	'snapshot_too_old', 'unsafe_tests');
+	'snapshot_too_old', 'unsafe_tests',
+	'test_fdwxact');
 
 # Set of variables for frontend modules
 my $frontend_defines = { 'initdb' => 'FRONTEND' };
-- 
2.27.0
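
Not part of the patch: check_event() in test_fdwxact.c maps the stored level name to an elevel, and only "error" and "panic" are handled. The following is a minimal standalone sketch of that mapping, assuming we want the caller to learn when the name is unrecognized instead of reading an unset value. The names sketch_parse_level, SKETCH_ERROR and SKETCH_PANIC are hypothetical stand-ins, and the comparison here is case-sensitive, unlike pg_strcasecmp() in the module.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the server's ERROR and PANIC levels. */
enum sketch_level
{
	SKETCH_ERROR = 1,
	SKETCH_PANIC = 2
};

/*
 * Map a level name to a level code.  Returns true and sets *level when the
 * name is recognized; returns false and leaves *level untouched otherwise.
 */
bool
sketch_parse_level(const char *name, int *level)
{
	assert(name != NULL && level != NULL);

	if (strcmp(name, "error") == 0)
	{
		*level = SKETCH_ERROR;
		return true;
	}
	if (strcmp(name, "panic") == 0)
	{
		*level = SKETCH_PANIC;
		return true;
	}

	/* Unknown name: report it rather than leaving the caller guessing. */
	return false;
}
```

With this shape the caller can refuse an unknown level up front, for example when the injection request is stored, instead of at the point where the error is raised.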

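Not part of the patch: the commit and rollback callbacks in test_fdwxact.c pick their log message by testing a one-phase flag in state->flags. Below is a small standalone sketch of why such a test wants bitwise AND. The bit values are hypothetical (the real FDWXACT_FLAG_* definitions are not shown in this part of the thread), and SKETCH_FLAG_OTHER is invented purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define SKETCH_FLAG_ONEPHASE	0x01
#define SKETCH_FLAG_OTHER		0x02

/* Bitwise test: true only when the one-phase bit is actually set. */
bool
sketch_is_onephase(int flags)
{
	return (flags & SKETCH_FLAG_ONEPHASE) != 0;
}

/*
 * Logical test: true whenever flags is nonzero, whichever bits are set,
 * because the nonzero constant on the right always evaluates to true.
 */
bool
sketch_is_onephase_logical(int flags)
{
	return flags && SKETCH_FLAG_ONEPHASE;
}

/* The two tests disagree as soon as some other bit is set on its own. */
void
sketch_flag_self_check(void)
{
	assert(sketch_is_onephase(SKETCH_FLAG_ONEPHASE));
	assert(!sketch_is_onephase(SKETCH_FLAG_OTHER));
	assert(sketch_is_onephase_logical(SKETCH_FLAG_OTHER));
}
```

In the test module this would only matter when some flag other than one-phase is set, in which case the logical form would log "commit" or "rollback" where "commit prepared" or "rollback prepared" is expected.
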
v32-0003-Recreate-RemoveForeignServerById.patch (application/octet-stream)
From 15d108059ddb3b9bce8b9ac21a323848e27cf7b0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 12 Jun 2020 11:49:02 +0900
Subject: [PATCH v32 03/11] Recreate RemoveForeignServerById()

This commit recreates RemoveForeignServerById(), which was removed by
b1d32d3e3. It is needed by a follow-up commit that checks whether the
foreign server has prepared transactions when it is removed.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/catalog/dependency.c   |  5 ++++-
 src/backend/commands/foreigncmds.c | 22 ++++++++++++++++++++++
 src/include/commands/defrem.h      |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 2140151a6a..7c9899f14d 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1549,6 +1549,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_FOREIGN_SERVER:
+			RemoveForeignServerById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1563,7 +1567,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSDICT:
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
-		case OCLASS_FOREIGN_SERVER:
 		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index eb7103fd3b..ec024fa106 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -1060,6 +1060,28 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop foreign server by OID
+ */
+void
+RemoveForeignServerById(Oid srvId)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(ForeignServerRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(srvId));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Common routine to check permission for user-mapping-related DDL
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index e2d2a77ca4..1d0b408163 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -129,6 +129,7 @@ extern ObjectAddress CreateForeignDataWrapper(CreateFdwStmt *stmt);
 extern ObjectAddress AlterForeignDataWrapper(AlterFdwStmt *stmt);
 extern ObjectAddress CreateForeignServer(CreateForeignServerStmt *stmt);
 extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
+extern void RemoveForeignServerById(Oid srvId);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
-- 
2.27.0

v32-0006-Add-GetPrepareId-API.patch (application/octet-stream)
From 3a62ab9d32ca5eac8ba06961a94d5dce98397a55 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 4 Nov 2020 14:41:53 +0900
Subject: [PATCH v32 06/11] Add GetPrepareId API

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/fdwxact.c | 54 +++++++++++++++++++++++-----
 src/include/foreign/fdwapi.h         |  3 ++
 2 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index cbbd53dc7d..eb81fb338f 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -143,6 +143,7 @@ typedef struct FdwXactParticipant
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
 	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
 } FdwXactParticipant;
 
 /*
@@ -347,6 +348,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
 
 	return fdw_part;
 }
@@ -414,9 +416,10 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 }
 
 /*
- * Return a null-terminated foreign transaction identifier.  We generate an
- * unique identifier with in the form of
- * "fx_<random number>_<xid>_<serverid>_<userid> whose length is
+ * Return a null-terminated foreign transaction identifier.  If the given
+ * foreign server's FDW provides the GetPrepareId callback, we return the
+ * identifier it returns. Otherwise we generate a unique identifier in the
+ * form of "fx_<random number>_<xid>_<serverid>_<userid>" whose length is
  * less than FDWXACT_ID_MAX_LEN.
  *
  * Returned string value is used to identify foreign transaction. The
@@ -431,13 +434,48 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 static char *
 get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
 {
-	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+	char *id;
+	int	id_len;
 
-	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
-			 Abs(random()), xid, fdw_part->server->serverid,
-			 fdw_part->usermapping->userid);
+	/*
+	 * If the FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len > FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
 
-	return pstrdup(buf);
+	id[id_len] = '\0';
+	return pstrdup(id);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index eb86b09f7a..7885827172 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -174,6 +174,8 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
 typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -256,6 +258,7 @@ typedef struct FdwRoutine
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
 	PrepareForeignTransaction_function PrepareForeignTransaction;
+	GetPrepareId_function GetPrepareId;
 } FdwRoutine;
 
 
-- 
2.27.0
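
The identifier logic in the last fdwxact.c hunk above can be tried outside the backend. What follows is only a minimal sketch under stated assumptions, not the patch's code: `SK_ID_MAX_LEN`, `sk_get_id_fn`, `sk_fixed_id` and `sk_get_prepare_id` are hypothetical names, the limit of 200 is an assumed stand-in for FDWXACT_ID_MAX_LEN, and malloc replaces pstrdup. Unlike the patch, it copies the callback's string instead of writing a terminator into the callback's buffer, and it rejects an identifier whose length equals the limit.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed limit; stands in for FDWXACT_ID_MAX_LEN, whose value is not shown here. */
#define SK_ID_MAX_LEN 200

/* Hypothetical callback type, modelled on GetPrepareId_function. */
typedef char *(*sk_get_id_fn) (unsigned int xid, unsigned int serverid,
							   unsigned int userid, int *id_len);

/* Example callback that always returns the same six-character identifier. */
static char *
sk_fixed_id(unsigned int xid, unsigned int serverid,
			unsigned int userid, int *id_len)
{
	static char fixed[] = "my_gid";

	(void) xid;
	(void) serverid;
	(void) userid;
	*id_len = (int) strlen(fixed);
	return fixed;
}

/*
 * Return a malloc'd identifier.  Without a callback, build the default
 * "fx_<random>_<xid>_<serverid>_<userid>" form.  With a callback, return
 * NULL if it provides no identifier or one that does not fit the limit.
 */
static char *
sk_get_prepare_id(sk_get_id_fn fn, unsigned int xid,
				  unsigned int serverid, unsigned int userid)
{
	char		buf[SK_ID_MAX_LEN];
	char	   *id;
	char	   *copy;
	int			id_len = 0;

	if (fn == NULL)
	{
		snprintf(buf, sizeof(buf), "fx_%ld_%u_%u_%u",
				 labs((long) rand()), xid, serverid, userid);
		copy = malloc(strlen(buf) + 1);
		if (copy != NULL)
			strcpy(copy, buf);
		return copy;
	}

	id = fn(xid, serverid, userid, &id_len);
	if (id == NULL || id_len < 0 || id_len >= SK_ID_MAX_LEN)
		return NULL;

	/* Copy exactly id_len bytes and terminate our own copy. */
	copy = malloc((size_t) id_len + 1);
	if (copy == NULL)
		return NULL;
	memcpy(copy, id, (size_t) id_len);
	copy[id_len] = '\0';
	return copy;
}
```

Calling sk_get_prepare_id(NULL, xid, serverid, userid) exercises the default path, and passing sk_fixed_id exercises the callback path. The caller frees the result in both cases.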

Attachment: v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch (application/octet-stream)
From 1cf4c02491e0536eaf38952c9e1c3d32d5455e5d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sat, 29 Aug 2020 00:14:36 +0900
Subject: [PATCH v32 02/11] postgres_fdw supports commit and rollback APIs.

This commit implements both the CommitForeignTransaction and
RollbackForeignTransaction APIs in postgres_fdw. Note that since
PREPARE TRANSACTION is still not supported, this commit does not
enable anything new for users.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 470 +++++++++---------
 .../postgres_fdw/expected/postgres_fdw.out    |   2 +-
 contrib/postgres_fdw/postgres_fdw.c           |   4 +
 contrib/postgres_fdw/postgres_fdw.h           |   3 +
 4 files changed, 237 insertions(+), 242 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 266f66cc62..e8aafca42d 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -16,6 +16,7 @@
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -80,8 +81,7 @@ static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, UserMapping *user);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -94,6 +94,8 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -108,53 +110,11 @@ static bool UserMappingPasswordRequired(UserMapping *user);
 PGconn *
 GetConnection(UserMapping *user, bool will_prep_stmt)
 {
-	bool		found;
 	bool		retry = false;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
 	MemoryContext ccxt = CurrentMemoryContext;
 
-	/* First time through, initialize connection cache hashtable */
-	if (ConnectionHash == NULL)
-	{
-		HASHCTL		ctl;
-
-		ctl.keysize = sizeof(ConnCacheKey);
-		ctl.entrysize = sizeof(ConnCacheEntry);
-		ConnectionHash = hash_create("postgres_fdw connections", 8,
-									 &ctl,
-									 HASH_ELEM | HASH_BLOBS);
-
-		/*
-		 * Register some callback functions that manage connection cleanup.
-		 * This should be done just once in each backend.
-		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
-		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
-		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
-									  pgfdw_inval_callback, (Datum) 0);
-		CacheRegisterSyscacheCallback(USERMAPPINGOID,
-									  pgfdw_inval_callback, (Datum) 0);
-	}
-
-	/* Set flag that we did GetConnection during the current transaction */
-	xact_got_connection = true;
-
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
-	/*
-	 * Find or create cached entry for requested connection.
-	 */
-	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
-	if (!found)
-	{
-		/*
-		 * We need only clear "conn" here; remaining fields will be filled
-		 * later when "conn" is set.
-		 */
-		entry->conn = NULL;
-	}
+	entry = GetConnectionCacheEntry(user->umid);
 
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
@@ -186,7 +146,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	PG_TRY();
 	{
 		/* Start a new transaction or subtransaction if needed. */
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 	PG_CATCH();
 	{
@@ -247,7 +207,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		if (entry->conn == NULL)
 			make_new_connection(entry, user);
 
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 
 	/* Remember if caller will prepare statements */
@@ -256,6 +216,56 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	return entry->conn;
 }
 
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	bool		found;
+	ConnCacheEntry *entry;
+	ConnCacheKey key;
+
+	/* First time through, initialize connection cache hashtable */
+	if (ConnectionHash == NULL)
+	{
+		HASHCTL		ctl;
+
+		ctl.keysize = sizeof(ConnCacheKey);
+		ctl.entrysize = sizeof(ConnCacheEntry);
+		ConnectionHash = hash_create("postgres_fdw connections", 8,
+									 &ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+		/*
+		 * Register some callback functions that manage connection cleanup.
+		 * This should be done just once in each backend.
+		 */
+		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
+		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
+									  pgfdw_inval_callback, (Datum) 0);
+		CacheRegisterSyscacheCallback(USERMAPPINGOID,
+									  pgfdw_inval_callback, (Datum) 0);
+	}
+
+	/* Set flag that we did GetConnection during the current transaction */
+	xact_got_connection = true;
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
+
+	/*
+	 * Find or create cached entry for requested connection.
+	 */
+	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+	if (!found)
+	{
+		/*
+		 * We need only clear "conn" here; remaining fields will be filled
+		 * later when "conn" is set.
+		 */
+		entry->conn = NULL;
+	}
+	return entry;
+}
+
 /*
  * Reset all transient state fields in the cached connection entry and
  * establish new connection to the remote server.
@@ -545,7 +555,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -557,6 +567,9 @@ begin_remote_xact(ConnCacheEntry *entry)
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
+		/* Register the foreign server to the transaction */
+		FdwXactRegisterXact(user->serverid, user->userid);
+
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
 		else
@@ -772,199 +785,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- *
- * This runs just late enough that it must not enter user-defined code
- * locally.  (Entering such code on the remote side is fine.  Its remote
- * COMMIT TRANSACTION may run deferred triggers.)
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state or it is marked as
-		 * invalid, then discard it to recover. Next GetConnection will open a
-		 * new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state ||
-			entry->invalidated)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -1341,3 +1161,171 @@ exit:	;
 		*result = last_res;
 	return timed_out;
 }
+
+void
+postgresCommitForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry;
+	PGresult   *res;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	Assert(entry->conn);
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   frstate->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Cleanup connection entry transaction if transaction fails before
+	 * establishing a connection.
+	 */
+	if (!entry->conn)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is before starting transaction or is already unsalvageable,
+	 * do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error  = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	entry->changing_xact_state = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c11092f8cc..3724fdab3d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8984,7 +8984,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
+ERROR:  cannot PREPARE a transaction that has operated on foreign tables
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2f2d4d171c..ad00a9ce2b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -560,6 +560,10 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 19ea27a1bc..d714034d6b 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -137,6 +138,8 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
-- 
2.27.0
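
The end-of-transaction reset that pgfdw_cleanup_after_transaction() performs in the patch above can be modelled without libpq. This is a rough sketch under assumptions: the `sk_` enums and struct are hypothetical stand-ins for the PQstatus()/PQtransactionStatus() results and for ConnCacheEntry, and the function returns whether the connection should be discarded instead of calling disconnect_pg_server().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the libpq connection and transaction states. */
typedef enum { SK_CONN_OK, SK_CONN_BAD } sk_conn_status;
typedef enum { SK_TRANS_IDLE, SK_TRANS_ACTIVE, SK_TRANS_INTRANS } sk_trans_status;

/* Hypothetical subset of the fields kept in ConnCacheEntry. */
typedef struct
{
	sk_conn_status conn_status;
	sk_trans_status trans_status;
	int			xact_depth;
	bool		have_prep_stmt;
	bool		have_error;
	bool		changing_xact_state;
} sk_conn_entry;

/*
 * Reset per-transaction state and report whether the connection should be
 * discarded: it is kept only if it is healthy, idle, and no transaction
 * state change was left incomplete.
 */
static bool
sk_cleanup_after_transaction(sk_conn_entry *entry)
{
	bool		discard;

	/* Reset state to show we're out of a transaction */
	entry->xact_depth = 0;
	entry->have_prep_stmt = false;
	entry->have_error = false;

	discard = entry->conn_status != SK_CONN_OK ||
		entry->trans_status != SK_TRANS_IDLE ||
		entry->changing_xact_state;

	entry->changing_xact_state = false;
	return discard;
}
```

Like the function in the patch, the sketch does not look at an invalidated flag, which the removed pgfdw_xact_callback() did check before discarding a connection.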

Attachment: v32-0005-postgres_fdw-supports-prepare-API.patch (application/octet-stream)
From 30b2c062599becb9266096d5a86595edff27b628 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:00:21 +0900
Subject: [PATCH v32 05/11] postgres_fdw supports prepare API.

This commit implements the PrepareForeignTransaction API in postgres_fdw,
enabling foreign transactions to be committed and rolled back using the
two-phase commit protocol.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 137 +++++++++++++++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  13 --
 contrib/postgres_fdw/postgres_fdw.c           |   1 +
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   7 -
 5 files changed, 135 insertions(+), 24 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index e8aafca42d..3c22060f27 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -96,6 +96,8 @@ static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 static bool UserMappingPasswordRequired(UserMapping *user);
 static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
 static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+									char *fdwxact_id, bool is_commit);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -1166,12 +1168,19 @@ void
 postgresCommitForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry;
+	bool		is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	PGresult   *res;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
 
+	if (!is_onephase)
+	{
+		/* COMMIT PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->usermapping,
+								frstate->fdwxact_id, true);
+		return;
+	}
+
 	Assert(entry->conn);
 
 	/*
@@ -1217,16 +1226,24 @@ void
 postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 {
 	ConnCacheEntry *entry = NULL;
+	bool is_onephase = (frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	bool abort_cleanup_failure = false;
 
-	Assert((frstate->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	/*
 	 * In simple rollback case, we must have a connection to the foreign server
 	 * because the foreign transaction is not closed yet. We get the connection
 	 * entry from the cache.
 	 */
 	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+
+	if (!is_onephase)
+	{
+		/* ROLLBACK PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, frstate->usermapping,
+								frstate->fdwxact_id, false);
+		return;
+	}
+
 	Assert(entry);
 
 	/*
@@ -1303,6 +1320,46 @@ postgresRollbackForeignTransaction(FdwXactRslvState *frstate)
 	return;
 }
 
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *frstate)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(frstate->usermapping->umid);
+	Assert(entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", frstate->fdwxact_id);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   frstate->server->servername, frstate->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 frstate->server->servername, frstate->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
 /* Cleanup at main-transaction end */
 static void
 pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
@@ -1329,3 +1386,75 @@ pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
 	/* Also reset cursor numbering for next transaction */
 	cursor_number = 0;
 }
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+						char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	/*
+	 * Check the connection status for the case the previous attempt
+	 * failed.
+	 */
+	if (entry->conn && PQstatus(entry->conn) != CONNECTION_OK)
+		disconnect_pg_server(entry);
+
+	/*
+	 * In two-phase commit case, since the transaction is about to be
+	 * resolved by a different process than the process who prepared it,
+	 * we might not have a connection yet.
+	 */
+	if (!entry->conn)
+		make_new_connection(entry, usermapping);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	/*
+	 * Once the transaction is prepared, further transaction callback is not
+	 * called even when an error occurred during resolving it.  Therefore, we
+	 * don't need to set changing_xact_state here.  On failure the new connection
+	 * will be established either when the new transaction is started or when
+	 * checking the connection status above.
+	 */
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 707f1d7cd4..b7cae97600 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8974,19 +8974,6 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
- count 
--------
-   822
-(1 row)
-
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
-ROLLBACK;
-WARNING:  there is no transaction in progress
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index ad00a9ce2b..8162e0ace7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -563,6 +563,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for foreign transactions */
 	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
 	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
 
 	PG_RETURN_POINTER(routine);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index d714034d6b..788605cfc2 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -140,6 +140,7 @@ extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
 extern void postgresCommitForeignTransaction(FdwXactRslvState *frstate);
 extern void postgresRollbackForeignTransaction(FdwXactRslvState *frstate);
+extern void postgresPrepareForeignTransaction(FdwXactRslvState *frstate);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 25dbc08b98..666f39210f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2647,13 +2647,6 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ROLLBACK;
-
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
-- 
2.27.0
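
The two decisions made in pgfdw_end_prepared_xact() above, which command to send and which failure to tolerate, can be sketched in isolation. The helper names `sk_build_end_prepared` and `sk_error_is_ignorable` are hypothetical, and rejecting identifiers that contain a single quote is an assumption of this sketch, since the patch interpolates the identifier into the command as-is. The SQLSTATE "42704" is the standard code for undefined_object.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "COMMIT PREPARED '<gid>'" or "ROLLBACK PREPARED '<gid>'" into buf.
 * Returns the command length, or -1 if the identifier contains a quote or
 * the command does not fit.
 */
static int
sk_build_end_prepared(char *buf, size_t size, const char *gid, bool is_commit)
{
	int			n;

	/* Assumption of this sketch: refuse identifiers needing escaping. */
	if (strchr(gid, '\'') != NULL)
		return -1;

	n = snprintf(buf, size, "%s PREPARED '%s'",
				 is_commit ? "COMMIT" : "ROLLBACK", gid);
	if (n < 0 || (size_t) n >= size)
		return -1;
	return n;
}

/*
 * The prepared transaction may already be gone on the foreign server, so
 * only undefined_object (42704) is tolerated.  A missing SQLSTATE is
 * treated as a real failure, as in the patch.
 */
static bool
sk_error_is_ignorable(const char *sqlstate)
{
	return sqlstate != NULL && strcmp(sqlstate, "42704") == 0;
}
```

A caller would send the built command and, on failure, pass the reported SQLSTATE to sk_error_is_ignorable() to decide between raising an error and continuing with cleanup.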

Attachment: v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch (application/octet-stream)
From e461adae15aedcdda10cd563150705e2e0ae3f26 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:09:41 +0900
Subject: [PATCH v32 07/11] Introduce foreign transaction launcher and resolver
 processes.

This commit introduces two new background processes: the foreign
transaction launcher and resolvers. With this change, users no longer
need to use pg_resolve_foreign_xact() to resolve foreign transactions
prepared by PREPARE TRANSACTION and left by COMMIT/ROLLBACK
TRANSACTION. These foreign transactions are resolved in the background
by foreign transaction resolver processes.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/fdwxact/Makefile           |   5 +-
 src/backend/access/fdwxact/fdwxact.c          |  33 +-
 src/backend/access/fdwxact/launcher.c         | 567 ++++++++++++++++++
 src/backend/access/fdwxact/resolver.c         | 352 +++++++++++
 src/backend/access/transam/twophase.c         |  16 +
 src/backend/postmaster/bgworker.c             |   8 +
 src/backend/postmaster/pgstat.c               |   6 +
 src/backend/postmaster/postmaster.c           |  13 +-
 src/backend/storage/ipc/ipci.c                |   3 +
 src/backend/storage/lmgr/lwlocknames.txt      |   1 +
 src/backend/tcop/postgres.c                   |  14 +
 src/backend/utils/misc/guc.c                  |  37 ++
 src/backend/utils/misc/postgresql.conf.sample |  12 +
 src/include/access/fdwxact.h                  |   6 +
 src/include/access/fdwxact_launcher.h         |  28 +
 src/include/access/fdwxact_resolver.h         |  23 +
 src/include/access/resolver_internal.h        |  63 ++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/pgstat.h                          |   2 +
 src/include/utils/guc_tables.h                |   2 +
 20 files changed, 1183 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/access/fdwxact/launcher.c
 create mode 100644 src/backend/access/fdwxact/resolver.c
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
index aacab1d729..151e3ae336 100644
--- a/src/backend/access/fdwxact/Makefile
+++ b/src/backend/access/fdwxact/Makefile
@@ -12,6 +12,9 @@ subdir = src/backend/access/fdwxact
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = fdwxact.o
+OBJS = \
+	fdwxact.o \
+	resolver.o \
+	launcher.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
index eb81fb338f..7fc199cc55 100644
--- a/src/backend/access/fdwxact/fdwxact.c
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -22,10 +22,10 @@
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  To resolve
- * these foreign transactions the user needs to use pg_resolve_foreign_xact() SQL
- * function that resolve a foreign transaction according to the result of the
- * corresponding local transaction.
+ * local transaction but do nothing for the involved foreign transactions.  The prepared
+ * foreign transactions are resolved by a resolver process asynchronously.  Also, the
+ * user can use the pg_resolve_foreign_xact() SQL function to resolve a foreign transaction
+ * manually.
  *
  * LOCKING
  *
@@ -76,7 +76,10 @@
 #include <unistd.h>
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/twophase.h"
+#include "access/resolver_internal.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
@@ -157,6 +160,7 @@ static bool fdwXactExitRegistered = false;
 
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
 static void FdwXactPrepareForeignTransactions(TransactionId xid);
@@ -165,7 +169,6 @@ static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
 static FdwXact FdwXactInsertEntry(TransactionId xid,
 								  FdwXactParticipant *fdw_part);
-static void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 static void FdwXactComputeRequiredXmin(void);
 static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
 static void FdwXactResolveOneFdwXact(FdwXact fdwxact);
@@ -772,12 +775,13 @@ ForgetAllFdwXactParticipants(void)
 
 	/*
 	 * If we leave any FdwXact entries, update the oldest local transaction of
-	 * unresolved distributed transaction.
+	 * unresolved distributed transaction and notify the launcher.
 	 */
 	if (nlefts > 0)
 	{
 		elog(DEBUG1, "left %u foreign transactions", nlefts);
 		FdwXactComputeRequiredXmin();
+		FdwXactLaunchOrWakeupResolver();
 	}
 
 	list_free_deep(FdwXactParticipants);
@@ -785,7 +789,9 @@ ForgetAllFdwXactParticipants(void)
 }
 
 /*
- * Commit or rollback all foreign transactions.
+ * Close in-progress involved foreign transactions.  We don't perform the second
+ * phase of two-phase commit protocol here.  All prepared foreign transactions
+ * enter in-doubt state and a resolver process will process them.
  */
 void
 AtEOXact_FdwXact(bool is_commit)
@@ -889,7 +895,7 @@ PrePrepare_FdwXact(void)
  * The caller must hold the given foreign transactions in advance to prevent
  * concurrent update.
  */
-static void
+void
 FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
 {
 	for (int i = 0; i < nfdwxacts; i++)
@@ -924,6 +930,17 @@ FdwXactExists(Oid dbid, Oid serverid, Oid userid)
 
 	return (idx >= 0);
 }
+bool
+FdwXactExistsXid(TransactionId xid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(InvalidOid, xid, InvalidOid, InvalidOid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
 
 /*
  * Return the index of first found FdwXact entry that matched to given arguments.
diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c
new file mode 100644
index 0000000000..916b9af2f7
--- /dev/null
+++ b/src/backend/access/fdwxact/launcher.c
@@ -0,0 +1,567 @@
+/*-------------------------------------------------------------------------
+ *
+ * launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "pgstat.h"
+#include "funcapi.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+static void fdwxact_launcher_onexit(int code, Datum arg);
+static void fdwxact_launcher_sighup(SIGNAL_ARGS);
+static void fdwxact_launch_resolver(Oid dbid);
+static bool fdwxact_relaunch_resolvers(void);
+
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactLauncherRequestToLaunch(void)
+{
+	if (FdwXactRslvCtl->launcher_pid != InvalidPid)
+		kill(FdwXactRslvCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactRslvShmemInit */
+Size
+FdwXactRslvShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactRslvCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactRslvShmemInit(void)
+{
+	bool		found;
+
+	FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers",
+									 FdwXactRslvShmemSize(),
+									 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize());
+		SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue));
+		FdwXactRslvCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+fdwxact_launcher_onexit(int code, Datum arg)
+{
+	FdwXactRslvCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGHUP: set flag to reload configuration at next convenient time */
+static void
+fdwxact_launcher_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+fdwxact_launcher_sigusr2(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0);
+
+	Assert(FdwXactRslvCtl->launcher_pid == InvalidPid);
+	FdwXactRslvCtl->launcher_pid = MyProcPid;
+	FdwXactRslvCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, fdwxact_launcher_sighup);
+	pqsignal(SIGUSR2, fdwxact_launcher_sigusr2);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit the start retry to once a
+		 * foreign_xact_resolution_retry_interval but always attempt to
+		 * start when requested.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = fdwxact_relaunch_resolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in previous cycle was interrupted in less than
+			 * foreign_xact_resolution_retry_interval since last resolver
+			 * started, this usually means crash of the resolver, so we should
+			 * retry in foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+
+	/*
+	 * Looking for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. Since the resolver will soon find
+		 * the FdwXact entry we inserted, we don't need to do anything.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactLauncherRequestToLaunch();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+fdwxact_launch_resolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactRslvCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on databases that have
+ * at least one FdwXact entry but no running resolver.
+ */
+static bool
+fdwxact_relaunch_resolvers(void)
+{
+	HTAB	   *fdwxact_dbs;
+	HTAB	   *resolver_dbs;
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+
+	/*
+	 * Create a hash map for the database that has at least one foreign
+	 * transaction to resolve.
+	 */
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * We need to launch resolver process if the foreign transaction
+		 * is not held by anyone and is not a part of the local prepared
+		 * transaction.
+		 */
+		if (fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no foreign transaction to resolve, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	/* Create a hash map for databases on which a resolver is running */
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * Find databases on which no resolver is running and launch new
+	 * resolver process on them.
+	 */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			fdwxact_launch_resolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactRslvCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be super user */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactRslvCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c
new file mode 100644
index 0000000000..c9d41428fc
--- /dev/null
+++ b/src/backend/access/fdwxact/resolver.c
@@ -0,0 +1,352 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process continues to resolve foreign transactions on the
+ * database, which the backend process is waiting for resolution.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int	foreign_xact_resolution_retry_interval;
+int	foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactRslvCtlData *FdwXactRslvCtl;
+
+static void FXRslvLoop(void);
+static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime);
+static void FXRslvCheckTimeout(TimestampTz now);
+
+static void fdwxact_resolver_sighup(SIGNAL_ARGS);
+static void fdwxact_resolver_onexit(int code, Datum arg);
+static void fdwxact_resolver_detach(void);
+static void fdwxact_resolver_attach(int slot);
+static void hold_indoubt_fdwxacts(void);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts has indexes of FdwXact entries which the resolver marked
+ * as in-processing. These marks are cleared on process exit.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/* Set flag to reload configuration at next convenient time */
+static void
+fdwxact_resolver_sighup(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGHUP = true;
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+fdwxact_resolver_detach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+fdwxact_resolver_onexit(int code, Datum arg)
+{
+	fdwxact_resolver_detach();
+
+	/* Release the held foreign transaction entries */
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+fdwxact_resolver_attach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	fdwxact_resolver_attach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, fdwxact_resolver_sighup);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FXRslvLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FXRslvLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/* Hold in-doubt foreign transaction to resolve */
+		hold_indoubt_fdwxacts();
+
+		if (nheld > 0)
+		{
+			/* Resolve in-doubt transactions */
+			StartTransactionCommand();
+			FdwXactResolveFdwXacts(held_fdwxacts, nheld);
+			CommitTransactionCommand();
+			last_resolution_time = now;
+		}
+
+		FXRslvCheckTimeout(now);
+
+		sleep_time = FXRslvComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether there have been foreign transactions by the backend within
+ * foreign_xact_resolver_timeout and shutdown if not.
+ */
+static void
+FXRslvCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/* Reached timeout, exit */
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+					get_database_name(MyDatabaseId))));
+	CommitTransactionCommand();
+	fdwxact_resolver_detach();
+	proc_exit(0);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until the
+ * timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Lock foreign transactions that are not held by anyone.
+ */
+static void
+hold_indoubt_fdwxacts(void)
+{
+	nheld = 0;
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid &&
+			fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+		{
+			held_fdwxacts[nheld++] = i;
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
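
Reviewer's sketch, not part of the patch: the sleep computation in
FXRslvComputeSleepTime() above takes the smallest of the default nap time, the
time left until the idle timeout, and the time left until the next scheduled
resolution. The stand-alone version below models that with plain millisecond
counters instead of TimestampTz; compute_sleep_ms() and NAPTIME_MS are
hypothetical names, and negative differences are clamped to zero on the
assumption that this matches what TimestampDifference() returns when the stop
time is not after the start time.

```c
#include <assert.h>
#include <stdint.h>

/* 3 minutes, mirroring DEFAULT_NAPTIME_PER_CYCLE in the patch. */
#define NAPTIME_MS 180000

static int64_t
min64(int64_t a, int64_t b)
{
	return a < b ? a : b;
}

/*
 * All arguments are in milliseconds.  idle_timeout_ms == 0 disables the idle
 * timeout; next_resolution_ms <= 0 means no resolution is scheduled.
 */
static int64_t
compute_sleep_ms(int64_t now_ms, int64_t last_resolution_ms,
				 int64_t idle_timeout_ms, int64_t next_resolution_ms)
{
	int64_t		sleep_ms = NAPTIME_MS;

	assert(idle_timeout_ms >= 0);

	if (idle_timeout_ms > 0)
	{
		/* Time left until the resolver would exit for being idle. */
		int64_t		remaining = last_resolution_ms + idle_timeout_ms - now_ms;

		if (remaining < 0)
			remaining = 0;
		sleep_ms = min64(sleep_ms, remaining);
	}

	if (next_resolution_ms > 0)
	{
		/* Time left until the next scheduled resolution attempt. */
		int64_t		remaining = next_resolution_ms - now_ms;

		if (remaining < 0)
			remaining = 0;
		sleep_ms = min64(sleep_ms, remaining);
	}

	return sleep_ms;
}
```

In this patch version FXRslvLoop() always passes -1 as the next resolution
time, so in practice only the nap time and the idle timeout bound the sleep.
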
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 5c8a55358d..077eb0009f 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,8 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -2286,6 +2288,13 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * If the prepared transaction was a part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
@@ -2345,6 +2354,13 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * If the prepared transaction was a part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de3..2c7f55f8d9 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,8 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index c34d14bab8..4ac70d49e2 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3830,6 +3830,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9e11bf3822..21b6b1b72a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -94,6 +94,7 @@
 #endif
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -911,6 +912,9 @@ PostmasterMain(int argc, char *argv[])
 	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers <= 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
 
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
@@ -976,12 +980,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * they register background workers, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 6f14a950bf..29753b516d 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -17,6 +17,7 @@
 #include "access/clog.h"
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -151,6 +152,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactRslvShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +272,7 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	FdwXactShmemInit();
+	FdwXactRslvShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index dc29a7ea6f..9327394013 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -54,3 +54,4 @@ XactTruncationLock					44
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
 FdwXactLock							48
+FdwXactResolverLock					49
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 28055680aa..2805e99d5e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3097,6 +3099,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9c78b2a90a..add8e598e8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -763,6 +763,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2481,6 +2485,39 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 68548b4633..58ac54b8c8 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -735,6 +735,18 @@
 #max_pred_locks_per_page = 2            # min 0
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
 #------------------------------------------------------------------------------
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 9ba819e9d1..a3763e52c0 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -104,13 +104,19 @@ typedef struct FdwXactRslvState
 
 /* GUC parameters */
 extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void PrePrepare_FdwXact(void);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern bool FdwXactExistsXid(TransactionId xid);
 extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
 extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
 								Oid userid, void *content, int len);
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..688b43b8d0
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactLauncherRequestToLaunch(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactRslvShmemSize(void);
+extern void FdwXactRslvShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..779848113c
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..c935471936
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLaunchLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactRslvCtlData struct for the whole database cluster */
+typedef struct FdwXactRslvCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactRslvCtlData;
+#define SizeOfFdwXactRslvCtlData \
+	(offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+extern FdwXactResolver *MyFdwXactResolver;
+extern FdwXactRslvCtlData *FdwXactRslvCtl;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1830364fcc..5fe0abebf9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6167,6 +6167,11 @@
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
 
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
+
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
   proargtypes => 'pg_lsn pg_lsn', prosrc => 'pg_wal_lsn_diff' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 30d3a7eea0..d2a0a98489 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -883,6 +883,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index b9b5c1adda..94e593ac77 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
-- 
2.27.0

Attachment: v32-0001-Introduce-transaction-manager-for-foreign-transa.patch (application/octet-stream)
From 26790410870ac628738aa8884371208d78f8b1a7 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 28 Aug 2020 22:25:38 +0900
Subject: [PATCH v32 01/11] Introduce transaction manager for foreign
 transactions.

The global transaction manager manages the transactions initiated on
foreign servers. This commit also adds the CommitForeignTransaction
and RollbackForeignTransaction FDW APIs, supporting only one-phase
commit. An FDW that implements these APIs can be managed by the global
transaction manager, so the FDW can control its transactions through
the foreign transaction manager rather than through XactCallback.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/Makefile          |   4 +-
 src/backend/access/fdwxact/Makefile  |  17 ++
 src/backend/access/fdwxact/fdwxact.c | 233 +++++++++++++++++++++++++++
 src/backend/access/transam/xact.c    |  10 ++
 src/backend/foreign/foreign.c        |   4 +
 src/include/access/fdwxact.h         |  33 ++++
 src/include/foreign/fdwapi.h         |  12 ++
 7 files changed, 311 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/access/fdwxact/Makefile
 create mode 100644 src/backend/access/fdwxact/fdwxact.c
 create mode 100644 src/include/access/fdwxact.h

diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 0880e0a8bb..2372a1a690 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,7 +8,7 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  table tablesample transam
+SUBDIRS	    = brin common fdwxact gin gist hash heap index nbtree rmgrdesc \
+			  spgist table tablesample transam
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile
new file mode 100644
index 0000000000..aacab1d729
--- /dev/null
+++ b/src/backend/access/fdwxact/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/fdwxact
+#
+# IDENTIFICATION
+#    src/backend/access/fdwxact/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = fdwxact.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c
new file mode 100644
index 0000000000..00da860b31
--- /dev/null
+++ b/src/backend/access/fdwxact/fdwxact.c
@@ -0,0 +1,233 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module contains the code for managing transactions started on foreign
+ * servers.
+ *
+ * An FDW that implements both the commit and rollback APIs can register a
+ * foreign transaction with FdwXactRegisterXact() so that it participates in
+ * the distributed transaction.  The registered foreign transactions are
+ * identified by OIDs of server and user.  On commit and rollback, the global
+ * transaction manager calls the corresponding FDW API to end the transactions.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/fdwxact/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xlog.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "utils/memutils.h"
+
+/* Check whether the FdwXactParticipant supports the transaction callbacks */
+#define ServerSupportTransactionCallback(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.	 This struct
+ * needs to live until the end of transaction where we cannot look at
+ * syscaches. Therefore, this is allocated in the TopTransactionContext.
+ */
+typedef struct FdwXactParticipant
+{
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Callbacks for foreign transaction */
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A member of
+ * participants must support both commit and rollback APIs.
+ */
+static List *FdwXactParticipants = NIL;
+
+static void ForgetAllFdwXactParticipants(void);
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
+											 bool commit);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+/*
+ * Register the given foreign transaction identified by the given arguments
+ * as a participant of the transaction.
+ */
+void
+FdwXactRegisterXact(Oid serverid, Oid userid)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Already registered */
+			return;
+		}
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Foreign server managed by the transaction manager must implement
+	 * transaction callbacks.
+	 */
+	if (!routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("cannot register foreign server not supporting transaction callback")));
+
+	/*
+	 * Participant's information is also used at the end of a transaction,
+	 * where system caches are not available. Save it in TopTransactionContext
+	 * so that it can live until the end of the transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Revert back the context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Remove the given foreign server from FdwXactParticipants */
+void
+FdwXactUnregisterXact(Oid serverid, Oid userid)
+{
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Remove the entry */
+			FdwXactParticipants =
+				foreach_delete_current(FdwXactParticipants, lc);
+			break;
+		}
+	}
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+
+	return fdw_part;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
+{
+	FdwXactRslvState state;
+
+	Assert(ServerSupportTransactionCallback(fdw_part));
+
+	state.server = fdw_part->server;
+	state.usermapping = fdw_part->usermapping;
+	state.flags = FDWXACT_FLAG_ONEPHASE;
+
+	if (commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&state);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Clear the FdwXactParticipants list.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	list_free_deep(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Commit or rollback all foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool is_commit)
+{
+	ListCell   *lc;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/* Commit or rollback foreign transactions in the participant list */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(ServerSupportTransactionCallback(fdw_part));
+		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Check if the local transaction has any foreign transaction.
+ */
+void
+PrePrepare_FdwXact(void)
+{
+	/* We don't support preparing foreign transactions */
+	if (FdwXactParticipants != NIL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..b8990af8b6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -2230,6 +2231,9 @@ CommitTransaction(void)
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT
 					  : XACT_EVENT_COMMIT);
 
+	/* Commit foreign transaction if any */
+	AtEOXact_FdwXact(true);
+
 	ResourceOwnerRelease(TopTransactionResourceOwner,
 						 RESOURCE_RELEASE_BEFORE_LOCKS,
 						 true, true);
@@ -2369,6 +2373,9 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Reject PREPARE if foreign transactions are involved */
+	PrePrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2756,6 +2763,9 @@ AbortTransaction(void)
 		else
 			CallXactCallbacks(XACT_EVENT_ABORT);
 
+		/* Rollback foreign transactions if any */
+		AtEOXact_FdwXact(false);
+
 		ResourceOwnerRelease(TopTransactionResourceOwner,
 							 RESOURCE_RELEASE_BEFORE_LOCKS,
 							 false, true);
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 5564dc3a1e..d50dc099c6 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -328,6 +328,10 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* The FDW must support either both APIs or neither */
+	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
+		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
+
 	return routine;
 }
 
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..6c8b111ab5
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,33 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "foreign/foreign.h"
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactRslvState
+{
+	/* Foreign transaction information */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactRslvState;
+
+/* Function declarations */
+extern void AtEOXact_FdwXact(bool is_commit);
+extern void PrePrepare_FdwXact(void);
+
+#endif /* FDWXACT_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 2953499fb1..570e605e1a 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "access/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
 
@@ -170,6 +171,9 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
+typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
+
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
  * function.  It provides pointers to the callback functions needed by the
@@ -246,6 +250,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for path reparameterization. */
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
+
+	/* Support functions for transaction management */
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
 } FdwRoutine;
 
 
@@ -259,4 +267,8 @@ extern bool IsImportableForeignTable(const char *tablename,
 									 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in fdwxact/fdwxact.c */
+extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
+
 #endif							/* FDWAPI_H */
-- 
2.27.0
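
The control flow of the patch above (register participants once, then call each participant's commit or rollback callback at end of transaction and clear the list) can be illustrated with a stand-alone model. This is only a sketch under stated assumptions: every `Sketch*`/`sketch_*` name is hypothetical, a fixed-size array stands in for the `List` allocated in `TopTransactionContext`, and registration returns `false` where the patch raises an ERROR. It is not the patch's code and does not use PostgreSQL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Oid and the FDW callback types. */
typedef unsigned int SketchOid;
typedef void (*SketchEndXactFn) (SketchOid serverid, SketchOid userid);

typedef struct SketchParticipant
{
	SketchOid	serverid;
	SketchOid	userid;
	SketchEndXactFn commit_fn;
	SketchEndXactFn rollback_fn;
} SketchParticipant;

#define SKETCH_MAX_PARTICIPANTS 8

SketchParticipant sketch_participants[SKETCH_MAX_PARTICIPANTS];
int			sketch_nparticipants = 0;
int			sketch_commits = 0;
int			sketch_rollbacks = 0;

/* Example callbacks that only count how often they are invoked. */
void
sketch_count_commit(SketchOid serverid, SketchOid userid)
{
	(void) serverid;
	(void) userid;
	sketch_commits++;
}

void
sketch_count_rollback(SketchOid serverid, SketchOid userid)
{
	(void) serverid;
	(void) userid;
	sketch_rollbacks++;
}

/*
 * Register a (server, user) pair once.  Returns false if a callback is
 * missing or the array is full (assumption: the patch errors out instead).
 */
bool
sketch_register(SketchOid serverid, SketchOid userid,
				SketchEndXactFn commit_fn, SketchEndXactFn rollback_fn)
{
	for (int i = 0; i < sketch_nparticipants; i++)
	{
		if (sketch_participants[i].serverid == serverid &&
			sketch_participants[i].userid == userid)
			return true;		/* already registered */
	}

	if (commit_fn == NULL || rollback_fn == NULL)
		return false;
	if (sketch_nparticipants >= SKETCH_MAX_PARTICIPANTS)
		return false;

	sketch_participants[sketch_nparticipants].serverid = serverid;
	sketch_participants[sketch_nparticipants].userid = userid;
	sketch_participants[sketch_nparticipants].commit_fn = commit_fn;
	sketch_participants[sketch_nparticipants].rollback_fn = rollback_fn;
	sketch_nparticipants++;
	return true;
}

/* End every registered foreign transaction, then forget the participants. */
void
sketch_at_eoxact(bool is_commit)
{
	for (int i = 0; i < sketch_nparticipants; i++)
	{
		SketchParticipant *p = &sketch_participants[i];

		if (is_commit)
			p->commit_fn(p->serverid, p->userid);
		else
			p->rollback_fn(p->serverid, p->userid);
	}
	sketch_nparticipants = 0;
}
```

In this model, registering the same pair twice leaves a single entry, which mirrors the "Already registered" early return in `FdwXactRegisterXact()`.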

#209Zhihong Yu
zyu@yugabyte.com
In reply to: Masahiko Sawada (#208)
Re: Transactions involving multiple postgres foreign servers, take 2

Hi,
For v32-0008-Prepare-foreign-transactions-at-commit-time.patch :

+ bool have_notwophase = false;

Maybe name the variable have_no_twophase so that it is easier to read.

+ * Two-phase commit is not required if the number of servers performed

performed -> performing

+                errmsg("cannot process a distributed transaction that has
operated on a foreign server that does not support two-phase commit
protocol"),
+                errdetail("foreign_twophase_commit is \'required\' but the
transaction has some foreign servers which are not capable of two-phase
commit")));

The lines are really long. Please wrap into more lines.

On Wed, Jan 13, 2021 at 9:50 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Thu, Jan 7, 2021 at 11:44 AM Zhihong Yu <zyu@yugabyte.com> wrote:

Hi,

Thank you for reviewing the patch!

For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch :

However these functions are not neither committed nor aborted at

I think the double negation was not intentional. Should be 'are neither ...'

Fixed.

For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the return statement?

Hmm, you mean that we need MAXALIGN(size) after adding the size of
FdwXactData structs?

Size
FdwXactShmemSize(void)
{
    Size size;

    /* Size for foreign transaction information array */
    size = offsetof(FdwXactCtlData, fdwxacts);
    size = add_size(size, mul_size(max_prepared_foreign_xacts,
                                   sizeof(FdwXact)));
    size = MAXALIGN(size);
    size = add_size(size, mul_size(max_prepared_foreign_xacts,
                                   sizeof(FdwXactData)));

    return size;
}

I don't think we need to do that. Other similar code, such as
TwoPhaseShmemSize(), doesn't do that. Why do you think we need that?
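
The arithmetic under discussion can be sketched in a stand-alone form. This is a minimal model, assuming an 8-byte maximum alignment and hypothetical element sizes; `SKETCH_MAXALIGN` imitates the rounding behavior of `MAXALIGN` but is defined locally, and the function only mirrors the shape of the quoted `FdwXactShmemSize()`. The intermediate rounding makes the second array start on an alignment boundary; the returned total is left unrounded, as in the quoted function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 8-byte maximum alignment, a common value on 64-bit platforms. */
#define SKETCH_MAXIMUM_ALIGNOF 8
#define SKETCH_MAXALIGN(len) \
	(((size_t) (len) + (SKETCH_MAXIMUM_ALIGNOF - 1)) & \
	 ~((size_t) (SKETCH_MAXIMUM_ALIGNOF - 1)))

/*
 * Header, then an array of pointers, then padding up to the alignment
 * boundary, then an array of entries.  All sizes are hypothetical inputs.
 */
size_t
sketch_shmem_size(size_t header_size, size_t ptr_size,
				  size_t entry_size, size_t nentries)
{
	size_t		size = header_size;

	size += nentries * ptr_size;
	size = SKETCH_MAXALIGN(size);	/* entry array starts on a boundary */
	size += nentries * entry_size;

	return size;				/* no trailing rounding, as in the quote */
}
```

With a 12-byte header, 8-byte pointers, 20-byte entries and 3 entries, the running size is 36, rounded up to 40, plus 60 for the entries, giving 100.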

+ fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);

For the function name, Fdw and Xact appear twice, each. Maybe one of them can be dropped?

Agreed. Changed to FdwXactInsertEntry().

+ * we don't need to anything for this participant because all foreign

'need to' -> 'need to do'

Fixed.

+   else if (TransactionIdDidAbort(xid))
+       return FDWXACT_STATUS_ABORTING;
+
the 'else' can be omitted since the preceding if would return.

Fixed.

+ if (max_prepared_foreign_xacts <= 0)

I wonder when the value for max_prepared_foreign_xacts would be negative (and whether that should be considered an error).

Fixed to (max_prepared_foreign_xacts == 0)

Attached the updated version patch set.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#210Zhihong Yu
zyu@yugabyte.com
In reply to: Zhihong Yu (#209)
Re: Transactions involving multiple postgres foreign servers, take 2

For v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch :

+   entry->changing_xact_state = true;
...
+   entry->changing_xact_state = abort_cleanup_failure;

I don't see a return statement between the two assignments. I wonder
why entry->changing_xact_state is set to true and then assigned again
later.
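
One plausible reading of this pattern, offered here only as an assumption about the intent, is that the calls between the two assignments can exit non-locally when an error is raised, in which case the flag must already be true so the connection is treated as being in an unknown state. The stand-alone sketch below models that with `setjmp`/`longjmp`; all `Sketch*`/`sketch_*` names are hypothetical and no postgres_fdw code is used.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

typedef struct SketchConnEntry
{
	bool		changing_xact_state;	/* state change in progress? */
} SketchConnEntry;

static jmp_buf sketch_error_jmp;

/* Stand-in for a remote call that may raise an error (non-local exit). */
static bool
sketch_remote_rollback(bool raise_error, bool cleanup_ok)
{
	if (raise_error)
		longjmp(sketch_error_jmp, 1);
	return cleanup_ok;
}

/* Returns the value of the flag once the attempt is over. */
bool
sketch_abort_cleanup(SketchConnEntry *entry, bool raise_error, bool cleanup_ok)
{
	bool		abort_cleanup_failure;

	if (setjmp(sketch_error_jmp) != 0)
		return entry->changing_xact_state;	/* error path: still true */

	/* Mark the entry before the risky call... */
	entry->changing_xact_state = true;
	abort_cleanup_failure = !sketch_remote_rollback(raise_error, cleanup_ok);

	/* ...and clear it only if the call returned and cleanup succeeded. */
	entry->changing_xact_state = abort_cleanup_failure;
	return entry->changing_xact_state;
}
```

In this model the first assignment matters only on the error path: if the remote call never returns, the second assignment is skipped and the flag stays true.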

For v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch :

bq. This commits introduces to new background processes: foreign

commits introduces to new -> commit introduces two new

+FdwXactExistsXid(TransactionId xid)

Since Xid is the parameter to this method, I think the Xid suffix can be
dropped from the method name.

+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group

Please correct year in the next patch set.

+FdwXactLauncherRequestToLaunch(void)

Since the launcher's job is to 'launch', I think the Launcher can be
omitted from the method name.

+/* Report shared memory space needed by FdwXactRsoverShmemInit */
+Size
+FdwXactRslvShmemSize(void)

Do both Rsover and Rslv refer to the resolver? It would be better to use
the whole word, which reduces confusion.
Plus, FdwXactRsoverShmemInit should be FdwXactRslvShmemInit (or
FdwXactResolveShmemInit).

+fdwxact_launch_resolver(Oid dbid)

The above method is not in camel case. It would be better if method names
were cased consistently.

+                errmsg("out of foreign transaction resolver slots"),
+                errhint("You might need to increase
max_foreign_transaction_resolvers.")));

It would be nice to include the value of max_foreign_xact_resolvers in the message.

For fdwxact_resolver_onexit():

+       LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+       fdwxact->locking_backend = InvalidBackendId;
+       LWLockRelease(FdwXactLock);

Nothing called inside the for loop looks time-consuming. I wonder if the
lock could be acquired before the for loop and released after the loop
finishes.
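
The two locking shapes being compared can be sketched as follows. This is a stand-alone model, not the patch's code: `SketchLock` is a hypothetical single-threaded stand-in that only counts acquisitions (it is not an LWLock), and the array of backend ids stands in for the held FdwXact entries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock model: tracks whether it is held and how often taken. */
typedef struct SketchLock
{
	bool		held;
	int			acquisitions;
} SketchLock;

#define SKETCH_INVALID_BACKEND (-1)

static void
sketch_lock_acquire(SketchLock *lock)
{
	assert(!lock->held);
	lock->held = true;
	lock->acquisitions++;
}

static void
sketch_lock_release(SketchLock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/* Shape of the quoted hunk: acquire and release once per entry. */
void
sketch_release_per_entry(SketchLock *lock, int *locking_backend, int n)
{
	for (int i = 0; i < n; i++)
	{
		sketch_lock_acquire(lock);
		locking_backend[i] = SKETCH_INVALID_BACKEND;
		sketch_lock_release(lock);
	}
}

/* Shape suggested in the review: hold the lock across the whole loop. */
void
sketch_release_hoisted(SketchLock *lock, int *locking_backend, int n)
{
	sketch_lock_acquire(lock);
	for (int i = 0; i < n; i++)
		locking_backend[i] = SKETCH_INVALID_BACKEND;
	sketch_lock_release(lock);
}
```

Both variants leave the entries in the same final state; the hoisted one takes the lock once instead of once per entry, at the cost of holding it for the whole loop.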

+FXRslvLoop(void)

Please use Resolver instead of Rslv

+ FdwXactResolveFdwXacts(held_fdwxacts, nheld);

Fdw and Xact are repeated twice each in the method name. Probably the
method name can be made shorter.

Cheers

On Thu, Jan 14, 2021 at 11:04 AM Zhihong Yu <zyu@yugabyte.com> wrote:

Show quoted text

Hi,
For v32-0008-Prepare-foreign-transactions-at-commit-time.patch :

+ bool have_notwophase = false;

Maybe name the variable have_no_twophase so that it is easier to read.

+ * Two-phase commit is not required if the number of servers performed

performed -> performing

+                errmsg("cannot process a distributed transaction that has
operated on a foreign server that does not support two-phase commit
protocol"),
+                errdetail("foreign_twophase_commit is \'required\' but
the transaction has some foreign servers which are not capable of two-phase
commit")));

The lines are really long. Please wrap into more lines.

On Wed, Jan 13, 2021 at 9:50 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Thu, Jan 7, 2021 at 11:44 AM Zhihong Yu <zyu@yugabyte.com> wrote:

Hi,

Thank you for reviewing the patch!

For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch :

However these functions are not neither committed nor aborted at

I think the double negation was not intentional. Should be 'are neither ...'

Fixed.

For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the return statement ?

Hmm, you mean that we need MAXALIGN(size) after adding the size of
FdwXactData structs?

Size
FdwXactShmemSize(void)
{
    Size        size;

    /* Size for foreign transaction information array */
    size = offsetof(FdwXactCtlData, fdwxacts);
    size = add_size(size, mul_size(max_prepared_foreign_xacts,
                                   sizeof(FdwXact)));
    size = MAXALIGN(size);
    size = add_size(size, mul_size(max_prepared_foreign_xacts,
                                   sizeof(FdwXactData)));

    return size;
}

I don't think we need to do that. Other similar code, such as
TwoPhaseShmemSize(), doesn't do that. Why do you think we need it?
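
To make the layout question concrete, here is a minimal standalone sketch of the same size calculation. It is not code from the patch set: the Demo* types, the DEMO_* macros (simplified copies of the usual alignment macros, with an assumed maximum alignment of 8) and demo_shmem_size() are all hypothetical, and the overflow checks that add_size()/mul_size() provide are omitted. The intermediate alignment makes the array of structs start on an aligned offset; the total is not rounded up again, as in the function quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the alignment macros; 8 is an assumed value. */
#define DEMO_MAXIMUM_ALIGNOF 8
#define DEMO_TYPEALIGN(ALIGNVAL, LEN) \
    (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))
#define DEMO_MAXALIGN(LEN) DEMO_TYPEALIGN(DEMO_MAXIMUM_ALIGNOF, (LEN))

/* Hypothetical per-transaction entry and control struct. */
typedef struct DemoXactData
{
    uint32_t    xid;
    uint32_t    serverid;
    int32_t     status;
} DemoXactData;

typedef DemoXactData *DemoXact;

typedef struct DemoCtlData
{
    int         num_xacts;
    DemoXact    xacts[];        /* pointer array, followed by the structs */
} DemoCtlData;

/* Bytes needed for the header, the pointer array, and the struct array. */
static size_t
demo_shmem_size(size_t max_xacts)
{
    size_t      size;

    size = offsetof(DemoCtlData, xacts);
    size += max_xacts * sizeof(DemoXact);

    /* Align so that the struct array starts on a max-aligned offset. */
    size = DEMO_MAXALIGN(size);
    assert(size % DEMO_MAXIMUM_ALIGNOF == 0);

    /* No further rounding of the total in this sketch. */
    size += max_xacts * sizeof(DemoXactData);

    return size;
}
```

Whether the total itself needs rounding depends on how the allocator treats the requested size, which this sketch does not model.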

+ fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);

For the function name, Fdw and Xact appear twice, each. Maybe one of them can be dropped ?

Agreed. Changed to FdwXactInsertEntry().

+ * we don't need to anything for this participant because all foreign

'need to' -> 'need to do'

Fixed.

+   else if (TransactionIdDidAbort(xid))
+       return FDWXACT_STATUS_ABORTING;
+
the 'else' can be omitted since the preceding if would return.

Fixed.

+ if (max_prepared_foreign_xacts <= 0)

I wonder when the value for max_prepared_foreign_xacts would be negative (and whether that should be considered an error).

Fixed to (max_prepared_foreign_xacts == 0)

Attached the updated version patch set.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#211Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhihong Yu (#209)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jan 15, 2021 at 4:03 AM Zhihong Yu <zyu@yugabyte.com> wrote:

Hi,
For v32-0008-Prepare-foreign-transactions-at-commit-time.patch :

Thank you for reviewing the patch!

+ bool have_notwophase = false;

Maybe name the variable have_no_twophase so that it is easier to read.

Fixed.

+ * Two-phase commit is not required if the number of servers performed

performed -> performing

Fixed.

+                errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+                errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));

The lines are really long. Please wrap into more lines.

Hmm, we can do that but if we do that, it makes grepping by the error
message hard. Please refer to the documentation about the formatting
guideline[1]:

Limit line lengths so that the code is readable in an 80-column
window. (This doesn't mean that you must never go past 80 columns. For
instance, breaking a long error message string in arbitrary places
just to keep the code within 80 columns is probably not a net gain in
readability.)

These changes have been made in the local branch. I'll post the
updated patch set after incorporating all the comments.

Regards,

[1]: https://www.postgresql.org/docs/devel/source-format.html

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#212Zhihong Yu
zyu@yugabyte.com
In reply to: Masahiko Sawada (#211)
Re: Transactions involving multiple postgres foreign servers, take 2

Hi,
For v32-0004-Add-PrepareForeignTransaction-API.patch :

+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is update.     To avoid holding the lock during transaction processing
+ * which may take an unpredicatable time the in-memory data of foreign

entry is update -> entry is updated

unpredicatable -> unpredictable

+ int nlefts = 0;

nlefts -> nremaining

+ elog(DEBUG1, "left %u foreign transactions", nlefts);

The message can be phrased as "%u foreign transactions remaining"

+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)

Fdw and Xact are repeated. Seems one should suffice. How about naming the
method FdwXactResolveTransactions() ?
Similar comment for FdwXactResolveOneFdwXact(FdwXact fdwxact)

For get_fdwxact():

+       /* This entry matches the condition */
+       found = true;
+       break;

Instead of breaking and returning, you can return within the loop directly.
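
A rough sketch of that shape, assuming a simple array scan. The DemoFdwXact struct and demo_get_fdwxact() are hypothetical and only illustrate returning from inside the loop instead of setting a found flag and breaking; the real get_fdwxact() matches on different fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced entry; not the patch's FdwXactData. */
typedef struct DemoFdwXact
{
    uint32_t    xid;
    uint32_t    serverid;
    int         valid;
} DemoFdwXact;

/* Return the first valid entry matching xid and serverid, or NULL. */
static DemoFdwXact *
demo_get_fdwxact(DemoFdwXact *entries, size_t nentries,
                 uint32_t xid, uint32_t serverid)
{
    assert(entries != NULL || nentries == 0);

    for (size_t i = 0; i < nentries; i++)
    {
        DemoFdwXact *fx = &entries[i];

        if (!fx->valid)
            continue;
        if (fx->xid != xid || fx->serverid != serverid)
            continue;

        /* This entry matches the condition: return it right away. */
        return fx;
    }

    return NULL;                /* no match */
}
```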

Cheers

On Thu, Jan 14, 2021 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Fri, Jan 15, 2021 at 4:03 AM Zhihong Yu <zyu@yugabyte.com> wrote:

Hi,
For v32-0008-Prepare-foreign-transactions-at-commit-time.patch :

Thank you for reviewing the patch!

+ bool have_notwophase = false;

Maybe name the variable have_no_twophase so that it is easier to read.

Fixed.

+ * Two-phase commit is not required if the number of servers performed

performed -> performing

Fixed.

+ errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+ errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));

The lines are really long. Please wrap into more lines.

Hmm, we can do that but if we do that, it makes grepping by the error
message hard. Please refer to the documentation about the formatting
guideline[1]:

Limit line lengths so that the code is readable in an 80-column
window. (This doesn't mean that you must never go past 80 columns. For
instance, breaking a long error message string in arbitrary places
just to keep the code within 80 columns is probably not a net gain in
readability.)

These changes have been made in the local branch. I'll post the
updated patch set after incorporating all the comments.

Regards,

[1] https://www.postgresql.org/docs/devel/source-format.html

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#213Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhihong Yu (#210)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jan 15, 2021 at 7:45 AM Zhihong Yu <zyu@yugabyte.com> wrote:

For v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch :

+   entry->changing_xact_state = true;
...
+   entry->changing_xact_state = abort_cleanup_failure;

I don't see return statement in between the two assignments. I wonder why entry->changing_xact_state is set to true, and later being assigned again.

Because postgresRollbackForeignTransaction() can get called again in
case where an error occurred during aborting and cleanup the
transaction. For example, if an error occurred when executing ABORT
TRANSACTION (pgfdw_get_cleanup_result() could emit an ERROR),
postgresRollbackForeignTransaction() will get called again while
entry->changing_xact_state is still true. Then the entry will be
caught by the following condition and cleaned up:

/*
 * If connection is before starting transaction or is already unsalvageable,
 * do only the cleanup and don't touch it further.
 */
if (entry->changing_xact_state)
{
    pgfdw_cleanup_after_transaction(entry);
    return;
}
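
The re-entry behaviour described above can be modelled in a few lines. This is only a sketch under simplifying assumptions: DemoConnEntry, demo_rollback() and demo_cleanup_after_transaction() are hypothetical, an ERROR is modelled as an early "return false", and discarding the connection is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced connection cache entry. */
typedef struct DemoConnEntry
{
    int         xact_depth;             /* 0 = no remote transaction open */
    bool        changing_xact_state;    /* true while a state change is in flight */
    bool        discarded;              /* set when the connection is given up on */
} DemoConnEntry;

static void
demo_cleanup_after_transaction(DemoConnEntry *entry)
{
    /* A state change never finished, so the connection is not reusable. */
    if (entry->changing_xact_state)
        entry->discarded = true;
    entry->xact_depth = 0;
}

/* Returns false where the real code would raise an ERROR. */
static bool
demo_rollback(DemoConnEntry *entry, bool remote_abort_fails)
{
    assert(entry != NULL);

    /* Second call after a failed attempt: do only the cleanup. */
    if (entry->changing_xact_state)
    {
        demo_cleanup_after_transaction(entry);
        return true;
    }

    entry->changing_xact_state = true;

    if (remote_abort_fails)
        return false;           /* flag stays true, as it would after an ERROR */

    /* The abort went through, so the entry is in a known state again. */
    entry->changing_xact_state = false;
    demo_cleanup_after_transaction(entry);
    return true;
}
```

Setting the flag before talking to the remote side is what lets the second call tell "never finished" apart from "finished cleanly".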

For v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch :

bq. This commits introduces to new background processes: foreign

commits introduces to new -> commit introduces two new

Fixed.

+FdwXactExistsXid(TransactionId xid)

Since Xid is the parameter to this method, I think the Xid suffix can be dropped from the method name.

But there is already a function named FdwXactExists()?

bool
FdwXactExists(Oid dbid, Oid serverid, Oid userid)

As far as I read other code, we already have such functions that have
the same functionality but have different arguments. For instance,
SearchSysCacheExists() and SearchSysCacheExistsAttName(). So I think
we can leave as it is but is it better to have like
FdwXactCheckExistence() and FdwXactCheckExistenceByXid()?

+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group

Please correct year in the next patch set.

Fixed.

+FdwXactLauncherRequestToLaunch(void)

Since the launcher's job is to 'launch', I think the Launcher can be omitted from the method name.

Agreed. How about FdwXactRequestToLaunchResolver()?

+/* Report shared memory space needed by FdwXactRsoverShmemInit */
+Size
+FdwXactRslvShmemSize(void)

Are both Rsover and Rslv referring to resolver ? It would be better to use whole word which reduces confusion.
Plus, FdwXactRsoverShmemInit should be FdwXactRslvShmemInit (or FdwXactResolveShmemInit)

Agreed. I realized that these functions are the launcher's function,
not resolver's. So I'd change to FdwXactLauncherShmemSize() and
FdwXactLauncherShmemInit() respectively.

+fdwxact_launch_resolver(Oid dbid)

The above method is not in camel case. It would be better if method names are consistent (in casing).

Fixed.

+                errmsg("out of foreign transaction resolver slots"),
+                errhint("You might need to increase max_foreign_transaction_resolvers.")));

It would be nice to include the value of max_foreign_xact_resolvers

I agree it would be nice but looking at other code we don't include
the value in this kind of messages.

For fdwxact_resolver_onexit():

+       LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+       fdwxact->locking_backend = InvalidBackendId;
+       LWLockRelease(FdwXactLock);

There is no call to method inside the for loop which may take time. I wonder if the lock can be obtained prior to the for loop and released coming out of the for loop.

Agreed.
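
For illustration, taking the lock once around the loop could look like the sketch below. Nothing here is from the patch: the Demo* names are hypothetical, and the LWLock is replaced by a trivial counter-based stand-in so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_INVALID_BACKEND (-1)

/* Hypothetical, reduced entry; only the field touched in the loop. */
typedef struct DemoFdwXact
{
    int         locking_backend;
} DemoFdwXact;

/* Stand-in for an exclusive lock; counts how often it is taken. */
static int  demo_lock_held = 0;
static int  demo_lock_acquisitions = 0;

static void
demo_lock_acquire(void)
{
    assert(!demo_lock_held);
    demo_lock_held = 1;
    demo_lock_acquisitions++;
}

static void
demo_lock_release(void)
{
    assert(demo_lock_held);
    demo_lock_held = 0;
}

/* Give up ownership of all held entries under one lock acquisition. */
static void
demo_release_held_entries(DemoFdwXact **held, size_t nheld)
{
    demo_lock_acquire();
    for (size_t i = 0; i < nheld; i++)
        held[i]->locking_backend = DEMO_INVALID_BACKEND;
    demo_lock_release();
}
```

This is only reasonable because the loop body is a plain assignment; if the body could block or take long, holding the lock across the whole loop would be the wrong trade-off.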

+FXRslvLoop(void)

Please use Resolver instead of Rslv

Fixed.

+ FdwXactResolveFdwXacts(held_fdwxacts, nheld);

Fdw and Xact are repeated twice each in the method name. Probably the method name can be made shorter.

Fixed.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#214Zhihong Yu
zyu@yugabyte.com
In reply to: Masahiko Sawada (#213)
Re: Transactions involving multiple postgres foreign servers, take 2

Hi, Masahiko-san:

bq. How about FdwXactRequestToLaunchResolver()?

Sounds good to me.

bq. But there is already a function named FdwXactExists()

Then we can leave the function name as it is.

Cheers

On Sun, Jan 17, 2021 at 9:55 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Fri, Jan 15, 2021 at 7:45 AM Zhihong Yu <zyu@yugabyte.com> wrote:

For v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch :

+   entry->changing_xact_state = true;
...
+   entry->changing_xact_state = abort_cleanup_failure;

I don't see return statement in between the two assignments. I wonder why entry->changing_xact_state is set to true, and later being assigned again.

Because postgresRollbackForeignTransaction() can get called again in
case where an error occurred during aborting and cleanup the
transaction. For example, if an error occurred when executing ABORT
TRANSACTION (pgfdw_get_cleanup_result() could emit an ERROR),
postgresRollbackForeignTransaction() will get called again while
entry->changing_xact_state is still true. Then the entry will be
caught by the following condition and cleaned up:

/*
 * If connection is before starting transaction or is already unsalvageable,
 * do only the cleanup and don't touch it further.
 */
if (entry->changing_xact_state)
{
    pgfdw_cleanup_after_transaction(entry);
    return;
}

For v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch :

bq. This commits introduces to new background processes: foreign

commits introduces to new -> commit introduces two new

Fixed.

+FdwXactExistsXid(TransactionId xid)

Since Xid is the parameter to this method, I think the Xid suffix can be dropped from the method name.

But there is already a function named FdwXactExists()?

bool
FdwXactExists(Oid dbid, Oid serverid, Oid userid)

As far as I read other code, we already have such functions that have
the same functionality but have different arguments. For instance,
SearchSysCacheExists() and SearchSysCacheExistsAttName(). So I think
we can leave as it is but is it better to have like
FdwXactCheckExistence() and FdwXactCheckExistenceByXid()?

+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group

Please correct year in the next patch set.

Fixed.

+FdwXactLauncherRequestToLaunch(void)

Since the launcher's job is to 'launch', I think the Launcher can be omitted from the method name.

Agreed. How about FdwXactRequestToLaunchResolver()?

+/* Report shared memory space needed by FdwXactRsoverShmemInit */
+Size
+FdwXactRslvShmemSize(void)

Are both Rsover and Rslv referring to resolver ? It would be better to use whole word which reduces confusion.
Plus, FdwXactRsoverShmemInit should be FdwXactRslvShmemInit (or FdwXactResolveShmemInit)

Agreed. I realized that these functions are the launcher's function,
not resolver's. So I'd change to FdwXactLauncherShmemSize() and
FdwXactLauncherShmemInit() respectively.

+fdwxact_launch_resolver(Oid dbid)

The above method is not in camel case. It would be better if method names are consistent (in casing).

Fixed.

+                errmsg("out of foreign transaction resolver slots"),
+                errhint("You might need to increase max_foreign_transaction_resolvers.")));

It would be nice to include the value of max_foreign_xact_resolvers

I agree it would be nice but looking at other code we don't include
the value in this kind of messages.

For fdwxact_resolver_onexit():

+       LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+       fdwxact->locking_backend = InvalidBackendId;
+       LWLockRelease(FdwXactLock);

There is no call to method inside the for loop which may take time. I wonder if the lock can be obtained prior to the for loop and released coming out of the for loop.

Agreed.

+FXRslvLoop(void)

Please use Resolver instead of Rslv

Fixed.

+ FdwXactResolveFdwXacts(held_fdwxacts, nheld);

Fdw and Xact are repeated twice each in the method name. Probably the method name can be made shorter.

Fixed.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#215Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#213)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/01/18 14:54, Masahiko Sawada wrote:

On Fri, Jan 15, 2021 at 7:45 AM Zhihong Yu <zyu@yugabyte.com> wrote:

For v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch :

+   entry->changing_xact_state = true;
...
+   entry->changing_xact_state = abort_cleanup_failure;

I don't see return statement in between the two assignments. I wonder why entry->changing_xact_state is set to true, and later being assigned again.

Because postgresRollbackForeignTransaction() can get called again in
case where an error occurred during aborting and cleanup the
transaction. For example, if an error occurred when executing ABORT
TRANSACTION (pgfdw_get_cleanup_result() could emit an ERROR),
postgresRollbackForeignTransaction() will get called again while
entry->changing_xact_state is still true. Then the entry will be
caught by the following condition and cleaned up:

/*
 * If connection is before starting transaction or is already unsalvageable,
 * do only the cleanup and don't touch it further.
 */
if (entry->changing_xact_state)
{
    pgfdw_cleanup_after_transaction(entry);
    return;
}

For v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch :

bq. This commits introduces to new background processes: foreign

commits introduces to new -> commit introduces two new

Fixed.

+FdwXactExistsXid(TransactionId xid)

Since Xid is the parameter to this method, I think the Xid suffix can be dropped from the method name.

But there is already a function named FdwXactExists()?

bool
FdwXactExists(Oid dbid, Oid serverid, Oid userid)

As far as I read other code, we already have such functions that have
the same functionality but have different arguments. For instance,
SearchSysCacheExists() and SearchSysCacheExistsAttName(). So I think
we can leave as it is but is it better to have like
FdwXactCheckExistence() and FdwXactCheckExistenceByXid()?

+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group

Please correct year in the next patch set.

Fixed.

+FdwXactLauncherRequestToLaunch(void)

Since the launcher's job is to 'launch', I think the Launcher can be omitted from the method name.

Agreed. How about FdwXactRequestToLaunchResolver()?

+/* Report shared memory space needed by FdwXactRsoverShmemInit */
+Size
+FdwXactRslvShmemSize(void)

Are both Rsover and Rslv referring to resolver ? It would be better to use whole word which reduces confusion.
Plus, FdwXactRsoverShmemInit should be FdwXactRslvShmemInit (or FdwXactResolveShmemInit)

Agreed. I realized that these functions are the launcher's function,
not resolver's. So I'd change to FdwXactLauncherShmemSize() and
FdwXactLauncherShmemInit() respectively.

+fdwxact_launch_resolver(Oid dbid)

The above method is not in camel case. It would be better if method names are consistent (in casing).

Fixed.

+                errmsg("out of foreign transaction resolver slots"),
+                errhint("You might need to increase max_foreign_transaction_resolvers.")));

It would be nice to include the value of max_foreign_xact_resolvers

I agree it would be nice but looking at other code we don't include
the value in this kind of messages.

For fdwxact_resolver_onexit():

+       LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+       fdwxact->locking_backend = InvalidBackendId;
+       LWLockRelease(FdwXactLock);

There is no call to method inside the for loop which may take time. I wonder if the lock can be obtained prior to the for loop and released coming out of the for loop.

Agreed.

+FXRslvLoop(void)

Please use Resolver instead of Rslv

Fixed.

+ FdwXactResolveFdwXacts(held_fdwxacts, nheld);

Fdw and Xact are repeated twice each in the method name. Probably the method name can be made shorter.

Fixed.

You fixed some issues. But maybe you forgot to attach the latest patches?

I'm reading 0001 and 0002 patches to pick up the changes for postgres_fdw that are worth applying independently of the 2PC feature. If there are such changes, IMO we can apply them in advance, which would make the patches simpler.

+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   frstate->server->servername)));

You changed the code this way because you want to include the server name in the error message? I agree that it's helpful to report also the server name that caused an error. OTOH, since this change gets rid of the call to pgfdw_report_error() for the returned PGresult, the reported error message contains less information. If this understanding is right, I don't think that this change is an improvement.

Instead, if the server name should be included in the error message, pgfdw_report_error() should be changed so that it also reports the server name? If we do that, the server name is reported not only when COMMIT fails but also when other commands fail.

Of course, if this change is not essential, we can skip doing this in the first version.

- /*
- * Regardless of the event type, we can now mark ourselves as out of the
- * transaction. (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
- * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
- */
- xact_got_connection = false;

With this change, xact_got_connection seems to never be set to false. Doesn't this break pgfdw_subxact_callback() using xact_got_connection?

+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;

Originally this variable is reset to 0 once per transaction end. But with the patch, it's reset to 0 every time when a foreign transaction ends at each connection. This change would be harmless fortunately in practice, but seems not right theoretically.

This makes me wonder if new FDW API is not good at handling the case where some operations need to be performed once per transaction end.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#216Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Fujii Masao (#215)
11 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

You fixed some issues. But maybe you forgot to attach the latest patches?

Yes, I've attached the updated patches.

I'm reading 0001 and 0002 patches to pick up the changes for postgres_fdw that are worth applying independently of the 2PC feature. If there are such changes, IMO we can apply them in advance, which would make the patches simpler.

Thank you for reviewing the patches!

+       if (PQresultStatus(res) != PGRES_COMMAND_OK)
+               ereport(ERROR, (errmsg("could not commit transaction on server %s",
+                                                          frstate->server->servername)));

You changed the code this way because you want to include the server name in the error message? I agree that it's helpful to report also the server name that caused an error. OTOH, since this change gets rid of the call to pgfdw_report_error() for the returned PGresult, the reported error message contains less information. If this understanding is right, I don't think that this change is an improvement.

Right. It's better to use do_sql_command() instead.

Instead, if the server name should be included in the error message, pgfdw_report_error() should be changed so that it also reports the server name? If we do that, the server name is reported not only when COMMIT fails but also when other commands fail.

Of course, if this change is not essential, we can skip doing this in the first version.

Yes, I think it's not essential for now. We can improve it later if we want.

- /*
- * Regardless of the event type, we can now mark ourselves as out of the
- * transaction. (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
- * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
- */
- xact_got_connection = false;

With this change, xact_got_connection seems to never be set to false. Doesn't this break pgfdw_subxact_callback() using xact_got_connection?

I think xact_got_connection is set to false in
pgfdw_cleanup_after_transaction() that is called at the end of each
foreign transaction (i.e., in postgresCommitForeignTransaction() and
postgresRollbackForeignTransaction()).

But as you're concerned below, it's reset for each foreign transaction
end rather than the parent's transaction end.

+       /* Also reset cursor numbering for next transaction */
+       cursor_number = 0;

Originally this variable is reset to 0 once per transaction end. But with the patch, it's reset to 0 every time when a foreign transaction ends at each connection. This change would be harmless fortunately in practice, but seems not right theoretically.

This makes me wonder if new FDW API is not good at handling the case where some operations need to be performed once per transaction end.

I think that the problem comes from the fact that FDW needs to use
both SubXactCallback and new FDW API.

If we want to perform some operations at the end of the top
transaction per FDW, not per foreign transaction, we will either still
need to use XactCallback or need to rethink the FDW API design. But
given that we call commit and rollback FDW API for only foreign
servers that actually started a transaction, I’m not sure if there are
such operations in practice. IIUC there is not at least from the
normal (not-sub) transaction termination perspective.

IIUC xact_got_connection is used to skip iterating over all cached
connections to find open remote (sub) transactions. This is not
necessary anymore at least from the normal transaction termination
perspective. So maybe we can improve it so that it tracks whether any
of the cached connections opened a subtransaction. That is, we set it
true when we created a savepoint on any connections and set it false
at the end of pgfdw_subxact_callback() if we see that xact_depth of
all cached entry is less than or equal to 1 after iterating over all
entries.
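
A minimal sketch of that idea, with everything hypothetical: the flag is called demo_got_subxact here, DemoConnEntry keeps only xact_depth, and the remote RELEASE / ROLLBACK TO SAVEPOINT commands are reduced to adjusting the depth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced connection cache entry. */
typedef struct DemoConnEntry
{
    int         xact_depth;     /* 1 = top-level remote transaction only */
} DemoConnEntry;

/* True while at least one cached connection has an open savepoint. */
static bool demo_got_subxact = false;

static void
demo_open_savepoint(DemoConnEntry *entry)
{
    assert(entry->xact_depth >= 1);
    entry->xact_depth++;
    demo_got_subxact = true;
}

/* Called when the local subtransaction at nesting level curlevel ends. */
static void
demo_subxact_end(DemoConnEntry *entries, size_t nentries, int curlevel)
{
    bool        any_open = false;

    assert(curlevel >= 2);

    /* Quick exit: no connection has a savepoint, so there is nothing to scan. */
    if (!demo_got_subxact)
        return;

    for (size_t i = 0; i < nentries; i++)
    {
        /* Pop the remote subtransaction back to the parent level. */
        if (entries[i].xact_depth >= curlevel)
            entries[i].xact_depth = curlevel - 1;

        if (entries[i].xact_depth > 1)
            any_open = true;
    }

    /* Clear the flag only when every entry is at depth 1 or below. */
    demo_got_subxact = any_open;
}
```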

Regarding cursor_number, it essentially needs to be unique at least
within a transaction so we can manage it per transaction or per
connection. But the current postgres_fdw rather ensures uniqueness
across all connections. So it seems to me that this can be fixed by
making each connection have its own cursor_number and resetting it in
pgfdw_cleanup_after_transaction(). I think this can be in a separate
patch. Alternatively, terminating subtransactions via an FDW API could
also solve this problem, but I don't think it's a good idea.
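
As a sketch of the per-connection variant, assuming a reduced entry struct: DemoConnEntry, demo_get_cursor_number() and demo_cleanup_after_transaction() are hypothetical names, not the postgres_fdw functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced connection cache entry with its own counter. */
typedef struct DemoConnEntry
{
    int             xact_depth;
    unsigned int    cursor_number;  /* unique within this connection only */
} DemoConnEntry;

/* Hand out the next cursor number for this connection. */
static unsigned int
demo_get_cursor_number(DemoConnEntry *entry)
{
    assert(entry != NULL);
    return ++entry->cursor_number;
}

/* End-of-foreign-transaction cleanup resets only this connection's counter. */
static void
demo_cleanup_after_transaction(DemoConnEntry *entry)
{
    assert(entry != NULL);
    entry->xact_depth = 0;
    entry->cursor_number = 0;
}
```

With the counter stored per entry, resetting it when one foreign transaction ends no longer affects the numbering on other connections.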

What do you think?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v34-0011-Add-regression-tests-for-foreign-twophase-commit.patch (application/x-patch)
From 37716d19a083542fb5051709efd8b7d87b62a58e Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v34 11/11] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 +
 .../test_fdwxact/expected/test_fdwxact.out    | 200 +++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 185 +++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 110 ++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 524 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/021_fdwxact.pl            | 175 ++++++
 src/test/regress/pg_regress.c                 |  13 +-
 src/tools/msvc/Mkvcbuild.pm                   |   3 +-
 14 files changed, 1294 insertions(+), 6 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/021_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 59921b46cf..45ddcdcb0a 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS =1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..ca8a90f3e5
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,200 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
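The expected output above exercises a small decision table: in 'required' mode a commit uses two-phase commit only when two or more participants (the local node counts as one) were written, and it fails if any written foreign server cannot prepare. The following is a minimal standalone sketch of that rule as inferred from these test cases only; it is not the patch's implementation, and the names CommitPlan, Participant and plan_commit are hypothetical. Servers whose FDW registers no transaction callbacks at all (ft_1 in the tests) are assumed not to be passed in.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of the commit-time decision. */
typedef enum
{
	COMMIT_ONE_PHASE,			/* plain commit on every participant */
	COMMIT_TWO_PHASE,			/* prepare everywhere, then commit prepared */
	COMMIT_ERROR				/* 2PC needed but some writer cannot prepare */
} CommitPlan;

/* Hypothetical description of one registered foreign server. */
typedef struct
{
	bool		modified;		/* did the transaction write to it? */
	bool		can_prepare;	/* does its FDW provide a prepare callback? */
} Participant;

/*
 * Decide how to commit, following the behavior the regression test expects.
 * Assumption: read-only participants never force two-phase commit.
 */
CommitPlan
plan_commit(bool twophase_required, bool local_write,
			const Participant *parts, size_t nparts)
{
	size_t		writers = local_write ? 1 : 0;
	bool		all_can_prepare = true;

	for (size_t i = 0; i < nparts; i++)
	{
		if (!parts[i].modified)
			continue;
		writers++;
		if (!parts[i].can_prepare)
			all_can_prepare = false;
	}

	/* 'disabled' mode, or a single writer: one-phase commit is enough. */
	if (!twophase_required || writers < 2)
		return COMMIT_ONE_PHASE;

	return all_can_prepare ? COMMIT_TWO_PHASE : COMMIT_ERROR;
}
```

Under this model, a write to t plus ft_no2pc_1 in 'required' mode gives COMMIT_ERROR, while the same pair in 'disabled' mode gives COMMIT_ONE_PHASE, matching the cases above.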
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..40b774e5d0
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,185 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers that
+-- are involved in the transaction and modified need to support the
+-- two-phase commit protocol. Otherwise the transaction will roll back.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..52e4971aed
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,110 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql, $wait_until) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+	$wait_until = 0 unless defined $wait_until;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	$node->poll_query_until('postgres',
+							"SELECT count(*) FROM pg_foreign_xacts",
+							$wait_until);
+
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the failure case of PREPARE TRANSACTION. We prepare two distributed
+# transactions with the same identifier.  The second attempt will fail when preparing
+# the local transaction, which is performed after preparing the foreign transaction
+# on srv_2pc_1. Therefore the transaction should roll back the prepared foreign
+# transaction.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into the prepare phase on srv_2pc_1. The transaction fails while
+# preparing the foreign transaction on srv_2pc_1. Then, we try to both 'rollback' and
+# 'rollback prepared' the foreign transaction, and roll back the other foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback $xid on srv_2pc_2/, "rollback on another server");
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses the PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..19ae113c20
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,524 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only
+ * commit and rollback API and the third supports all transaction API including
+ * prepare.
+ *
+ * Also, this module can inject an error in the prepare callback or the
+ * commit callback using the test_inject_error() SQL function. The injected
+ * error's settings are stored in shared memory so that backend processes and
+ * resolver processes can see them.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xact.h"
+#include "commands/defrem.h"
+#include "access/reloptions.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static void testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo,
+												   List *fdw_private,
+												   int subplan_index,
+												   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactInfo *finfo);
+static void testCommitForeignTransaction(FdwXactInfo *finfo);
+static void testRollbackForeignTransaction(FdwXactInfo *finfo);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+/* Register the foreign transaction */
+static void
+testRegisterFdwXact(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					bool modified)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	RangeTblEntry	*rte;
+	ForeignTable *table;
+	Oid		userid;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
+						mtstate->ps.state);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+	table = GetForeignTable(RelationGetRelid(rel));
+	FdwXactRegisterXact(table->serverid, userid, modified);
+}
+
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static void
+testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo,
+									   List *fdw_private,
+									   int subplan_index,
+									   int eflags)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo,
+						(eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0);
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo, true);
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactInfo *finfo)
+{
+	int elevel;
+
+	if (check_event(finfo->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 finfo->fdwxact_id,
+							 finfo->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactInfo *finfo)
+{
+	int elevel;
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (check_event(finfo->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (finfo->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 xid, finfo->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 finfo->fdwxact_id,
+								 finfo->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactInfo *finfo)
+{
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (log_api_calls)
+	{
+		if (finfo->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 xid, finfo->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 finfo->fdwxact_id,
+								 finfo->server->servername)));
+	}
+}
+
+/*
+ * Check whether an error is injected for the given phase on the given
+ * server. If so, set *elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Only error and panic are supported; anything else is treated as error */
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+	else
+		*elevel = ERROR;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
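The commit and rollback callbacks in test_fdwxact.c choose their log message by testing FDWXACT_FLAG_ONEPHASE in finfo->flags. Testing a single flag bit needs bitwise `&`; logical `&&` is true whenever both operands are nonzero, regardless of which bits are set. A minimal standalone sketch of the difference follows. The DEMO_ flag values and function names are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_FLAG_ONEPHASE	0x01
#define DEMO_FLAG_OTHER		0x02

/* Bitwise test: true only if the requested bit is set in flags. */
bool
flag_set_bitwise(int flags, int flag)
{
	return (flags & flag) != 0;
}

/* Logical test: true whenever flags and flag are both nonzero. */
bool
flag_set_logical(int flags, int flag)
{
	return flags && flag;
}
```

With flags equal to DEMO_FLAG_OTHER, the bitwise test for DEMO_FLAG_ONEPHASE is false while the logical test is true, so a callback using the logical form would take the one-phase branch for any nonzero flags value.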
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index 96442ceb4e..0e5e05e41a 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/021_fdwxact.pl b/src/test/recovery/t/021_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/021_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check that we can commit and roll back the foreign
+# transactions after normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check that we can commit and roll back the
+# foreign transactions after crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up the
+# shared memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check that the standby node can process prepared foreign transactions
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index b284cc88c4..5ceba8972a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2350,9 +2350,12 @@ regression_main(int argc, char *argv[],
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers.  (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2367,7 +2370,9 @@ regression_main(int argc, char *argv[],
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 7213e65e08..e624ed1998 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -50,7 +50,8 @@ my @contrib_excludes = (
 	'pgcrypto',         'sepgsql',
 	'brin',             'test_extensions',
 	'test_misc',        'test_pg_dump',
-	'snapshot_too_old', 'unsafe_tests');
+	'snapshot_too_old', 'unsafe_tests',
+	'test_fdwxact');
 
 # Set of variables for frontend modules
 my $frontend_defines = { 'initdb' => 'FRONTEND' };
-- 
2.27.0

v34-0005-postgres_fdw-supports-prepare-API.patch (application/x-patch)
From 1ffa293ec5ec49f0489b604835aa6fe11e9e2e2e Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:00:21 +0900
Subject: [PATCH v34 05/11] postgres_fdw supports prepare API.

This commit implements the PrepareForeignTransaction API in postgres_fdw,
enabling foreign transactions to be committed and rolled back using the
two-phase commit protocol.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 137 +++++++++++++++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  13 --
 contrib/postgres_fdw/postgres_fdw.c           |   1 +
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   7 -
 5 files changed, 135 insertions(+), 24 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 72ac74ca21..b7b9e789d0 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -105,6 +105,8 @@ static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
 static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+									char *fdwxact_id, bool is_commit);
 static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 static bool disconnect_cached_connections(Oid serverid);
 
@@ -1424,12 +1426,19 @@ void
 postgresCommitForeignTransaction(FdwXactInfo *finfo)
 {
 	ConnCacheEntry *entry;
+	bool		is_onephase = (finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	PGresult   *res;
 
-	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
 
+	if (!is_onephase)
+	{
+		/* COMMIT PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, finfo->usermapping,
+								finfo->fdwxact_id, true);
+		return;
+	}
+
 	Assert(entry->conn);
 
 	/*
@@ -1471,16 +1480,24 @@ void
 postgresRollbackForeignTransaction(FdwXactInfo *finfo)
 {
 	ConnCacheEntry *entry = NULL;
+	bool is_onephase = (finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	bool abort_cleanup_failure = false;
 
-	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	/*
 	 * In simple rollback case, we must have a connection to the foreign server
 	 * because the foreign transaction is not closed yet. We get the connection
 	 * entry from the cache.
 	 */
 	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+
+	if (!is_onephase)
+	{
+		/* ROLLBACK PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, finfo->usermapping,
+								finfo->fdwxact_id, false);
+		return;
+	}
+
 	Assert(entry);
 
 	/*
@@ -1557,6 +1574,46 @@ postgresRollbackForeignTransaction(FdwXactInfo *finfo)
 	return;
 }
 
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactInfo *finfo)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+	Assert(entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", finfo->fdwxact_id);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   finfo->server->servername, finfo->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 finfo->server->servername, finfo->fdwxact_id);
+	PQclear(res);
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
 /* Cleanup at main-transaction end */
 static void
 pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
@@ -1587,3 +1644,75 @@ pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
 	/* Also reset cursor numbering for next transaction */
 	cursor_number = 0;
 }
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+						char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	/*
+	 * Check the connection status in case the previous attempt
+	 * failed.
+	 */
+	if (entry->conn && PQstatus(entry->conn) != CONNECTION_OK)
+		disconnect_pg_server(entry);
+
+	/*
+	 * In the two-phase commit case, the transaction may be resolved by a
+	 * process other than the one that prepared it, so we might not have a
+	 * connection yet.
+	 */
+	if (!entry->conn)
+		make_new_connection(entry, usermapping);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	/*
+	 * Once the transaction is prepared, no further transaction callback is
+	 * called even if an error occurs while resolving it.  Therefore, we
+	 * don't need to set changing_xact_state here.  On failure, a new connection
+	 * will be established either when the next transaction is started or when
+	 * the connection status is checked above.
+	 */
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager notes, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server.  So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+	PQclear(res);
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 73a0868347..b1e7769415 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9002,19 +9002,6 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
- count 
--------
-   822
-(1 row)
-
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
-ROLLBACK;
-WARNING:  there is no transaction in progress
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 66a47f9f31..dd55bde3bb 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -586,6 +586,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for foreign transactions */
 	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
 	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
 
 	PG_RETURN_POINTER(routine);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index c44d37f280..8c72c910c7 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -140,6 +140,7 @@ extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
 extern void postgresCommitForeignTransaction(FdwXactInfo *finfo);
 extern void postgresRollbackForeignTransaction(FdwXactInfo *finfo);
+extern void postgresPrepareForeignTransaction(FdwXactInfo *finfo);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 647192cf6a..9c069b9acc 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2669,13 +2669,6 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ROLLBACK;
-
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
-- 
2.27.0
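
The error handling in pgfdw_end_prepared_xact() above can be tried outside
the backend. What follows is only a minimal sketch, not the patch's code: all
sketch_* names are hypothetical, the SQLSTATE packing mirrors PostgreSQL's
MAKE_SQLSTATE macro, and the length check on the SQLSTATE string is an extra
guard the patch does not have. It builds the COMMIT/ROLLBACK PREPARED command
and decides whether a failure may be ignored because the prepared transaction
no longer exists on the foreign server (SQLSTATE 42704, undefined_object).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same six-bit packing as PostgreSQL's PGSIXBIT/MAKE_SQLSTATE macros. */
#define SKETCH_SIXBIT(ch) (((ch) - '0') & 0x3F)
#define SKETCH_MAKE_SQLSTATE(c1, c2, c3, c4, c5) \
	(SKETCH_SIXBIT(c1) + (SKETCH_SIXBIT(c2) << 6) + \
	 (SKETCH_SIXBIT(c3) << 12) + (SKETCH_SIXBIT(c4) << 18) + \
	 (SKETCH_SIXBIT(c5) << 24))

/* SQLSTATE 42704 (undefined_object) and 08006 (connection_failure). */
#define SKETCH_UNDEFINED_OBJECT SKETCH_MAKE_SQLSTATE('4', '2', '7', '0', '4')
#define SKETCH_CONNECTION_FAILURE SKETCH_MAKE_SQLSTATE('0', '8', '0', '0', '6')

/*
 * Build "COMMIT PREPARED '<id>'" or "ROLLBACK PREPARED '<id>'" into buf.
 * Returns the command length, or -1 if buf is too small.  Like the patch,
 * this assumes the identifier contains no single quote.
 */
static int
sketch_build_end_command(char *buf, size_t buflen,
						 const char *fdwxact_id, int is_commit)
{
	int			n;

	assert(buf != NULL && fdwxact_id != NULL);
	n = snprintf(buf, buflen, "%s PREPARED '%s'",
				 is_commit ? "COMMIT" : "ROLLBACK", fdwxact_id);
	return (n < 0 || (size_t) n >= buflen) ? -1 : n;
}

/*
 * Return 1 if a failed COMMIT/ROLLBACK PREPARED can be ignored, i.e. the
 * remote server reported that the prepared transaction does not exist.
 * A missing SQLSTATE is treated as a connection failure, as in the patch.
 */
static int
sketch_error_is_ignorable(const char *diag_sqlstate)
{
	int			sqlstate;

	if (diag_sqlstate != NULL && strlen(diag_sqlstate) == 5)
		sqlstate = SKETCH_MAKE_SQLSTATE(diag_sqlstate[0], diag_sqlstate[1],
										diag_sqlstate[2], diag_sqlstate[3],
										diag_sqlstate[4]);
	else
		sqlstate = SKETCH_CONNECTION_FAILURE;

	return sqlstate == SKETCH_UNDEFINED_OBJECT;
}
```

Accepting undefined_object keeps resolution idempotent: if a resolver retries
after the remote side has already finished the transaction, the retry is
treated as success instead of raising an error.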

v34-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch (application/x-patch)
From f7be8aed5dbd96d9e9b14967fabe9d275732b5ed Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 2 Nov 2020 14:32:10 +0900
Subject: [PATCH v34 09/11] postgres_fdw marks foreign transaction as modified
 on modification.

This commit enables postgres_fdw to execute the two-phase commit protocol
on transaction commit (without explicitly executing PREPARE TRANSACTION).

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c   | 19 ++++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c |  2 ++
 contrib/postgres_fdw/postgres_fdw.h |  1 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index b7b9e789d0..967a2fca53 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -61,6 +61,7 @@ typedef struct ConnCacheEntry
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
 	Oid			serverid;		/* foreign server OID used to get server name */
+	bool		modified;		/* true if data on the foreign server is modified */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -297,6 +298,7 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 	entry->changing_xact_state = false;
 	entry->invalidated = false;
 	entry->serverid = server->serverid;
+	entry->modified = false;
 	entry->server_hashvalue =
 		GetSysCacheHashValue1(FOREIGNSERVEROID,
 							  ObjectIdGetDatum(server->serverid));
@@ -311,6 +313,20 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 		 entry->conn, server->servername, user->umid, user->userid);
 }
 
+void
+MarkConnectionModified(UserMapping *user)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
+	if (entry && !entry->modified)
+	{
+		FdwXactRegisterXact(user->serverid, user->userid, true);
+		entry->modified = true;
+	}
+}
+
 /*
  * Connect to remote server using specified server and user mapping properties.
  */
@@ -582,7 +598,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 			 entry->conn);
 
 		/* Register the foreign server to the transaction */
-		FdwXactRegisterXact(user->serverid, user->userid);
+		FdwXactRegisterXact(user->serverid, user->userid, false);
 
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
@@ -591,6 +607,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 		entry->changing_xact_state = true;
 		do_sql_command(entry->conn, sql);
 		entry->xact_depth = 1;
+		entry->modified = false;
 		entry->changing_xact_state = false;
 	}
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index dd55bde3bb..27263bbf8e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2487,6 +2487,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * establish new connection if necessary.
 	 */
 	dmstate->conn = GetConnection(user, false);
+	MarkConnectionModified(user);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -3680,6 +3681,7 @@ create_foreign_modify(EState *estate,
 
 	/* Open connection; report that we'll create a prepared statement. */
 	fmstate->conn = GetConnection(user, true);
+	MarkConnectionModified(user);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 8c72c910c7..fc5a0766f4 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -132,6 +132,7 @@ extern void reset_transmission_modes(int nestlevel);
 /* in connection.c */
 extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
 extern void ReleaseConnection(PGconn *conn);
+extern void MarkConnectionModified(UserMapping *user);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
-- 
2.27.0
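
The 'modified' flag recorded by the patch above is what patch 08/11 uses, in
checkForeignTwophaseCommitRequired(), to decide whether a commit needs the
two-phase protocol. As a rough, self-contained sketch of that counting rule
(every Sketch*/sketch_* name is hypothetical, and the checks of
max_prepared_foreign_transactions and max_foreign_transaction_resolvers are
left out):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the participant list. */
typedef struct SketchParticipant
{
	bool		modified;		/* data on this server was written */
	bool		can_prepare;	/* its FDW provides a prepare callback */
} SketchParticipant;

typedef enum SketchDecision
{
	SKETCH_ONE_PHASE,			/* plain commit on each server is enough */
	SKETCH_TWO_PHASE,			/* prepare the modified servers first */
	SKETCH_ERROR_NO_2PC_SUPPORT /* a modified server cannot prepare */
} SketchDecision;

/*
 * Count the servers that performed writes, including the local node, and
 * require two-phase commit only when at least two of them did.
 */
static SketchDecision
sketch_decide_commit(const SketchParticipant *parts, size_t nparts,
					 bool local_modified, bool twophase_requested)
{
	size_t		nwritten = local_modified ? 1 : 0;
	bool		have_no_twophase = false;
	size_t		i;

	assert(parts != NULL || nparts == 0);

	if (!twophase_requested)
		return SKETCH_ONE_PHASE;

	for (i = 0; i < nparts; i++)
	{
		if (!parts[i].modified)
			continue;			/* read-only participants do not count */
		if (!parts[i].can_prepare)
			have_no_twophase = true;
		nwritten++;
	}

	if (nwritten < 2)
		return SKETCH_ONE_PHASE;

	return have_no_twophase ? SKETCH_ERROR_NO_2PC_SUPPORT : SKETCH_TWO_PHASE;
}
```

Read-only participants are skipped entirely, so a transaction that writes to
one server and only reads from others still commits in one phase.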

v34-0008-Prepare-foreign-transactions-at-commit-time.patch (application/x-patch)
From 559b0e6c6bda1693862d83fbb5e9ff04ad9391f7 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 25 Nov 2020 21:02:29 +0900
Subject: [PATCH v34 08/11] Prepare foreign transactions at commit time

With this commit, a foreign server modified within the transaction is
marked as 'modified'. On the 'modified' servers, foreign transactions
are prepared automatically if foreign_twophase_commit is
'required'. Previously, users needed to run PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED to use the two-phase commit protocol. This commit
enables users to use the two-phase commit protocol transparently. Prepared
foreign transactions are resolved asynchronously by the foreign
transaction resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/fdwxact.c          | 191 +++++++++++++++++-
 src/backend/access/transam/xact.c             |  15 +-
 src/backend/utils/misc/guc.c                  |  28 +++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/access/fdwxact.h                  |  10 +
 src/include/foreign/fdwapi.h                  |   2 +-
 6 files changed, 235 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index def4988237..02ae44f1f1 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -19,13 +19,27 @@
  *
  * FOREIGN TRANSACTION RESOLUTION
  *
+ * A transaction involving multiple foreign transactions uses the two-phase
+ * commit protocol to commit the distributed transaction, if enabled.  The basic
+ * strategy is to prepare all of the remote transactions before committing
+ * locally and to commit them after committing locally.
+ *
+ * At pre-commit of the local transaction, we prepare the transactions on all
+ * foreign servers after logging the foreign transaction information.  The
+ * outcome of the distributed transaction is determined by the outcome of the
+ * corresponding local transaction.  Once the local transaction has committed
+ * successfully, all transactions on foreign servers must be committed.  If an
+ * error occurs before the local transaction commits, all transactions must be
+ * aborted.  After committing or rolling back locally, we leave the foreign
+ * transactions as in-doubt transactions and notify the resolver process.  The
+ * resolver process asynchronously resolves these foreign transactions according
+ * to the outcome of the corresponding local transaction.  Also, the user can use
+ * the pg_resolve_foreign_xact() SQL function to resolve one manually.
+ *
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  The preapred
- * foreign transactions are resolved by a resolver process asynchronously.  Also, the
- * user can use pg_resolve_foreign_xact() SQL function to resolve a foreign transaction
- * manually.
+ * local transaction but do nothing for the involved foreign transactions.
  *
  * LOCKING
  *
@@ -92,8 +106,10 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
+#include "storage/pmsignal.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
+#include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -105,6 +121,10 @@
 #define ServerSupportTwophaseCommit(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
 
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
 /* Directory where the foreign prepared transaction files will reside */
 #define FDWXACTS_DIR "pg_fdwxact"
 
@@ -145,6 +165,9 @@ typedef struct FdwXactParticipant
 	/* Transaction identifier used for PREPARE */
 	char	   *fdwxact_id;
 
+	/* true if data on the server has been modified */
+	bool		modified;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
@@ -155,18 +178,24 @@ typedef struct FdwXactParticipant
 /*
  * List of foreign transactions involved in the transaction.  A member of
  * participants must support both commit and rollback APIs.
+ *
+ * ForeignTwophaseCommitIsRequired is true if the current transaction needs to
+ * be committed using two-phase commit protocol.
  */
 static List *FdwXactParticipants = NIL;
+static bool ForeignTwophaseCommitIsRequired = false;
 
 /* Keep track of registering process exit call back. */
 static bool fdwXactExitRegistered = false;
 
+
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
 int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
-static void FdwXactPrepareForeignTransactions(TransactionId xid);
+static void FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
@@ -185,6 +214,7 @@ static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
 static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
 static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  bool giveWarning);
+static bool checkForeignTwophaseCommitRequired(bool local_modified);
 static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
 							  Oid umid, char *fdwxact_id);
 static void remove_fdwxact(FdwXact fdwxact);
@@ -261,7 +291,7 @@ FdwXactShmemInit(void)
  * as a participant of the transaction.
  */
 void
-FdwXactRegisterXact(Oid serverid, Oid userid)
+FdwXactRegisterXact(Oid serverid, Oid userid, bool modified)
 {
 	FdwXactParticipant *fdw_part;
 	MemoryContext old_ctx;
@@ -276,6 +306,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 			fdw_part->usermapping->userid == userid)
 		{
 			/* Already registered */
+			fdw_part->modified |= modified;
 			return;
 		}
 	}
@@ -305,6 +336,7 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
 
 	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+	fdw_part->modified = modified;
 
 	/* Add to the participants list */
 	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
@@ -351,6 +383,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
 	fdw_part->fdwxact_id = NULL;
+	fdw_part->modified = false;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
@@ -359,11 +392,139 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	return fdw_part;
 }
 
+/*
+ * Prepare all foreign transactions if foreign two-phase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit: when it is
+ * 'required', we strictly require the FDW of every modified foreign server to
+ * support the two-phase commit protocol and ask them to prepare their foreign
+ * transactions; when it is 'disabled', we use one-phase commit, so the foreign
+ * transactions are committed at transaction end.  If we fail to prepare any
+ * of them, we switch to aborting the transaction.
+ */
+void
+PreCommit_FdwXact(void)
+{
+	TransactionId xid;
+	bool		local_modified;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/*
+	 * Check if the current transaction did writes.  We need to include the
+	 * local node as a participant of the distributed transaction and regard
+	 * it as modified if the current transaction has performed WAL logging and
+	 * has been assigned an xid.  The transaction can end up not writing any
+	 * WAL, even if it has an xid, if it only wrote to temporary and/or
+	 * unlogged tables.  It can end up having written WAL without an xid if it
+	 * did HOT pruning.
+	 */
+	xid = GetTopTransactionIdIfAny();
+	local_modified = (TransactionIdIsValid(xid) && (XactLastRecEnd != 0));
+
+	/*
+	 * Check if we need to use foreign twophase commit. Note that we don't
+	 * support foreign twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired(local_modified))
+	{
+		/*
+		 * Two-phase commit is required.  Assign a transaction id to the
+		 * current transaction if not yet because the local transaction is
+		 * necessary to determine the result of the distributed transaction.
+		 * Then we prepare foreign transactions on foreign servers that support
+		 * two-phase commit.  Note that we keep FdwXactParticipants until the
+		 * end of the transaction.
+		 */
+		if (!TransactionIdIsValid(xid))
+			xid = GetTopTransactionId();
+		FdwXactPrepareForeignTransactions(xid, false);
+		ForeignTwophaseCommitIsRequired = true;
+	}
+}
+
+/* Return true if the current transaction needs to use two-phase commit */
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+/*
+ * Return true if the current transaction modifies data on two or more servers,
+ * counting the servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+{
+	ListCell   *lc;
+	bool		have_no_twophase = false;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!fdw_part->modified)
+			continue;
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			have_no_twophase = true;
+
+		nserverswritten++;
+	}
+
+	/* Did we modify the local non-temporary data? */
+	if (local_modified)
+		nserverswritten++;
+
+	/*
+	 * Two-phase commit is not required if the number of servers performing
+	 * writes is less than 2.
+	 */
+	if (nserverswritten < 2)
+		return false;
+
+	Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED);
+
+	/* Two-phase commit is required. Check parameters */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	if (have_no_twophase)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+				 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+	return true;
+}
+
 /*
- * Insert FdwXact entries and prepare foreign transactions.
+ * Insert FdwXact entries and prepare foreign transactions.  If prepare_all is
+ * true, we prepare all foreign transactions regardless of whether writes
+ * happened on the server.
+ *
+ * We can still switch to rollback here on failure.  If any error occurs, we
+ * roll back the foreign transactions that have not been prepared.
  */
 static void
-FdwXactPrepareForeignTransactions(TransactionId xid)
+FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all)
 {
 	ListCell   *lc;
 
@@ -381,6 +542,9 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 
 		CHECK_FOR_INTERRUPTS();
 
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
 		/* Get prepared transaction identifier */
 		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
 		Assert(fdw_part->fdwxact_id);
@@ -757,7 +921,10 @@ ForgetAllFdwXactParticipants(void)
 	int			nremaining = 0;
 
 	if (FdwXactParticipants == NIL)
+	{
+		Assert(!ForeignTwophaseCommitIsRequired);
 		return;
+	}
 
 	foreach(cell, FdwXactParticipants)
 	{
@@ -814,7 +981,10 @@ AtEOXact_FdwXact(bool is_commit)
 
 		if (!fdwxact)
 		{
-			/* Commit or rollback the foreign transaction in one-phase */
+			/*
+			 * If this participant doesn't have an FdwXact entry, it's not
+			 * prepared yet. Therefore we can commit or roll it back in one phase.
+			 */
 			Assert(ServerSupportTransactionCallback(fdw_part));
 			FdwXactParticipantEndTransaction(fdw_part, is_commit);
 			continue;
@@ -844,6 +1014,7 @@ AtEOXact_FdwXact(bool is_commit)
 	}
 
 	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
 }
 
 /*
@@ -883,7 +1054,7 @@ AtPrepare_FdwXact(void)
 	 * prepare all foreign transactions.
 	 */
 	xid = GetTopTransactionId();
-	FdwXactPrepareForeignTransactions(xid);
+	FdwXactPrepareForeignTransactions(xid, true);
 
 	/*
 	 * We keep FdwXactParticipants until the transaction end so that we change
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 3db7fa94c1..b1d6d6623f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -22,6 +22,7 @@
 
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1456,6 +1457,9 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	if (FdwXactIsForeignTwophaseCommitRequired())
+		FdwXactLaunchOrWakeupResolver();
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
@@ -2126,8 +2130,8 @@ CommitTransaction(void)
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
-	/* Commit foreign transactions if any */
-	AtEOXact_FdwXact(true);
+	/* Pre-commit step for foreign transactions */
+	PreCommit_FdwXact();
 
 	/* If we might have parallel workers, clean them up now. */
 	if (IsInParallelMode())
@@ -2208,6 +2212,13 @@ CommitTransaction(void)
 
 	TRACE_POSTGRESQL_TRANSACTION_COMMIT(MyProc->lxid);
 
+	/*
+	 * Commit foreign transactions if any.  This needs to be done before marking
+	 * this transaction as not running since FDW's transaction callbacks might
+	 * assume this transaction is still in progress.
+	 */
+	AtEOXact_FdwXact(true);
+
 	/*
 	 * Let others know about no transaction in progress by me. Note that this
 	 * must be done _before_ releasing locks we hold and _after_
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index add8e598e8..f530cd20dd 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -501,6 +501,24 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -4703,6 +4721,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 58ac54b8c8..6165c6d689 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -746,6 +746,8 @@
 							# retrying to resolve
 							# foreign transactions
 							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
 
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 7656ddad02..1bb1dd878c 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -20,6 +20,14 @@
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
 /* Enum to track the status of foreign transaction */
 typedef enum
 {
@@ -107,10 +115,12 @@ extern int	max_prepared_foreign_xacts;
 extern int	max_foreign_xact_resolvers;
 extern int	foreign_xact_resolution_retry_interval;
 extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
+extern void PreCommit_FdwXact(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void AtPrepare_FdwXact(void);
 extern bool FdwXactIsForeignTwophaseCommitRequired(void);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 64bdbff7ce..043f3f46cc 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -284,7 +284,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
 /* Functions in fdwxact/fdwxact.c */
-extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactRegisterXact(Oid serverid, Oid userid, bool modified);
 extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
 
 #endif							/* FDWAPI_H */
-- 
2.27.0
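
For readers following the thread, the decision made by checkForeignTwophaseCommitRequired() in the patch above can be sketched as a small standalone C function. This is only an illustration under stated assumptions: the Participant struct, the CommitDecision enum and decide_commit() are hypothetical names that do not appear in the patch, the ereport() calls are replaced by a return value, and the checks on max_prepared_foreign_xacts and max_foreign_xact_resolvers are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for FdwXactParticipant. */
typedef struct
{
	bool		modified;			/* did the transaction write on this server? */
	bool		supports_twophase;	/* does the FDW provide a prepare callback? */
} Participant;

/* Hypothetical result type; the patch returns bool or raises an error. */
typedef enum
{
	DECISION_ONE_PHASE,			/* two-phase commit is not needed */
	DECISION_TWO_PHASE,			/* prepare first, then commit */
	DECISION_ERROR				/* required, but a written server cannot prepare */
} CommitDecision;

static CommitDecision
decide_commit(const Participant *parts, size_t nparts,
			  bool local_modified, bool twophase_requested)
{
	size_t		nwritten = local_modified ? 1 : 0;
	bool		have_no_twophase = false;

	assert(nparts == 0 || parts != NULL);

	/* foreign_twophase_commit = disabled: never use two-phase commit */
	if (!twophase_requested)
		return DECISION_ONE_PHASE;

	for (size_t i = 0; i < nparts; i++)
	{
		if (!parts[i].modified)
			continue;			/* read-only participants do not count */
		if (!parts[i].supports_twophase)
			have_no_twophase = true;
		nwritten++;
	}

	/* A single writer can always be committed in one phase. */
	if (nwritten < 2)
		return DECISION_ONE_PHASE;

	return have_no_twophase ? DECISION_ERROR : DECISION_TWO_PHASE;
}
```

As in the patch, a server that cannot prepare only causes an error once at least two servers (possibly including the local one) have been written; a lone writer without two-phase support still commits in one phase.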

v34-0010-Documentation-update.patch (application/x-patch)
From 8ce45564b0b44e847915b8550f0b8ad9f5f233fb Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v34 10/11] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 ++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 158 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 254 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    | 147 +++++++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 src/backend/access/transam/README.fdwxact | 134 ++++++++++++
 10 files changed, 1022 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml
 create mode 100644 src/backend/access/transam/README.fdwxact

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..aaeebdd34a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9299,6 +9299,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -11152,6 +11157,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric identifier of the local transaction with which this foreign
+       transaction is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-authid"><structname>pg_authid</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction is about
+          to be committed or is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction is about
+          to be aborted or is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>locker_pid</structfield></entry>
+      <entry><type>int</type></entry>
+      <entry></entry>
+      <entry>
+       Process ID of the process currently locking and processing this entry.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 82864bbb24..032801658c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9336,6 +9336,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether a distributed transaction commit ensures that all
+         involved changes on foreign servers are committed. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used
+         to commit or roll back distributed transactions. When set to
+         <literal>required</literal>, distributed transactions strictly require
+         that all written servers can use the two-phase commit protocol.  That is,
+         the distributed transaction cannot commit if even one server does not
+         support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, distributed transaction commit
+         waits for all involved foreign transactions to be committed before the
+         command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When set to <literal>disabled</literal>, data can become inconsistent
+          across servers if one or more foreign servers crash while the
+          distributed transaction is being committed.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign
+         transaction resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver waits after a failed resolution
+         attempt before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning that a resolver stays connected to
+         its database until it is stopped manually with <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..bae3ee0f2a
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the others were rolled back. This could leave
+   data in an inconsistent state across the federated database.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all the changes on foreign servers are either committed or rolled back using
+   the transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve atomic commit among all foreign servers,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, a distributed transaction is committed
+    in the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers, including the local server itself, and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the prepare on all foreign servers is
+       successful, go to the next step.  If there is any failure in the
+       prepare phase, the server rolls back all the transactions on both
+       local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit the local transaction. The server commits the transaction locally.
+       If any failure happens in this step, the server switches to rollback and
+       rolls back all transactions on both local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    The above sequence is executed transparently to users at transaction commit.
+    The transaction returns acknowledgement of the successful commit of the
+    distributed transaction to the client after step 2.  After that, all
+    prepared transactions are resolved asynchronously by a foreign transaction
+    resolver process.
+   </para>
+
+   <para>
+    When the user executes <command>PREPARE TRANSACTION</command>, the transaction
+    prepares the local transaction as well as all involved transactions on the
+    foreign servers. Likewise, when <command>COMMIT PREPARED</command> or
+    <command>ROLLBACK PREPARED</command> is executed, all prepared transactions are
+    resolved asynchronously after committing or rolling back the local transaction.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    A distributed transaction can be in the <firstterm>in-doubt</firstterm> state
+    from the time all involved transactions are prepared until all of them
+    are resolved.  During that period, a reader might see different
+    results on the foreign servers.  If the local node
+    crashes while preparing transactions, the distributed transaction also becomes
+    in-doubt.  The information about the involved foreign transactions is
+    recovered during crash recovery, and they are resolved in the background.
+   </para>
+
+   <para>
+    The foreign transaction resolver processes automatically resolve the
+    transactions associated with an in-doubt distributed transaction. Alternatively,
+    you can use the <function>pg_resolve_foreign_xact</function> function to resolve
+    them manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving in-doubt distributed transactions. They commit or
+    roll back prepared transactions on all foreign servers involved with the
+    distributed transaction according to the result of the corresponding local
+    transaction.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on the database to which it is connected. On failure during resolution, it
+    retries at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped without an immediate shutdown. You can call the
+     <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be set to <literal>required</literal>.  Additionally,
+    <varname>max_worker_processes</varname> may need to be adjusted
+    to accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 854913ae5f..3c056193f0 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1504,6 +1504,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare the foreign transaction. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactInfo *finfo);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactInfo *finfo);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one-phase or at the post-commit phase if
+    two-phase commit is required. If <literal>finfo-&gt;flag</literal> has
+    the flag <literal>FDWXACT_FLAG_ONEPHASE</literal> the transaction
+    can be committed in one-phase, otherwise this function must commit the prepared
+    transaction identified by <literal>finfo-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     Note that, except when the <function>pg_resolve_fdwxact</function>
+     SQL function is called, this callback function is executed by foreign
+     transaction resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactInfo *finfo);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>finfo-&gt;flag</literal> has the flag
+    <literal>FDWXACT_FLAG_ONEPHASE</literal> the transaction can be rolled
+    back in one-phase, otherwise this function must roll back the prepared
+    transaction identified by <literal>finfo-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>finfo-&gt;fdwxact_id</literal>
+     might not exist on the foreign server. This can happen when, for instance,
+     there is a failure while preparing the foreign transaction. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when the <function>pg_resolve_fdwxact</function>
+     SQL function is called, this callback function is executed by foreign
+     transaction resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction
+    identifier, and store its length in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal
+    less than <symbol>NAMEDATALEN</symbol> bytes long and should not be the same
+    as any other concurrent prepared transaction id. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Note that you therefore
+      cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1983,4 +2094,147 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of the
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    <literal>FdwXactInfo</literal> provides information about the foreign
+    server being processed, such as the server name and the OIDs of the server,
+    user and user mapping. Its <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-registration">
+    <title>Foreign Transaction Registration and Unregistration</title>
+    <para>
+     A foreign transaction needs to be registered with the
+     <productname>PostgreSQL</productname> global transaction manager.
+     Registration and unregistration are done by calling
+     <function>FdwXactRegisterXact</function> and
+     <function>FdwXactUnregisterXact</function> respectively.
+     The FDW can pass a boolean <literal>modified</literal> along with the
+     OIDs of the server and user to <function>FdwXactRegisterXact</function>
+     to indicate that writes are going to happen on the foreign server.  Such
+     foreign servers are taken into account when deciding whether the two-phase
+     commit protocol is required.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <function>CommitForeignTransaction</function>
+     and <function>RollbackForeignTransaction</function> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls <function>CommitForeignTransaction</function>
+     in the pre-commit phase and calls
+     <function>RollbackForeignTransaction</function> in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback of Distributed Transactions</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to commit and roll back atomically across all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to have the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be
+    using different Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose
+     FDW supports the transaction management callback routines is registered as a
+     participant. During registration, <function>GetPrepareId</function> is called,
+     if provided, to generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction
+     manager persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     When switching to rollback due to a failure, it calls
+     <function>RollbackForeignTransaction</function> with
+     <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions which are not
+     closed yet, and calls <function>RollbackForeignTransaction</function> without
+     that flag for foreign transactions which are already prepared.  For foreign
+     transactions which are being prepared, it does both because it cannot be sure
+     that the preparation has completed on the foreign server. Therefore,
+     <function>RollbackForeignTransaction</function> needs to tolerate the undefined
+     object error.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, both the <literal>CommitForeignTransaction</literal> function and
+     the <literal>RollbackForeignTransaction</literal> function should commit or
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve it, the resolver process exits with an error message. The foreign
+     transaction launcher will launch the resolver process again after the
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a5161bb22b 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index aa99665e2e..02a7bfa159 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26906,6 +26906,153 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-data-sanity">
+   <title>Data Sanity Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-data-sanity-table"/>
+    provide ways to check the sanity of data files in the cluster.
+   </para>
+
+   <table id="functions-data-sanity-table">
+    <title>Data Sanity Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_relation_check_pages</primary>
+        </indexterm>
+        <function>pg_relation_check_pages</function> ( <parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type> ] )
+        <returnvalue>setof record</returnvalue>
+        ( <parameter>path</parameter> <type>text</type>,
+        <parameter>failed_block_num</parameter> <type>bigint</type> )
+       </para>
+       <para>
+        Checks the pages of the specified relation to see if they are valid
+        enough to safely be loaded into the server's shared buffers.  If
+        given, <parameter>fork</parameter> specifies that only the pages of
+        the given fork are to be verified.  <parameter>fork</parameter> can
+        be <literal>main</literal> for the main data
+        fork, <literal>fsm</literal> for the free space
+        map, <literal>vm</literal> for the visibility map,
+        or <literal>init</literal> for the initialization fork.  The
+        default of <literal>NULL</literal> means that all forks of the
+        relation should be checked.  The function returns a list of block
+        numbers that appear corrupted along with the path names of their
+        files.  Use of this function is restricted to superusers by
+        default, but access may be granted to others
+        using <command>GRANT</command>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction which is being
+        processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolution.
+        This function is useful to remove a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before
+    dropping the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9496f76b1f..d0dd3b1341 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1072,6 +1072,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1301,6 +1313,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1594,6 +1618,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1907,6 +1936,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 730d5fdc34..a5c5619072 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -171,6 +171,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3234adb639..83f30c5045 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
diff --git a/src/backend/access/transam/README.fdwxact b/src/backend/access/transam/README.fdwxact
new file mode 100644
index 0000000000..8da9030689
--- /dev/null
+++ b/src/backend/access/transam/README.fdwxact
@@ -0,0 +1,134 @@
+src/backend/access/transam/README.fdwxact
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back the transaction on
+either all of the foreign servers or none of them. This ensures that the data is
+always left in a consistent state across the federated database.
+
+
+Commit Sequence of Global Transactions
+--------------------------------
+
+We employ the two-phase commit protocol to commit atomically across all foreign
+servers. The sequence of a distributed transaction commit consists
+of the following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+to the list FdwXactParticipant, which is maintained by PostgreSQL's global
+transaction manager (GTM), as distributed transaction participants. The
+registered foreign transactions are tracked until the end of transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We record the corresponding WAL indicating that the foreign server is involved
+with the current transaction before doing PREPARE on all foreign transactions.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+The two-phase commit is required only if the transaction modified two or more
+servers including the local node.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If we fail on any of them we switch to
+rollback, therefore at this time some participants might be prepared whereas
+some are not prepared. The former foreign transactions are resolved by
+the resolver process asynchronously or can be resolved manually using
+pg_resolve_foreign_xact(), and the latter end their transactions
+in one phase by calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, we commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction but
+this resolution step (commit or rollback) is done by the foreign transaction
+resolver process.
+
+
+Identifying Foreign Transactions In GTM
+---------------------------------------
+
+To identify foreign transaction participants (as well as FdwXact entries) there
+are two ways: using {server OID, user OID} and using user mapping OID. The same
+is true for FDWs to identify the connections (and transactions upon) to the
+foreign server. We need to consider the case where the way to identify the
+transactions does not match between GTM and FDWs, because a problem might occur
+when the user modifies the same foreign server using different roles within the
+transaction. For example, consider the following execution:
+
+BEGIN;
+SET ROLE user_A;
+INSERT INTO ft1 VALUES (1);
+SET ROLE user_B;
+INSERT INTO ft1 VALUES (1);
+COMMIT;
+
+Suppose that an FDW identifies the connection by {server OID, user OID}
+and GTM identifies the transactions by user mapping OID, and user_A and user_B use
+the public user mapping to connect to server_X. In the FDW, there are two
+connections: {user_A, server_X} and {user_B, server_X}, and therefore it opens two
+transactions, one on each connection, while GTM has only one FdwXact entry because the two
+connections refer to the same user mapping OID. As a result, at the end of the
+transaction, GTM ends only one foreign transaction, leaving the other one.
+
+On the other hand, suppose that an FDW identifies the connection by user mapping OID
+and GTM does so by {server OID, user OID}. The FDW uses only one connection and opens
+one transaction since both users refer to the same user mapping OID (we expect FDWs
+not to register the foreign transaction when not starting a new transaction on the
+foreign server). Since GTM also has one entry it can end the foreign transaction
+properly. The downside would be that the user OID of FdwXact (i.e., FdwXact->userid)
+is the user who registered the foreign transaction for the first time, not
+necessarily the user who executed COMMIT.  For example in the above case, FdwXact->userid
+will be user_A, not user_B. But it's not a big problem in practice.
+
+Therefore, in fdwxact.c, we identify the foreign transaction by
+{server OID, user OID}.
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts from FDWXACT_STATUS_PREPARING
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared. The status then changes to
+FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING before committing or
+aborting respectively. The FdwXact entry is removed with WAL logging after it is resolved.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. The initial
+status for those entries is FDWXACT_STATUS_PREPARED if they are recovered
+from WAL. Because we write WAL only when preparing the foreign transaction, we
+cannot know the exact fate of the foreign transaction from the recovery.
+
+The foreign transaction status transition is illustrated by the following
+graph describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                      INVALID                       |
+ +----------------------------------------------------+
+    |                      |                       |
+    |                      v                       |
+    |           +---------------------+            |
+   (*1)         |      PREPARING      |           (*1)
+    |           +---------------------+            |
+    |                      |                       |
+    v                      v                       v
+ +----------------------------------------------------+
+ |                      PREPARED                      |
+ +----------------------------------------------------+
+           |                               |
+           v                               v
+ +--------------------+          +--------------------+
+ |     COMMITTING     |          |      ABORTING      |
+ +--------------------+          +--------------------+
+           |                               |
+           v                               v
+ +----------------------------------------------------+
+ |                        END                         |
+ +----------------------------------------------------+
+
+(*1) Paths for recovered FdwXact entries
-- 
2.27.0
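
The status diagram in README.fdwxact above can be read as a small state machine. The C sketch below is an illustration only and is not code from the patch: the SketchStatus enum and the sketch_* functions are hypothetical names that mirror the states named in the README, and the allowed transitions are taken from the arrows in the diagram, with the (*1) paths treated as INVALID -> PREPARED for recovered entries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the FdwXact->status values named in README.fdwxact. */
typedef enum
{
	SKETCH_INVALID,
	SKETCH_PREPARING,
	SKETCH_PREPARED,
	SKETCH_COMMITTING,
	SKETCH_ABORTING,
	SKETCH_END
} SketchStatus;

/* Return true if the README diagram has an arrow from "from" to "to". */
static bool
sketch_transition_ok(SketchStatus from, SketchStatus to)
{
	switch (from)
	{
		case SKETCH_INVALID:
			/* Going straight to PREPARED is the (*1) path for recovered entries. */
			return to == SKETCH_PREPARING || to == SKETCH_PREPARED;
		case SKETCH_PREPARING:
			return to == SKETCH_PREPARED;
		case SKETCH_PREPARED:
			return to == SKETCH_COMMITTING || to == SKETCH_ABORTING;
		case SKETCH_COMMITTING:
		case SKETCH_ABORTING:
			return to == SKETCH_END;
		case SKETCH_END:
			return false;
	}
	return false;
}

/* Move *cur to next, asserting that the diagram allows the step. */
static void
sketch_advance(SketchStatus *cur, SketchStatus next)
{
	assert(sketch_transition_ok(*cur, next));
	*cur = next;
}
```

A normal commit would walk INVALID, PREPARING, PREPARED, COMMITTING, END through sketch_advance(), while an entry rebuilt from WAL would start with the single step INVALID to PREPARED.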

v34-0007-Introduce-foreign-transaction-launcher-and-resol.patch (application/x-patch)
From 31916cfba2579e0b2315019944fbc2d2a81abdcf Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:09:41 +0900
Subject: [PATCH v34 07/11] Introduce foreign transaction launcher and resolver
 processes.

This commit introduces new background processes: the foreign
transaction launcher and resolvers. With this change, users no longer
need to use pg_resolve_foreign_xact() to resolve foreign transactions
prepared by PREPARE TRANSACTION and left by COMMIT/ROLLBACK
TRANSACTION. These foreign transactions are resolved in the background by
a foreign transaction resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/Makefile           |   2 +
 src/backend/access/transam/fdwxact.c          |  33 +-
 src/backend/access/transam/fdwxact_launcher.c | 558 ++++++++++++++++++
 src/backend/access/transam/fdwxact_resolver.c | 337 +++++++++++
 src/backend/access/transam/twophase.c         |  16 +
 src/backend/postmaster/bgworker.c             |   8 +
 src/backend/postmaster/pgstat.c               |   6 +
 src/backend/postmaster/postmaster.c           |  13 +-
 src/backend/storage/ipc/ipci.c                |   3 +
 src/backend/storage/lmgr/lwlocknames.txt      |   1 +
 src/backend/tcop/postgres.c                   |  14 +
 src/backend/utils/misc/guc.c                  |  37 ++
 src/backend/utils/misc/postgresql.conf.sample |  12 +
 src/include/access/fdwxact.h                  |   6 +
 src/include/access/fdwxact_launcher.h         |  28 +
 src/include/access/fdwxact_resolver.h         |  23 +
 src/include/access/resolver_internal.h        |  61 ++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/pgstat.h                          |   2 +
 src/include/utils/guc_tables.h                |   2 +
 20 files changed, 1155 insertions(+), 12 deletions(-)
 create mode 100644 src/backend/access/transam/fdwxact_launcher.c
 create mode 100644 src/backend/access/transam/fdwxact_resolver.c
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index b05a88549d..26a5ee589c 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -16,6 +16,8 @@ OBJS = \
 	clog.o \
 	commit_ts.o \
 	fdwxact.o \
+	fdwxact_launcher.o \
+	fdwxact_resolver.o \
 	generic_xlog.o \
 	multixact.o \
 	parallel.o \
diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index 1e24bb1c12..def4988237 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -22,10 +22,10 @@
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having been
  * modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback only the
- * local transaction but not do anything for involved foreign transactions.  To resolve
- * these foreign transactions the user needs to use pg_resolve_foreign_xact() SQL
- * function that resolve a foreign transaction according to the result of the
- * corresponding local transaction.
+ * local transaction but do nothing for the involved foreign transactions.  The prepared
+ * foreign transactions are resolved by a resolver process asynchronously.  Also, the
+ * user can use pg_resolve_foreign_xact() SQL function to resolve a foreign transaction
+ * manually.
  *
  * LOCKING
  *
@@ -76,7 +76,10 @@
 #include <unistd.h>
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/twophase.h"
+#include "access/resolver_internal.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
@@ -160,6 +163,7 @@ static bool fdwXactExitRegistered = false;
 
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
 
 static void AtProcExit_FdwXact(int code, Datum arg);
 static void FdwXactPrepareForeignTransactions(TransactionId xid);
@@ -168,7 +172,6 @@ static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
 static FdwXact FdwXactInsertEntry(TransactionId xid,
 								  FdwXactParticipant *fdw_part);
-static void ResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 static void ResolveOneFdwXact(FdwXact fdwxact);
 static void FdwXactComputeRequiredXmin(void);
 static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
@@ -774,12 +777,13 @@ ForgetAllFdwXactParticipants(void)
 
 	/*
 	 * If we leave any FdwXact entries, update the oldest local transaction of
-	 * unresolved distributed transaction.
+	 * unresolved distributed transaction and notify the launcher.
 	 */
 	if (nremaining > 0)
 	{
 		elog(DEBUG1, "%u foreign transactions remaining", nremaining);
 		FdwXactComputeRequiredXmin();
+		FdwXactLaunchOrWakeupResolver();
 	}
 
 	list_free_deep(FdwXactParticipants);
@@ -787,7 +791,9 @@ ForgetAllFdwXactParticipants(void)
 }
 
 /*
- * Commit or rollback all foreign transactions.
+ * Close in-progress involved foreign transactions.  We don't perform the second
+ * phase of two-phase commit protocol here.  All prepared foreign transactions
+ * enter in-doubt state and a resolver process will process them.
  */
 void
 AtEOXact_FdwXact(bool is_commit)
@@ -891,7 +897,7 @@ AtPrepare_FdwXact(void)
  * The caller must hold the given foreign transactions in advance to prevent
  * concurrent update.
  */
-static void
+void
 ResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
 {
 	for (int i = 0; i < nfdwxacts; i++)
@@ -926,6 +932,17 @@ FdwXactExists(Oid dbid, Oid serverid, Oid userid)
 
 	return (idx >= 0);
 }
+bool
+FdwXactExistsXid(TransactionId xid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(InvalidOid, xid, InvalidOid, InvalidOid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
 
 /*
  * Return the index of first found FdwXact entry that matched to given arguments.
diff --git a/src/backend/access/transam/fdwxact_launcher.c b/src/backend/access/transam/fdwxact_launcher.c
new file mode 100644
index 0000000000..79b5c21252
--- /dev/null
+++ b/src/backend/access/transam/fdwxact_launcher.c
@@ -0,0 +1,558 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/transam/fdwxact_launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/interrupt.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+static volatile sig_atomic_t got_SIGUSR2 = false;
+
+static void FdwXactLauncherOnExit(int code, Datum arg);
+static void FdwXactLaunchResolver(Oid dbid);
+static bool FdwXactRelaunchResolvers(void);
+
+/* Signal handler */
+static void FdwXactLaunchHandler(SIGNAL_ARGS);
+
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactRequestToLaunchResolver(void)
+{
+	if (FdwXactResolverCtl->launcher_pid != InvalidPid)
+		kill(FdwXactResolverCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactLauncherShmemInit */
+Size
+FdwXactLauncherShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactResolverCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactLauncherShmemInit(void)
+{
+	bool		found;
+
+	FdwXactResolverCtl = ShmemInitStruct("Foreign Transaction Launcher Data",
+										 FdwXactLauncherShmemSize(),
+										 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactResolverCtl, 0, FdwXactLauncherShmemSize());
+		SHMQueueInit(&(FdwXactResolverCtl->fdwxact_queue));
+		FdwXactResolverCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactResolverCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+FdwXactLauncherOnExit(int code, Datum arg)
+{
+	FdwXactResolverCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+FdwXactLaunchHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(FdwXactLauncherOnExit, (Datum) 0);
+
+	Assert(FdwXactResolverCtl->launcher_pid == InvalidPid);
+	FdwXactResolverCtl->launcher_pid = MyProcPid;
+	FdwXactResolverCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, SignalHandlerForConfigReload);
+	pqsignal(SIGUSR2, FdwXactLaunchHandler);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit the start retry to once per
+		 * foreign_xact_resolution_retry_interval but always attempt to
+		 * start when requested.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = FdwXactRelaunchResolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * started.  This usually means a resolver crashed, so retry
+			 * after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (ConfigReloadPending)
+		{
+			ConfigReloadPending = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+
+	/*
+	 * Looking for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactResolverCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wakeup the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. Since the resolver will soon
+		 * find the FdwXact entry we inserted, we don't need to do anything.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactRequestToLaunchResolver();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+FdwXactLaunchResolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactResolverCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactResolverCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, release the resolver slot we reserved above */
+		LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+		resolver->in_use = false;
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on database that has
+ * at least one FdwXact entry but no resolver is running on it.
+ */
+static bool
+FdwXactRelaunchResolvers(void)
+{
+	HTAB	   *fdwxact_dbs;
+	HTAB	   *resolver_dbs;
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+
+	/*
+	 * Create a hash map for the databases that have at least one foreign
+	 * transaction to resolve.
+	 */
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect OIDs of databases that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * We need to launch resolver process if the foreign transaction
+		 * is not held by anyone and is not a part of the local prepared
+		 * transaction.
+		 */
+		if (fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+			hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no foreign transaction to resolve, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	/* Create a hash map for databases on which a resolver is running */
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactResolverCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * Find databases on which no resolver is running and launch new
+	 * resolver process on them.
+	 */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			FdwXactLaunchResolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactResolverCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be super user */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactResolverCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
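For readers following the launcher logic, here is a minimal standalone sketch of the decision FdwXactRelaunchResolvers() makes: launch a resolver for every database that has unresolved foreign transactions but no running resolver. It is not part of the patch. It uses plain arrays instead of the HTABs above, and the names Oid (a local typedef), contains_db() and dbs_needing_resolver() are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

/* Return true if 'dbid' appears in the first 'n' elements of 'dbs'. */
bool
contains_db(const Oid *dbs, size_t n, Oid dbid)
{
	for (size_t i = 0; i < n; i++)
	{
		if (dbs[i] == dbid)
			return true;
	}
	return false;
}

/*
 * Collect into 'out' every database that has pending foreign transactions
 * but no running resolver, each at most once, and return how many were
 * collected.  'out' must have room for 'npending' entries.
 */
size_t
dbs_needing_resolver(const Oid *pending, size_t npending,
					 const Oid *running, size_t nrunning,
					 Oid *out)
{
	size_t		nout = 0;

	assert(out != NULL || npending == 0);

	for (size_t i = 0; i < npending; i++)
	{
		if (contains_db(running, nrunning, pending[i]))
			continue;			/* a resolver already serves this database */
		if (contains_db(out, nout, pending[i]))
			continue;			/* already scheduled in this pass */
		out[nout++] = pending[i];
	}

	return nout;
}
```

A return value of zero corresponds to the "nothing launched" case, which is why the launcher's result flag has to start out as false rather than being left unset.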
diff --git a/src/backend/access/transam/fdwxact_resolver.c b/src/backend/access/transam/fdwxact_resolver.c
new file mode 100644
index 0000000000..02230c82f2
--- /dev/null
+++ b/src/backend/access/transam/fdwxact_resolver.c
@@ -0,0 +1,337 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database for which backend processes are waiting for resolution.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/transam/fdwxact_resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/interrupt.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int	foreign_xact_resolution_retry_interval;
+int	foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactResolverCtlData *FdwXactResolverCtl;
+
+static void FdwXactResolverLoop(void);
+static long FdwXactResolverComputeSleepTime(TimestampTz now,
+											TimestampTz targetTime);
+static void FdwXactResolverCheckTimeout(TimestampTz now);
+
+static void FdwXactResolverOnExit(int code, Datum arg);
+static void FdwXactResolverDetach(void);
+static void FdwXactResolverAttach(int slot);
+static void HoldInDoubtFdwXacts(void);
+
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts has indexes of FdwXact entries which the resolver marked
+ * as in-processing. These marks are cleared on process exit.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+FdwXactResolverDetach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+FdwXactResolverOnExit(int code, Datum arg)
+{
+	FdwXactResolverDetach();
+
+	/* Release the held foreign transaction entries */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+		fdwxact->locking_backend = InvalidBackendId;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+FdwXactResolverAttach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactResolverCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(FdwXactResolverOnExit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	FdwXactResolverAttach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, SignalHandlerForConfigReload);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FdwXactResolverLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FdwXactResolverLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (ConfigReloadPending)
+		{
+			ConfigReloadPending = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/* Hold in-doubt foreign transaction to resolve */
+		HoldInDoubtFdwXacts();
+
+		if (nheld > 0)
+		{
+			/* Resolve in-doubt transactions */
+			StartTransactionCommand();
+			ResolveFdwXacts(held_fdwxacts, nheld);
+			CommitTransactionCommand();
+			last_resolution_time = now;
+		}
+
+		FdwXactResolverCheckTimeout(now);
+
+		sleep_time = FdwXactResolverComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FdwXactResolverCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/* Reached timeout, exit */
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+					get_database_name(MyDatabaseId))));
+	CommitTransactionCommand();
+	FdwXactResolverDetach();
+	proc_exit(0);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until the
+ * timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FdwXactResolverComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Lock foreign transactions that are not held by anyone.
+ */
+static void
+HoldInDoubtFdwXacts(void)
+{
+	nheld = 0;
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid &&
+			fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->local_xid))
+		{
+			held_fdwxacts[nheld++] = i;
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
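The sleep-time rule in FdwXactResolverComputeSleepTime() can be restated as a small standalone sketch. It is not part of the patch. It assumes all times are plain millisecond counts on one clock rather than TimestampTz values, and the names NAPTIME_MS, clamp_remaining() and resolver_sleep_ms() are hypothetical. Like TimestampDifference(), it treats a deadline that has already passed as zero remaining time.

```c
#include <assert.h>
#include <stdint.h>

#define NAPTIME_MS 180000L		/* same 3 minute cap as DEFAULT_NAPTIME_PER_CYCLE */

/* Milliseconds from 'now_ms' until 'deadline_ms', never negative. */
int64_t
clamp_remaining(int64_t now_ms, int64_t deadline_ms)
{
	return (deadline_ms > now_ms) ? (deadline_ms - now_ms) : 0;
}

/*
 * Return how long the resolver may sleep, in milliseconds.
 *
 * timeout_ms == 0 means the idle timeout is disabled; next_resolution_ms <= 0
 * means no retry is scheduled.  The result is the smallest of the naptime
 * cap, the time left until the idle timeout, and the time left until the
 * next scheduled resolution.
 */
long
resolver_sleep_ms(int64_t now_ms, int64_t last_resolution_ms,
				  long timeout_ms, int64_t next_resolution_ms)
{
	long		sleep_ms = NAPTIME_MS;

	assert(timeout_ms >= 0);

	if (timeout_ms > 0)
	{
		int64_t		left = clamp_remaining(now_ms,
										   last_resolution_ms + timeout_ms);

		if (left < sleep_ms)
			sleep_ms = (long) left;
	}

	if (next_resolution_ms > 0)
	{
		int64_t		left = clamp_remaining(now_ms, next_resolution_ms);

		if (left < sleep_ms)
			sleep_ms = (long) left;
	}

	return sleep_ms;
}
```

In the loop above, resolutionTs is always passed as -1, so under these assumptions only the first branch and the naptime cap ever apply.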
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 5c8a55358d..077eb0009f 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,8 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -2286,6 +2288,13 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * If the prepared transaction was part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
@@ -2345,6 +2354,13 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * If the prepared transaction was part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExistsXid(xid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
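The notification path used by the two call sites above (FdwXactLaunchOrWakeupResolver()) boils down to one choice: wake a resolver that already serves the current database, or else ask the launcher to start one. The following standalone sketch illustrates only that choice and is not part of the patch. ResolverSlot is a simplified stand-in for the in_use and dbid fields of FdwXactResolver, and NotifyAction and choose_notify_action() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

/* Simplified stand-in for the in_use/dbid fields of FdwXactResolver. */
typedef struct ResolverSlot
{
	bool		in_use;
	Oid			dbid;
} ResolverSlot;

typedef enum NotifyAction
{
	NOTIFY_WAKE_RESOLVER,		/* set the latch of slots[*slot_idx] */
	NOTIFY_SIGNAL_LAUNCHER		/* ask the launcher to start a resolver */
} NotifyAction;

/*
 * Decide how to notify about new work on database 'my_dbid'.  When a slot
 * in use for that database exists, store its index in '*slot_idx' and
 * return NOTIFY_WAKE_RESOLVER; otherwise leave '*slot_idx' untouched and
 * return NOTIFY_SIGNAL_LAUNCHER.
 */
NotifyAction
choose_notify_action(const ResolverSlot *slots, size_t nslots,
					 Oid my_dbid, size_t *slot_idx)
{
	assert(slot_idx != NULL);

	for (size_t i = 0; i < nslots; i++)
	{
		if (slots[i].in_use && slots[i].dbid == my_dbid)
		{
			*slot_idx = i;
			return NOTIFY_WAKE_RESOLVER;
		}
	}

	return NOTIFY_SIGNAL_LAUNCHER;
}
```

In the real code the slot scan runs under FdwXactResolverLock in shared mode, and a slot can be in use before its resolver has attached and published a latch, which is why the patch checks the latch pointer before setting it.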
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de3..2c7f55f8d9 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,8 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 701ccb3a03..dd3397f998 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3910,6 +3910,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 7de27ee4e0..ed75bb8538 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,7 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -910,6 +911,9 @@ PostmasterMain(int argc, char *argv[])
 	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers <= 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
 
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
@@ -975,12 +979,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 6f14a950bf..5559080f5f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -17,6 +17,7 @@
 #include "access/clog.h"
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -151,6 +152,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactLauncherShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +272,7 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	FdwXactShmemInit();
+	FdwXactLauncherShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 4124321640..a297c746cd 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -54,3 +54,4 @@ XactTruncationLock					44
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
 FdwXactLock							48
+FdwXactResolverLock					49
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cb5a96117f..d44cdaba0c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3100,6 +3102,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9c78b2a90a..add8e598e8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -763,6 +763,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2481,6 +2485,39 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 68548b4633..58ac54b8c8 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -735,6 +735,18 @@
 #max_pred_locks_per_page = 2            # min 0
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
 #------------------------------------------------------------------------------
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 83a1db842e..7656ddad02 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -104,13 +104,19 @@ typedef struct FdwXactInfo
 
 /* GUC parameters */
 extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
 
 /* Function declarations */
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void AtPrepare_FdwXact(void);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void ResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern bool FdwXactExistsXid(TransactionId xid);
 extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
 extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
 								Oid userid, void *content, int len);
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..191823f53f
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactRequestToLaunchResolver(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactLauncherShmemSize(void);
+extern void FdwXactLauncherShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..e69c567967
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..42f17120b0
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,61 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactResolverCtlData struct for the whole database cluster */
+typedef struct FdwXactResolverCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactResolverCtlData;
+#define SizeOfFdwXactResolverCtlData \
+	(offsetof(FdwXactResolverCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactResolverCtlData *FdwXactResolverCtl;
+extern FdwXactResolver *MyFdwXactResolver;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 04a8183db8..64f9dea5f3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6205,6 +6205,11 @@
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
 
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
+
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
   proargtypes => 'pg_lsn pg_lsn', prosrc => 'pg_wal_lsn_diff' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index b50daaeb79..47972e24fc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -917,6 +917,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index b9b5c1adda..94e593ac77 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
-- 
2.27.0
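
A note on sizing, as a sketch only: FdwXactResolverCtlData in resolver_internal.h ends in a flexible array of FdwXactResolver slots, and SizeOfFdwXactResolverCtlData covers the header plus a single slot. Assuming the launcher reserves one slot per max_foreign_transaction_resolvers (the allocation code is not part of this excerpt, so that is an assumption), the total could be computed as below. The Example* names and the simplified field types are hypothetical stand-ins so the snippet compiles outside the PostgreSQL tree.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for FdwXactResolver. */
typedef struct ExampleResolver
{
	int			pid;		/* resolver PID, or 0 if not active */
	unsigned int dbid;		/* database OID */
	bool		in_use;		/* is this slot used or free? */
	void	   *latch;		/* stand-in for Latch * */
} ExampleResolver;

/* Hypothetical, simplified stand-in for FdwXactResolverCtlData. */
typedef struct ExampleResolverCtl
{
	int			launcher_pid;
	void	   *launcher_latch;
	ExampleResolver resolvers[];	/* flexible array member */
} ExampleResolverCtl;

/*
 * Size of the control struct with room for max_resolvers slots:
 * the fixed header up to the array, plus one element per slot.
 */
static size_t
example_resolver_ctl_size(int max_resolvers)
{
	return offsetof(ExampleResolverCtl, resolvers) +
		(size_t) max_resolvers * sizeof(ExampleResolver);
}
```

In the server itself, arithmetic of this kind is normally done with overflow-checked helpers rather than a bare multiplication; the plain expression here only shows the layout.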

v34-0006-Add-GetPrepareId-API.patch (application/x-patch)
From 80d10a7f1ffb548883802e05c5d5795191071a93 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 4 Nov 2020 14:41:53 +0900
Subject: [PATCH v34 06/11] Add GetPrepareId API

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/fdwxact.c | 54 +++++++++++++++++++++++-----
 src/include/foreign/fdwapi.h         |  4 +++
 2 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index 507953ce14..1e24bb1c12 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -146,6 +146,7 @@ typedef struct FdwXactParticipant
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
 	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
 } FdwXactParticipant;
 
 /*
@@ -350,6 +351,7 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
 
 	return fdw_part;
 }
@@ -417,9 +419,10 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 }
 
 /*
- * Return a null-terminated foreign transaction identifier.  We generate an
- * unique identifier with in the form of
- * "fx_<random number>_<xid>_<serverid>_<userid> whose length is
+ * Return a null-terminated foreign transaction identifier.  If the given
+ * foreign server's FDW provides the GetPrepareId callback, we return the
+ * identifier it returns.  Otherwise we generate a unique identifier of the
+ * form "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
  * less than FDWXACT_ID_MAX_LEN.
  *
  * Returned string value is used to identify foreign transaction. The
@@ -434,13 +437,48 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 static char *
 get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
 {
-	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+	char *id;
+	int	id_len;
 
-	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
-			 Abs(random()), xid, fdw_part->server->serverid,
-			 fdw_part->usermapping->userid);
+	/*
+	 * If the FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%d_%d",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len > FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
 
-	return pstrdup(buf);
+	id[id_len] = '\0';
+	return pstrdup(id);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index b0a63f1d8f..64bdbff7ce 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -179,9 +179,12 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+
 typedef void (*PrepareForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*CommitForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*RollbackForeignTransaction_function) (FdwXactInfo *finfo);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -266,6 +269,7 @@ typedef struct FdwRoutine
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
 	PrepareForeignTransaction_function PrepareForeignTransaction;
+	GetPrepareId_function GetPrepareId;
 } FdwRoutine;
 
 
-- 
2.27.0
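
To illustrate the callback contract in the 0006 patch above, here is a minimal sketch of what an FDW-side GetPrepareId implementation might look like. It follows the GetPrepareId_function signature from fdwapi.h, but everything else is an assumption made for illustration: the name exampleGetPrepareId, the "myfdw_" prefix, the value 200 for FDWXACT_ID_MAX_LEN, the local typedefs, and the use of malloc where a real FDW would use palloc.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for PostgreSQL typedefs so the sketch compiles on its own. */
typedef unsigned int TransactionId;
typedef unsigned int Oid;

/* Assumed value for illustration; the real constant lives in fdwxact.h. */
#define FDWXACT_ID_MAX_LEN 200

/*
 * Hypothetical GetPrepareId callback.  Returns a writable buffer holding
 * the identifier and stores its length (excluding the terminator) in
 * *prep_id_len.  Returns NULL if no identifier could be built, which the
 * caller in get_fdwxact_identifier() reports as an error.
 */
static char *
exampleGetPrepareId(TransactionId xid, Oid serverid, Oid userid,
					int *prep_id_len)
{
	char	   *id = malloc(FDWXACT_ID_MAX_LEN);	/* palloc in a real FDW */
	int			len;

	if (id == NULL)
		return NULL;

	len = snprintf(id, FDWXACT_ID_MAX_LEN, "myfdw_%u_%u_%u",
				   xid, serverid, userid);

	/* Treat truncation or an encoding error as "no identifier". */
	if (len < 0 || len >= FDWXACT_ID_MAX_LEN)
	{
		free(id);
		return NULL;
	}

	*prep_id_len = len;
	return id;
}
```

Since get_fdwxact_identifier() writes a terminator at id[id_len] before copying, the returned buffer has to be writable and at least id_len + 1 bytes long; the sketch guarantees that by keeping len strictly below the buffer size.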

v34-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch (application/x-patch)
From 40a38f3aebf678b3da6ce791acae5fafd59d0944 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sat, 29 Aug 2020 00:14:36 +0900
Subject: [PATCH v34 02/11] postgres_fdw supports commit and rollback APIs.

This commit implements both CommitForeignTransaction and
RollbackForeignTransaction APIs in postgres_fdw. Note that since
PREPARE TRANSACTION is still not supported, this commit doesn't add
anything new that users are able to do.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 464 +++++++++---------
 .../postgres_fdw/expected/postgres_fdw.out    |   2 +-
 contrib/postgres_fdw/postgres_fdw.c           |   4 +
 contrib/postgres_fdw/postgres_fdw.h           |   3 +
 4 files changed, 234 insertions(+), 239 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index ee0b4acf0b..72ac74ca21 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
 #include "funcapi.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -90,8 +91,7 @@ static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, UserMapping *user);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -104,6 +104,8 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 static bool disconnect_cached_connections(Oid serverid);
 
 /*
@@ -119,53 +121,14 @@ static bool disconnect_cached_connections(Oid serverid);
 PGconn *
 GetConnection(UserMapping *user, bool will_prep_stmt)
 {
-	bool		found;
 	bool		retry = false;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
 	MemoryContext ccxt = CurrentMemoryContext;
 
-	/* First time through, initialize connection cache hashtable */
-	if (ConnectionHash == NULL)
-	{
-		HASHCTL		ctl;
-
-		ctl.keysize = sizeof(ConnCacheKey);
-		ctl.entrysize = sizeof(ConnCacheEntry);
-		ConnectionHash = hash_create("postgres_fdw connections", 8,
-									 &ctl,
-									 HASH_ELEM | HASH_BLOBS);
-
-		/*
-		 * Register some callback functions that manage connection cleanup.
-		 * This should be done just once in each backend.
-		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
-		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
-		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
-									  pgfdw_inval_callback, (Datum) 0);
-		CacheRegisterSyscacheCallback(USERMAPPINGOID,
-									  pgfdw_inval_callback, (Datum) 0);
-	}
-
 	/* Set flag that we did GetConnection during the current transaction */
 	xact_got_connection = true;
 
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
-	/*
-	 * Find or create cached entry for requested connection.
-	 */
-	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
-	if (!found)
-	{
-		/*
-		 * We need only clear "conn" here; remaining fields will be filled
-		 * later when "conn" is set.
-		 */
-		entry->conn = NULL;
-	}
+	entry = GetConnectionCacheEntry(user->umid);
 
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
@@ -197,7 +160,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	PG_TRY();
 	{
 		/* Start a new transaction or subtransaction if needed. */
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 	PG_CATCH();
 	{
@@ -258,7 +221,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		if (entry->conn == NULL)
 			make_new_connection(entry, user);
 
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 
 	/* Remember if caller will prepare statements */
@@ -267,6 +230,53 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	return entry->conn;
 }
 
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	bool		found;
+	ConnCacheEntry *entry;
+	ConnCacheKey key;
+
+	/* First time through, initialize connection cache hashtable */
+	if (ConnectionHash == NULL)
+	{
+		HASHCTL		ctl;
+
+		ctl.keysize = sizeof(ConnCacheKey);
+		ctl.entrysize = sizeof(ConnCacheEntry);
+		ConnectionHash = hash_create("postgres_fdw connections", 8,
+									 &ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+		/*
+		 * Register some callback functions that manage connection cleanup.
+		 * This should be done just once in each backend.
+		 */
+		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
+		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
+									  pgfdw_inval_callback, (Datum) 0);
+		CacheRegisterSyscacheCallback(USERMAPPINGOID,
+									  pgfdw_inval_callback, (Datum) 0);
+	}
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
+
+	/*
+	 * Find or create cached entry for requested connection.
+	 */
+	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+	if (!found)
+	{
+		/*
+		 * We need only clear "conn" here; remaining fields will be filled
+		 * later when "conn" is set.
+		 */
+		entry->conn = NULL;
+	}
+	return entry;
+}
+
 /*
  * Reset all transient state fields in the cached connection entry and
  * establish new connection to the remote server.
@@ -557,7 +567,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -569,6 +579,9 @@ begin_remote_xact(ConnCacheEntry *entry)
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
+		/* Register the foreign server to the transaction */
+		FdwXactRegisterXact(user->serverid, user->userid);
+
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
 		else
@@ -784,199 +797,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- *
- * This runs just late enough that it must not enter user-defined code
- * locally.  (Entering such code on the remote side is fine.  Its remote
- * COMMIT TRANSACTION may run deferred triggers.)
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state or it is marked as
-		 * invalid, then discard it to recover. Next GetConnection will open a
-		 * new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state ||
-			entry->invalidated)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -1599,3 +1419,171 @@ disconnect_cached_connections(Oid serverid)
 
 	return result;
 }
+
+void
+postgresCommitForeignTransaction(FdwXactInfo *finfo)
+{
+	ConnCacheEntry *entry;
+	PGresult   *res;
+
+	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+
+	Assert(entry->conn);
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	do_sql_command(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+void
+postgresRollbackForeignTransaction(FdwXactInfo *finfo)
+{
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before a
+	 * connection was established.
+	 */
+	if (!entry->conn)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is before starting transaction or is already unsalvageable,
+	 * do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	/*
+	 * Regardless of the event type, we can now mark ourselves as out of the
+	 * transaction.
+	 */
+	xact_got_connection = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 07e06e5bf7..d257eec3f8 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9012,7 +9012,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
+ERROR:  cannot PREPARE a transaction that has operated on foreign tables
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2ce42ce3f1..66a47f9f31 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -583,6 +583,10 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 1f67b4d9fd..c44d37f280 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -137,6 +138,8 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresCommitForeignTransaction(FdwXactInfo *finfo);
+extern void postgresRollbackForeignTransaction(FdwXactInfo *finfo);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
-- 
2.27.0

v34-0003-Recreate-RemoveForeignServerById.patch (application/x-patch)
From 2f028809a370b832832aea66384a984bd9999bf9 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 12 Jun 2020 11:49:02 +0900
Subject: [PATCH v34 03/11] Recreate RemoveForeignServerById()

This commit recreates RemoveForeignServerById(), which was removed by
b1d32d3e3. It is needed by a follow-up commit that checks whether the
foreign server has prepared transactions when the server is removed.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/catalog/dependency.c   |  5 ++++-
 src/backend/commands/foreigncmds.c | 22 ++++++++++++++++++++++
 src/include/commands/defrem.h      |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 2140151a6a..7c9899f14d 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1549,6 +1549,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_FOREIGN_SERVER:
+			RemoveForeignServerById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1563,7 +1567,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSDICT:
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
-		case OCLASS_FOREIGN_SERVER:
 		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index eb7103fd3b..ec024fa106 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -1060,6 +1060,28 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop foreign server by OID
+ */
+void
+RemoveForeignServerById(Oid srvId)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(ForeignServerRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(srvId));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Common routine to check permission for user-mapping-related DDL
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 1a79540c94..07a3f76bb4 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -125,6 +125,7 @@ extern ObjectAddress CreateForeignDataWrapper(CreateFdwStmt *stmt);
 extern ObjectAddress AlterForeignDataWrapper(AlterFdwStmt *stmt);
 extern ObjectAddress CreateForeignServer(CreateForeignServerStmt *stmt);
 extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
+extern void RemoveForeignServerById(Oid srvId);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
-- 
2.27.0

v34-0004-Add-PrepareForeignTransaction-API.patch (application/x-patch)
From cc5a535a7cf97e2696985a9e41c2fcb69679139f Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sun, 20 Sep 2020 16:49:20 +0900
Subject: [PATCH v34 04/11] Add PrepareForeignTransaction API.

This commit adds a new FDW API, PrepareForeignTransaction. Using this
API, the transactions initiated on the foreign server are prepared at
PREPARE TRANSACTION time.  The information about prepared foreign
transactions involved in the distributed transaction is crash-safe.
However, these foreign transactions are neither committed nor aborted at
COMMIT/ROLLBACK PREPARED time.  To resolve these transactions, this
commit also adds the pg_resolve_foreign_xact() SQL function.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +-
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   58 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/fdwxact.c          | 1763 ++++++++++++++++-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   28 +
 src/backend/access/transam/xact.c             |    1 +
 src/backend/access/transam/xlog.c             |   41 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/foreigncmds.c            |   22 +
 src/backend/foreign/foreign.c                 |    6 +
 src/backend/postmaster/pgstat.c               |    9 +
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    3 +
 src/backend/storage/ipc/procarray.c           |   39 +-
 src/backend/storage/lmgr/lwlocknames.txt      |    1 +
 src/backend/utils/misc/guc.c                  |   11 +
 src/backend/utils/misc/postgresql.conf.sample |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |   88 +
 src/include/access/fdwxact_xlog.h             |   54 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   18 +
 src/include/foreign/fdwapi.h                  |    2 +
 src/include/pgstat.h                          |    3 +
 src/include/storage/procarray.h               |    1 +
 src/test/regress/expected/rules.out           |    7 +
 34 files changed, 2151 insertions(+), 30 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact_xlog.h

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index d257eec3f8..73a0868347 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9012,7 +9012,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on foreign tables
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..3c0d999b81
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_insert->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid);
+		appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid);
+		appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid);
+		appendStringInfo(buf, " user: %u,", fdwxact_remove->userid);
+		appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid);
+		appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..e4ae79e599 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index 03f351924b..507953ce14 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -9,8 +9,59 @@
  * An FDW that implements both commit and rollback APIs can request to register
  * the foreign transaction by FdwXactRegisterXact() to participate it to a
  * group of distributed tranasction.  The registered foreign transactions are
- * identified by OIDs of server and user.  On commit and rollback, the global
- * transaction manager calls corresponding FDW API to end the tranasctions.
+ * identified by OIDs of server and user.  On commit, rollback and prepare, the
+ * global transaction manager calls the corresponding FDW API to end the transactions.
+ *
+ * To commit atomically across all foreign servers, the global transaction
+ * manager supports the two-phase commit protocol, which is a type of atomic
+ * commitment protocol (ACP).  The two-phase commit protocol is crash-safe: we
+ * WAL-log the foreign transaction information.
+ *
+ * FOREIGN TRANSACTION RESOLUTION
+ *
+ * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
+ * the PrepareForeignTransaction() API, regardless of whether data on the foreign server
+ * has been modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or roll back
+ * only the local transaction and do nothing for the involved foreign transactions.  To
+ * resolve these foreign transactions, the user needs to call the
+ * pg_resolve_foreign_xact() SQL function, which resolves a foreign transaction according
+ * to the result of the corresponding local transaction.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of a foreign
+ * transaction follows a locking model based on the following linked concepts:
+ *
+ * * All FdwXact fields except for status are protected by FdwXactLock.  The
+ *   status is protected by its mutex.
+ * * A process that is going to work on the foreign transaction needs to set
+ *	 locking_backend of the FdwXact entry, which prevents the entry from being
+ *	 updated and removed by concurrent processes.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *	 with entries marked with fdwxact->inredo and fdwxact->ondisk.  FdwXact file
+ *	 data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *	 We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *	 have fdwxact->inredo set and are behind the redo_horizon.	We save
+ *	 them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *	 fdwxact->ondisk is true, the corresponding entry from the disk is
+ *	 additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *	 fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
  *
  * Portions Copyright (c) 2021, PostgreSQL Global Development Group
  *
@@ -20,15 +71,53 @@
  */
 #include "postgres.h"
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include "access/fdwxact.h"
+#include "access/twophase.h"
+#include "access/xact.h"
 #include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "foreign/fdwapi.h"
 #include "foreign/foreign.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/procarray.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 
 /* Check the FdwXactParticipant is capable of two-phase commit  */
 #define ServerSupportTransactionCallback(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define ServerSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/*
+ * The name of a foreign prepared transaction file consists of the database
+ * OID, xid, foreign server OID and user OID, each printed as 8 hex digits and
+ * separated by '_'.
+ *
+ * Since an FdwXact state file is created per foreign transaction in a
+ * distributed transaction and the xid of an unresolved distributed
+ * transaction is never reused, this name is sufficient to ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8)
+#define FdwXactFilePath(path, dbid, xid, serverid, userid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \
+			 dbid, xid, serverid, userid)
 
 /*
  * Structure to bundle the foreign transaction participant.	 This struct
@@ -40,13 +129,23 @@
  */
 typedef struct FdwXactParticipant
 {
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
 	/* Foreign server and user mapping info, passed to callback routines */
 	ForeignServer *server;
 	UserMapping *usermapping;
 
+	/* Transaction identifier used for PREPARE */
+	char	   *fdwxact_id;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
 } FdwXactParticipant;
 
 /*
@@ -55,11 +154,103 @@ typedef struct FdwXactParticipant
  */
 static List *FdwXactParticipants = NIL;
 
+/* Track whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+/* GUC parameter */
+int			max_prepared_foreign_xacts = 0;
+
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void FdwXactPrepareForeignTransactions(TransactionId xid);
 static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool commit);
+static FdwXact FdwXactInsertEntry(TransactionId xid,
+								  FdwXactParticipant *fdw_part);
+static void ResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
+static void ResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactComputeRequiredXmin(void);
+static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+							  Oid userid, bool givewarning);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+								  Oid userid, XLogRecPtr insert_start_lsn,
+								  bool fromdisk);
+static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  bool giveWarning);
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+							  Oid umid, char *fdwxact_id);
+static void remove_fdwxact(FdwXact fdwxact);
 static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
 													  FdwRoutine *routine);
+static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part,
+									TransactionId xid);
+static int	get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid);
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is described in the definition of the
+ * FdwXactCtlData structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
 
 /*
  * Register the given foreign transaction identified by the given arguments
@@ -85,6 +276,13 @@ FdwXactRegisterXact(Oid serverid, Oid userid)
 		}
 	}
 
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
 	routine = GetFdwRoutineByServerId(serverid);
 
 	/*
@@ -132,7 +330,7 @@ FdwXactUnregisterXact(Oid serverid, Oid userid)
 	}
 }
 
-/* Return palloc'd FdwXactParticipant variable */
+/* Return a palloc'd FdwXactParticipant variable */
 static FdwXactParticipant *
 create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 {
@@ -145,14 +343,336 @@ create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
 
 	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
 
+	fdw_part->fdwxact = NULL;
 	fdw_part->server = foreign_server;
 	fdw_part->usermapping = user_mapping;
+	fdw_part->fdwxact_id = NULL;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
 
 	return fdw_part;
 }
 
+/*
+ * Insert FdwXact entries and prepare foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(TransactionId xid)
+{
+	ListCell   *lc;
+
+	Assert(FdwXactParticipants != NIL);
+	Assert(TransactionIdIsValid(xid));
+
+	/* Loop over the foreign connections */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXactInfo finfo;
+		FdwXact		fdwxact;
+
+		Assert(ServerSupportTwophaseCommit(fdw_part));
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get prepared transaction identifier */
+		fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid);
+		Assert(fdw_part->fdwxact_id);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to disk and to WAL (thereby relaying it to standbys).
+		 * Thus, in case we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have a prepared transaction
+		 * on the foreign server and try to resolve it when connectivity is
+		 * restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed in between these
+		 * two steps, we would lose track of the prepared transaction on the
+		 * foreign server and would not be able to resolve it after crash
+		 * recovery.  Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertEntry(xid, fdw_part);
+
+		/*
+		 * Prepare the foreign transaction.
+		 *
+		 * Between the FdwXactInsertEntry call and the moment this backend
+		 * receives an acknowledgement from the foreign server, the backend may
+		 * abort the local transaction (say, because of a signal).
+		 */
+		finfo.server = fdw_part->server;
+		finfo.usermapping = fdw_part->usermapping;
+		finfo.fdwxact_id = fdw_part->fdwxact_id;
+		fdw_part->prepare_foreign_xact_fn(&finfo);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier.  We generate a
+ * unique identifier in the form of
+ * "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * The returned string is used to identify the foreign transaction. The
+ * identifier should not be the same as any other concurrent prepared
+ * transaction identifier.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like a UUID, which gives unique ids with high probability, but that may be
+ * expensive here, and the UUID extension that provides the function to
+ * generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+			 Abs(random()), xid, fdw_part->server->serverid,
+			 fdw_part->usermapping->userid);
+
+	return pstrdup(buf);
+}
+
+/*
+ * Create a new foreign transaction entry before an FDW prepares, commits, or
+ * rolls back the transaction. The function writes the entry to WAL, and the
+ * entry is persisted to disk under the pg_fdwxact directory at checkpoint time.
+ */
+static FdwXact
+FdwXactInsertEntry(TransactionId xid, FdwXactParticipant *fdw_part)
+{
+	FdwXact		fdwxact;
+	FdwXactOnDiskData *fdwxact_file_data;
+	MemoryContext old_context;
+	int			data_len;
+
+	old_context = MemoryContextSwitchTo(TopTransactionContext);
+
+	/*
+	 * Enter the foreign transaction in the shared memory structure.
+	 */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid,
+							 fdw_part->usermapping->userid,
+							 fdw_part->usermapping->umid, fdw_part->fdwxact_id);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+	MemoryContextSwitchTo(old_context);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, fdwxact_id);
+	data_len = data_len + strlen(fdw_part->fdwxact_id) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	fdwxact_file_data->dbid = MyDatabaseId;
+	fdwxact_file_data->local_xid = xid;
+	fdwxact_file_data->serverid = fdw_part->server->serverid;
+	fdwxact_file_data->userid = fdw_part->usermapping->userid;
+	fdwxact_file_data->umid = fdw_part->usermapping->umid;
+	memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id,
+		   strlen(fdw_part->fdwxact_id) + 1);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			   Oid umid, char *fdwxact_id)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->dbid == dbid &&
+			fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid &&
+			fdwxact->userid == userid)
+			ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
+							errdetail("Duplicate entry with transaction id %u, serverid %u, userid %u exists.",
+									  xid, serverid, userid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there
+	 * are none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions: \"%d\".",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->local_xid = xid;
+	fdwxact->dbid = dbid;
+	fdwxact->serverid = serverid;
+	fdwxact->userid = userid;
+	fdwxact->umid = umid;
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+	memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1);
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search for the slot where this entry resides */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("Failed to find entry for xid %u, foreign server %u, and user %u.",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
 /*
  * The routine for committing or rolling back the given transaction participant.
  */
@@ -184,14 +704,46 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
 }
 
 /*
- * Clear the FdwXactParticipants list.
+ * Unlock foreign transaction participants and clear the FdwXactParticipants
+ * list.  If we leave any foreign transactions unresolved, update the oldest
+ * xmin of unresolved transactions so that the local transaction ids of those
+ * foreign transactions are not truncated.
  */
 static void
 ForgetAllFdwXactParticipants(void)
 {
+	ListCell   *cell;
+	int			nremaining = 0;
+
 	if (FdwXactParticipants == NIL)
 		return;
 
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if no FdwXact entry has been registered yet */
+		if (!fdwxact)
+			continue;
+
+		/* Unlock the foreign transaction entry */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+		nremaining++;
+	}
+
+	/*
+	 * If we leave any FdwXact entries, update the oldest local transaction of
+	 * unresolved distributed transactions.
+	 */
+	if (nremaining > 0)
+	{
+		elog(DEBUG1, "%d foreign transactions remaining", nremaining);
+		FdwXactComputeRequiredXmin();
+	}
+
 	list_free_deep(FdwXactParticipants);
 	FdwXactParticipants = NIL;
 }
@@ -214,24 +766,1209 @@ AtEOXact_FdwXact(bool is_commit)
 	foreach(lc, FdwXactParticipants)
 	{
 		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		if (!fdwxact)
+		{
+			/* Commit or rollback the foreign transaction in one-phase */
+			Assert(ServerSupportTransactionCallback(fdw_part));
+			FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			continue;
+		}
+
+		/*
+		 * This foreign transaction might have been prepared.  In the commit
+		 * case, we don't need to do anything for this participant because all
+		 * foreign transactions should already have been prepared, and so the
+		 * transaction is already closed.  These will be resolved manually.
+		 * In the abort case, on the other hand, we need to close the
+		 * transaction if preparing might still be in progress, since an error
+		 * might have occurred while preparing a foreign transaction.
+		 */
+		if (!is_commit)
+		{
+			int					   status;
 
-		Assert(ServerSupportTransactionCallback(fdw_part));
-		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+			SpinLockAcquire(&(fdwxact->mutex));
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&(fdwxact->mutex));
+
+			if (status == FDWXACT_STATUS_PREPARING)
+				FdwXactParticipantEndTransaction(fdw_part, false);
+		}
 	}
 
 	ForgetAllFdwXactParticipants();
 }
 
 /*
- * This function is called at PREPARE TRANSACTION.  Since we don't support
- * preparing foreign transactions yet, raise an error if the local transaction
- * has any foreign transaction.
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * Note that it's possible that the transaction aborts after we have prepared
+ * some of the participants. In this case we switch to rollback and roll back
+ * all foreign transactions.
  */
 void
 AtPrepare_FdwXact(void)
 {
-	if (FdwXactParticipants != NIL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+	ListCell   *lc;
+	TransactionId xid;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit as we're going to
+	 * prepare all of them.
+	 */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
+	}
+
+	/*
+	 * Assign a transaction id if not yet assigned, because the local
+	 * transaction id is used to determine the result of the distributed
+	 * transaction.  Then prepare all foreign transactions.
+	 */
+	xid = GetTopTransactionId();
+	FdwXactPrepareForeignTransactions(xid);
+
+	/*
+	 * We keep FdwXactParticipants until the transaction end so that we change
+	 * the involved foreign transactions to ABORTING in case of failure.
+	 */
+}
+
+/*
+ * Resolve foreign transactions at the given indexes.
+ *
+ * The caller must hold the given foreign transactions in advance to prevent
+ * concurrent update.
+ */
+static void
+ResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+
+		ResolveOneFdwXact(fdwxact);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+							  fdwxact->userid, true);
+		remove_fdwxact(fdwxact);
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(Oid dbid, Oid serverid, Oid userid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact(dbid, InvalidTransactionId, serverid, userid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
+
+/*
+ * Return the index of the first FdwXact entry that matches the given
+ * arguments, or -1 if there is none.  Only arguments with valid values for
+ * their respective datatypes are used as search conditions.
+ */
+static int
+get_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* This entry matches the condition */
+		return i;
+	}
+
+	return -1;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		/*
+		 * Skip entries marked as committing or aborting whose state file is
+		 * on disk: such entries no longer need to look up their transaction
+		 * status in the commit log.
+		 */
+		if (fdwxact->ondisk &&
+			(fdwxact->status == FDWXACT_STATUS_COMMITTING ||
+			 fdwxact->status == FDWXACT_STATUS_ABORTING))
+			continue;
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Return whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactGetTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on
+	 * the foreign server. So let's assume it aborted.
+	 */
+	if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted.  Raise an error anyway since we cannot
+	 * determine the fate of this foreign transaction from a local transaction
+	 * whose fate is also not yet determined.
+	 */
+	elog(ERROR,
+		 "cannot resolve the foreign transaction associated with in-progress transaction");
+
+	pg_unreachable();
+}
+
+/* Commit or rollback one prepared foreign transaction */
+static void
+ResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactInfo finfo;
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *routine;
+
+	/* The FdwXact entry must be held by me */
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->locking_backend == MyBackendId);
+	Assert(fdwxact->status == FDWXACT_STATUS_PREPARED ||
+		   fdwxact->status == FDWXACT_STATUS_COMMITTING ||
+		   fdwxact->status == FDWXACT_STATUS_ABORTING);
+
+	/* Set whether we do commit or abort if not set yet */
+	if (fdwxact->status == FDWXACT_STATUS_PREPARED)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactGetTransactionFate(fdwxact->local_xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	routine = GetFdwRoutine(fdw->fdwhandler);
+
+	/* Prepare the foreign transaction information to pass to API */
+	finfo.server = server;
+	finfo.usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+	finfo.fdwxact_id = fdwxact->fdwxact_id;
+	finfo.flags = 0;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&finfo);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&finfo);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction for server %u user %u",
+			 fdwxact->serverid, fdwxact->userid);
+	}
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXact entry and file if it exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->dbid, xlrec->xid, xlrec->serverid,
+						  xlrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The status of
+	 * the transaction is set as prepared, since we do not know the exact
+	 * status right now. Resolver will set it later based on the status of
+	 * local transaction which prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid,
+							 fdwxact_data->serverid, fdwxact_data->userid,
+							 fdwxact_data->umid, fdwxact_data->fdwxact_id);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->local_xid,
+		 fdwxact_data->serverid, fdwxact_data->userid,
+		 fdwxact_data->fdwxact_id);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that
+	 * prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove
+ * FdwXact file if a foreign transaction was saved via an earlier checkpoint.
+ * We might not find the FdwXact entry if crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+static void
+FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid,
+				  Oid userid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->dbid == dbid && fdwxact->local_xid == xid &&
+			fdwxact->serverid == serverid && fdwxact->userid == userid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  givewarning);
+	remove_fdwxact(fdwxact);
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s",
+		 fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid,
+		 fdwxact->userid, fdwxact->fdwxact_id);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an inserted LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts == 0)
+		return;					/* nothing to do */
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+								fdwxact->serverid, fdwxact->userid,
+								buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.	 FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+					Oid userid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+/*
+ * Given a transaction id, userid and serverid read it either from disk
+ * or read it directly via shmem xlog record pointer using the provided
+ * "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid,
+					 Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			RemoveFdwXactFile(dbid, xid, serverid, userid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u",
+							xid, serverid, userid)));
+			FdwXactRedoRemove(dbid, xid, serverid, userid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(dbid, xid, serverid, userid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid magic number and CRC), return the palloc'd
+ * contents of the file, issuing an error when finding corrupted data.
+ * This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the expected data */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->dbid != dbid ||
+		fdwxact_file_data->serverid != serverid ||
+		fdwxact_file_data->userid != userid ||
+		fdwxact_file_data->local_xid != xid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.	 ShmemVariableCache->nextXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextXid = ShmemVariableCache->nextXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->local_xid, result))
+			result = fdwxact->local_xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+RestoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId local_xid;
+			Oid			dbid;
+			Oid			serverid;
+			Oid			userid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x_%08x_%08x",
+				   &dbid, &local_xid, &serverid, &userid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid,
+									   InvalidXLogRecPtr, true);
+
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+				  bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, dbid, xid, serverid, userid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid,
+								   fdwxact->serverid, fdwxact->userid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %u for server %u and user %u from shared memory",
+						fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->local_xid);
+		values[1] = ObjectIdGetDatum(fdwxact->serverid);
+		values[2] = ObjectIdGetDatum(fdwxact->userid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = CStringGetTextDatum(fdwxact->fdwxact_id);
+
+		if (fdwxact->locking_backend != InvalidBackendId)
+		{
+			PGPROC *locker = BackendIdGetProc(fdwxact->locking_backend);
+			values[5] = Int32GetDatum(locker->pid);
+		}
+		else
+			nulls[5] = true;
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist")));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to resolve prepared foreign transaction"),
+				 errhint("Must be superuser or the user that prepared the transaction.")));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being processed by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction identifier \"%s\" is busy",
+						fdwxact->fdwxact_id)));
+	}
+
+	if (TwoPhaseExists(fdwxact->local_xid))
+	{
+		/*
+		 * The entry's local transaction is still prepared.  Since we cannot
+		 * know the fate of the local transaction yet, we cannot resolve this
+		 * foreign transaction.
+		 */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction with identifier \"%s\" whose local transaction is prepared",
+						fdwxact->fdwxact_id),
+				 errhint("Do COMMIT PREPARED or ROLLBACK PREPARED on the local transaction first.")));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	LWLockRelease(FdwXactLock);
+
+	PG_TRY();
+	{
+		ResolveFdwXacts(&idx, 1);
+	}
+	PG_CATCH();
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactCtl->fdwxacts[idx]->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it.  This gives a way to forget such an entry when, for example,
+ * the foreign server where it was prepared is no longer available or the
+ * user who prepared the transaction needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			serverid = PG_GETARG_OID(1);
+	Oid			userid = PG_GETARG_OID(2);
+	Oid			myuserid;
+	FdwXact		fdwxact;
+	int			idx;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to remove foreign transactions"))));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact(MyDatabaseId, xid, serverid, userid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist on server %u",
+						serverid)));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->userid && !superuser_arg(myuserid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("permission denied to remove prepared foreign transaction"),
+				  errhint("Must be superuser or the user that prepared the transaction."))));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being held by someone */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction id %u, server %u, and user %u is busy",
+						xid, serverid, userid)));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	PG_TRY();
+	{
+		/* Clean up entry and any files we may have left */
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+							  fdwxact->serverid, fdwxact->userid,
+							  true);
+		remove_fdwxact(fdwxact);
+	}
+	PG_CATCH();
+	{
+		if (fdwxact->valid)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+			fdwxact->locking_backend = InvalidBackendId;
+		}
+		LWLockRelease(FdwXactLock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
 }
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..0a3f4b383f 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index fc18b77832..5c8a55358d 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -845,6 +845,34 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+		if (gxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 497abcb491..3db7fa94c1 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2568,6 +2568,7 @@ PrepareTransaction(void)
 	PostPrepare_Twophase();
 
+	AtEOXact_FdwXact(true);
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
 	AtEOXact_Enum();
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cc007b8963..90e612ab74 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4626,6 +4627,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6360,6 +6362,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6907,14 +6912,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	RestoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7116,7 +7122,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7628,11 +7637,13 @@ StartupXLOG(void)
 	}
 
 	/*
-	 * Pre-scan prepared transactions to find out the range of XIDs present.
-	 * This information is not quite needed yet, but it is positioned here so
-	 * as potential problems are detected before any on-disk change is done.
+	 * Pre-scan prepared transactions and foreign prepared transactions to find
+	 * out the range of XIDs present.  This information is not quite needed yet,
+	 * but it is positioned here so as potential problems are detected before any
+	 * on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7960,8 +7971,12 @@ StartupXLOG(void)
 	TrimCLOG();
 	TrimMultiXact();
 
-	/* Reload shared-memory state for prepared transactions */
+	/*
+	 * Reload shared-memory state for prepared transactions and foreign
+	 * prepared transactions.
+	 */
 	RecoverPreparedTransactions();
+	RecoverFdwXacts();
 
 	/*
 	 * Shutdown the recovery environment. This must occur after
@@ -9315,6 +9330,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9853,6 +9869,7 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
@@ -9872,6 +9889,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9890,6 +9908,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -10097,6 +10116,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10300,6 +10320,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..588d229fd2 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index ec024fa106..492627caa1 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1076,6 +1077,18 @@ RemoveForeignServerById(Oid srvId)
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for foreign server %u", srvId);
 
+	/*
+	 * Warn if a foreign prepared transaction still exists on this foreign
+	 * server.  The server is dropped regardless.
+	 */
+	if (FdwXactExists(MyDatabaseId, srvId, InvalidOid))
+	{
+		Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp);
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transactions on it",
+						NameStr(srvForm->srvname))));
+	}
+
 	CatalogTupleDelete(rel, &tp->t_self);
 
 	ReleaseSysCache(tp);
@@ -1396,6 +1409,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+	/*
+	 * Warn if a foreign prepared transaction still exists for this user
+	 * mapping.  The user mapping is dropped regardless.
+	 */
+	if (FdwXactExists(MyDatabaseId, srv->serverid, useId))
+		ereport(WARNING,
+				(errmsg("server \"%s\" has unresolved prepared transaction for user \"%s\"",
+						srv->servername, MappingUserName(useId))));
+
 	/*
 	 * Do the deletion
 	 */
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index f8eb4fa215..6ce76b2aec 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -332,6 +332,12 @@ GetFdwRoutine(Oid fdwhandler)
 	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
 		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
 
+	/* An FDW supporting the prepare API must also support the commit and rollback APIs */
+	Assert((routine->PrepareForeignTransaction &&
+			routine->CommitForeignTransaction &&
+			routine->RollbackForeignTransaction) ||
+		   !routine->PrepareForeignTransaction);
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..701ccb3a03 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4238,6 +4238,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_DSM_FILL_ZERO_WRITE:
 			event_name = "DSMFillZeroWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ:
 			event_name = "LockFileAddToDataDirRead";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..d897f2c5fc 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -178,6 +178,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..6f14a950bf 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -267,6 +269,7 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index cf12eda504..ceb51d43a4 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -96,6 +96,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allProcs[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -187,11 +189,13 @@ typedef struct ComputeXidHorizonsResult
 	FullTransactionId latest_completed;
 
 	/*
-	 * The same for procArray->replication_slot_xmin and.
-	 * procArray->replication_slot_catalog_xmin.
+	 * The same for procArray->replication_slot_xmin,
+	 * procArray->replication_slot_catalog_xmin, and
+	 * procArray->fdwxact_unresolved_xmin.
 	 */
 	TransactionId slot_xmin;
 	TransactionId slot_catalog_xmin;
+	TransactionId fdwxact_unresolved_xmin;
 
 	/*
 	 * Oldest xid that any backend might still consider running. This needs to
@@ -210,8 +214,9 @@ typedef struct ComputeXidHorizonsResult
 	 * Oldest xid for which deleted tuples need to be retained in shared
 	 * tables.
 	 *
-	 * This includes the effects of replication slots. If that's not desired,
-	 * look at shared_oldest_nonremovable_raw;
+	 * This includes the effects of replication slots and unresolved
+	 * foreign transactions. If that's not desired, look at
+	 * shared_oldest_nonremovable_raw;
 	 */
 	TransactionId shared_oldest_nonremovable;
 
@@ -418,6 +423,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 		ShmemVariableCache->xactCompletionCount = 1;
 	}
 
@@ -1709,6 +1715,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	 */
 	h->slot_xmin = procArray->replication_slot_xmin;
 	h->slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	h->fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	for (int index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1836,6 +1843,12 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	h->data_oldest_nonremovable =
 		TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);
 
+	/*
+	 * Check whether there are unresolved distributed transactions requiring
+	 * an older xmin.
+	 */
+	h->shared_oldest_nonremovable =
+		TransactionIdOlder(h->shared_oldest_nonremovable, h->fdwxact_unresolved_xmin);
 	/*
 	 * The only difference between catalog / data horizons is that the slot's
 	 * catalog xmin is applied to the catalog one (so catalogs can be accessed
@@ -1893,6 +1906,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	Assert(!TransactionIdIsValid(h->slot_catalog_xmin) ||
 		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
 										 h->slot_catalog_xmin));
+	Assert(!TransactionIdIsValid(h->fdwxact_unresolved_xmin) ||
+		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+										 h->fdwxact_unresolved_xmin));
 
 	/* update approximate horizons with the computed horizons */
 	GlobalVisUpdateApply(h);
@@ -3804,6 +3820,21 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon so that vacuum
+ * does not truncate clog entries for transactions that are still needed to
+ * resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
 /*
  * XidCacheRemoveRunningXids
  *
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 6c7cf6c295..4124321640 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+FdwXactLock							48
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca..9c78b2a90a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -30,6 +30,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -2470,6 +2471,16 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 8930a94fff..68548b4633 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -127,6 +127,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index e242a4a5b5..2e60ae3f6d 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -205,6 +205,7 @@ static const char *const subdirs[] = {
 	"pg_wal/archive_status",
 	"pg_commit_ts",
 	"pg_dynshmem",
+	"pg_fdwxact",
 	"pg_notify",
 	"pg_serial",
 	"pg_snapshots",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 3e00ac0f70..53bc3d82d7 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_xacts setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 805dafef07..dd70a0f8a2 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 05b36ebf2b..83a1db842e 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -10,24 +10,112 @@
 #ifndef FDWXACT_H
 #define FDWXACT_H
 
+#include "access/fdwxact_xlog.h"
 #include "foreign/foreign.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/s_lock.h"
 
 /* Flag passed to FDW transaction management APIs */
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 
+/* Enum to track the status of a foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being aborted */
+} FdwXactStatus;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData *FdwXact;
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	TransactionId local_xid;	/* XID of local transaction */
+
+	/* Information relevant to the foreign transaction */
+	Oid			dbid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			umid;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset where the record inserting
+									 * this entry starts */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset where that record ends */
+
+	bool		valid;			/* has the entry been completed and written to
+								 * file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+} FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing an FdwXact entry requires holding FdwXactLock in
+ * exclusive mode; iterating over fdwxacts requires it in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
 /* State data for foreign transaction resolution, passed to FDW callbacks */
 typedef struct FdwXactInfo
 {
 	/* Foreign transaction information */
+	char		   *fdwxact_id;
 	ForeignServer *server;
 	UserMapping *usermapping;
 
 	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
 } FdwXactInfo;
 
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+
 /* Function declarations */
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
 extern void AtEOXact_FdwXact(bool is_commit);
 extern void AtPrepare_FdwXact(void);
+extern bool FdwXactExists(Oid dbid, Oid serverid, Oid userid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid,
+								Oid userid, void *content, int len);
+extern void RestoreFdwXactData(void);
+extern void RecoverFdwXacts(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
 
 #endif /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..40903f5f2c
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId local_xid;
+	Oid			dbid;			/* database oid where to find foreign server
+								 * and user mapping */
+	Oid			serverid;		/* foreign server where transaction takes
+								 * place */
+	Oid			userid;			/* user who initiated the foreign transaction */
+	Oid			umid;
+	char		fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid			serverid;
+	Oid			userid;
+	Oid			dbid;
+	bool		force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index f582cf535f..5ab1f57212 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 91786da784..3d35f89ae0 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 224cae0246..0823baf1a1 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -236,6 +236,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..5673ec7299 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b5f52d4e4a..04a8183db8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6068,6 +6068,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,text,int4}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,serverid,userid,state,identifier,locker_pid}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid oid',
+  proargnames => '{xid,serverid,userid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 79f62ac354..b0a63f1d8f 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -179,6 +179,7 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*PrepareForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*CommitForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*RollbackForeignTransaction_function) (FdwXactInfo *finfo);
 
@@ -264,6 +265,7 @@ typedef struct FdwRoutine
 	/* Support functions for transaction management */
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
+	PrepareForeignTransaction_function PrepareForeignTransaction;
 } FdwRoutine;
 
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..b50daaeb79 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1044,6 +1044,9 @@ typedef enum
 	WAIT_EVENT_DATA_FILE_TRUNCATE,
 	WAIT_EVENT_DATA_FILE_WRITE,
 	WAIT_EVENT_DSM_FILL_ZERO_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index b01fa52139..300a4cf5b6 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -93,5 +93,6 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
 
 #endif							/* PROCARRAY_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6173473de9..822d3e09ae 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1352,6 +1352,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.serverid,
+    f.userid,
+    f.state,
+    f.identifier,
+    f.locker_pid
+   FROM pg_foreign_xacts() f(xid, serverid, userid, state, identifier, locker_pid);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.27.0

Attachment: v34-0001-Introduce-transaction-manager-for-foreign-transa.patch (application/x-patch)
From 9353ea0a43627d2e9ec62d40e566c6520cf26a89 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 28 Aug 2020 22:25:38 +0900
Subject: [PATCH v34 01/11] Introduce transaction manager for foreign
 transactions.

The global transaction manager manages the transactions initiated on
the foreign server. This commit also adds both
CommitForeignTransaction and RollbackForeignTransaction FDW APIs
supporting only one-phase commit. An FDW that implements these APIs can
be managed by the global transaction manager. So the FDW is able to
control its transaction using the foreign transaction manager, not using
XactCallback.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/Makefile  |   1 +
 src/backend/access/transam/fdwxact.c | 237 +++++++++++++++++++++++++++
 src/backend/access/transam/xact.c    |  14 ++
 src/backend/foreign/foreign.c        |   4 +
 src/include/access/fdwxact.h         |  33 ++++
 src/include/foreign/fdwapi.h         |  12 ++
 6 files changed, 301 insertions(+)
 create mode 100644 src/backend/access/transam/fdwxact.c
 create mode 100644 src/include/access/fdwxact.h

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 595e02de72..b05a88549d 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -15,6 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = \
 	clog.o \
 	commit_ts.o \
+	fdwxact.o \
 	generic_xlog.o \
 	multixact.o \
 	parallel.o \
diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
new file mode 100644
index 0000000000..03f351924b
--- /dev/null
+++ b/src/backend/access/transam/fdwxact.c
@@ -0,0 +1,237 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module contains the code for managing transactions started on foreign
+ * servers.
+ *
+ * An FDW that implements both commit and rollback APIs can request to register
+ * the foreign transaction by FdwXactRegisterXact() to participate it to a
+ * group of distributed tranasction.  The registered foreign transactions are
+ * identified by OIDs of server and user.  On commit and rollback, the global
+ * transaction manager calls corresponding FDW API to end the tranasctions.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/transam/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xlog.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "utils/memutils.h"
+
+/* Check the FdwXactParticipant is capable of two-phase commit  */
+#define ServerSupportTransactionCallback(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.	 This struct
+ * needs to live until the end of transaction where we cannot look at
+ * syscaches. Therefore, this is allocated in the TopTransactionContext.
+ *
+ * Participants are identified by the pair of server OID and user OID,
+ * rather than user mapping OID. See README.fdwxact for the discussion.
+ */
+typedef struct FdwXactParticipant
+{
+	/* Foreign server and user mapping info, passed to callback routines */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Callbacks for foreign transaction */
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+} FdwXactParticipant;
+
+/*
+ * List of foreign transactions involved in the transaction.  A member of
+ * participants must support both commit and rollback APIs.
+ */
+static List *FdwXactParticipants = NIL;
+
+static void ForgetAllFdwXactParticipants(void);
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
+											 bool commit);
+static FdwXactParticipant *create_fdwxact_participant(Oid serverid, Oid userid,
+													  FdwRoutine *routine);
+
+/*
+ * Register the given foreign transaction identified by the given arguments
+ * as a participant of the transaction.
+ */
+void
+FdwXactRegisterXact(Oid serverid, Oid userid)
+{
+	FdwXactParticipant *fdw_part;
+	MemoryContext old_ctx;
+	FdwRoutine *routine;
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Already registered */
+			return;
+		}
+	}
+
+	routine = GetFdwRoutineByServerId(serverid);
+
+	/*
+	 * Foreign server managed by the transaction manager must implement
+	 * transaction callbacks.
+	 */
+	if (!routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("cannot register foreign server not supporting transaction callback")));
+
+	/*
+	 * Participant's information is also used at the end of a transaction,
+	 * where system cache are not available. Save it in TopTransactionContext
+	 * so that these can live until the end of transaction.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdw_part = create_fdwxact_participant(serverid, userid, routine);
+
+	/* Add to the participants list */
+	FdwXactParticipants = lappend(FdwXactParticipants, fdw_part);
+
+	/* Revert back the context */
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Remove the given foreign server from FdwXactParticipants */
+void
+FdwXactUnregisterXact(Oid serverid, Oid userid)
+{
+	ListCell   *lc;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)
+		{
+			/* Remove the entry */
+			FdwXactParticipants =
+				foreach_delete_current(FdwXactParticipants, lc);
+			break;
+		}
+	}
+}
+
+/* Return palloc'd FdwXactParticipant variable */
+static FdwXactParticipant *
+create_fdwxact_participant(Oid serverid, Oid userid, FdwRoutine *routine)
+{
+	FdwXactParticipant *fdw_part;
+	ForeignServer *foreign_server;
+	UserMapping *user_mapping;
+
+	foreign_server = GetForeignServer(serverid);
+	user_mapping = GetUserMapping(userid, serverid);
+
+	fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant));
+
+	fdw_part->server = foreign_server;
+	fdw_part->usermapping = user_mapping;
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+
+	return fdw_part;
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool commit)
+{
+	FdwXactInfo finfo;
+
+	Assert(ServerSupportTransactionCallback(fdw_part));
+
+	finfo.server = fdw_part->server;
+	finfo.usermapping = fdw_part->usermapping;
+	finfo.flags = FDWXACT_FLAG_ONEPHASE;
+
+	if (commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&finfo);
+		elog(DEBUG1, "successfully committed the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&finfo);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for server %u user %u",
+			 fdw_part->usermapping->serverid,
+			 fdw_part->usermapping->userid);
+	}
+}
+
+/*
+ * Clear the FdwXactParticipants list.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	list_free_deep(FdwXactParticipants);
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Commit or rollback all foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool is_commit)
+{
+	ListCell   *lc;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (FdwXactParticipants == NIL)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/* Commit or rollback foreign transactions in the participant list */
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		Assert(ServerSupportTransactionCallback(fdw_part));
+		FdwXactParticipantEndTransaction(fdw_part, is_commit);
+	}
+
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * This function is called at PREPARE TRANSACTION.  Since we don't support
+ * preparing foreign transactions yet, raise an error if the local transaction
+ * has any foreign transaction.
+ */
+void
+AtPrepare_FdwXact(void)
+{
+	if (FdwXactParticipants != NIL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..497abcb491 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -2125,6 +2126,9 @@ CommitTransaction(void)
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
+	/* Commit foreign transactions if any */
+	AtEOXact_FdwXact(true);
+
 	/* If we might have parallel workers, clean them up now. */
 	if (IsInParallelMode())
 		AtEOXact_Parallel(true);
@@ -2369,6 +2373,9 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Process foreign trasactions */
+	AtPrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2737,6 +2744,13 @@ AbortTransaction(void)
 
 	TRACE_POSTGRESQL_TRANSACTION_ABORT(MyProc->lxid);
 
+	/*
+	 * Abort foreign transactions if any.  This needs to be done before marking
+	 * this transaction as not running since FDW's transaction callbacks might
+	 * assume this transaction is still in progress.
+	 */
+	AtEOXact_FdwXact(false);
+
 	/*
 	 * Let others know about no transaction in progress by me. Note that this
 	 * must be done _before_ releasing locks we hold and _after_
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 5564dc3a1e..f8eb4fa215 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -328,6 +328,10 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* The FDW must support both or nothing */
+	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
+		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
+
 	return routine;
 }
 
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..05b36ebf2b
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,33 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "foreign/foreign.h"
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactInfo
+{
+	/* Foreign transaction information */
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactInfo;
+
+/* Function declarations */
+extern void AtEOXact_FdwXact(bool is_commit);
+extern void AtPrepare_FdwXact(void);
+
+#endif /* FDWXACT_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 248f78da45..79f62ac354 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "access/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
 
@@ -178,6 +179,9 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*CommitForeignTransaction_function) (FdwXactInfo *finfo);
+typedef void (*RollbackForeignTransaction_function) (FdwXactInfo *finfo);
+
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
  * function.  It provides pointers to the callback functions needed by the
@@ -256,6 +260,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for path reparameterization. */
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
+
+	/* Support functions for transaction management */
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
 } FdwRoutine;
 
 
@@ -269,4 +277,8 @@ extern bool IsImportableForeignTable(const char *tablename,
 									 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in fdwxact/fdwxact.c */
+extern void FdwXactRegisterXact(Oid serverid, Oid userid);
+extern void FdwXactUnregisterXact(Oid serverid, Oid userid);
+
 #endif							/* FDWAPI_H */
-- 
2.27.0

#217Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhihong Yu (#212)
Re: Transactions involving multiple postgres foreign servers, take 2

On Sat, Jan 16, 2021 at 1:39 AM Zhihong Yu <zyu@yugabyte.com> wrote:

Hi,

Thank you for reviewing the patch!

For v32-0004-Add-PrepareForeignTransaction-API.patch :

+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is update.     To avoid holding the lock during transaction processing
+ * which may take an unpredicatable time the in-memory data of foreign

entry is update -> entry is updated

unpredicatable -> unpredictable

Fixed.

+ int nlefts = 0;

nlefts -> nremaining

+ elog(DEBUG1, "left %u foreign transactions", nlefts);

The message can be phrased as "%u foreign transactions remaining"

Fixed.

+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)

Fdw and Xact are repeated. Seems one should suffice. How about naming the method FdwXactResolveTransactions() ?
Similar comment for FdwXactResolveOneFdwXact(FdwXact fdwxact)

Agreed. I changed to ResolveFdwXacts() and ResolveOneFdwXact()
respectively to avoid a long function name.

For get_fdwxact():

+       /* This entry matches the condition */
+       found = true;
+       break;

Instead of breaking and returning, you can return within the loop directly.

Fixed.

Those changes are incorporated into the latest version patches[1] I
submitted today.

Regards,

[1]: /messages/by-id/CAD21AoBYyA5O+FPN4Cs9YWiKjq319BvF5fYmKNsFTZfwTcWjQw@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#218Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#216)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/01/27 14:08, Masahiko Sawada wrote:

On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

You fixed some issues. But maybe you forgot to attach the latest patches?

Yes, I've attached the updated patches.

Thanks for updating the patch! I tried to review 0001 and 0002 as the self-contained change.

+ * An FDW that implements both commit and rollback APIs can request to register
+ * the foreign transaction by FdwXactRegisterXact() to participate it to a
+ * group of distributed tranasction.  The registered foreign transactions are
+ * identified by OIDs of server and user.

I'm afraid that the combination of OIDs of server and user is not unique. IOW, more than one foreign transaction can have the same combination of OIDs of server and user. For example, the following two SELECT queries start different foreign transactions but their user OID is the same. Should the OID of the user mapping be used instead of the OID of the user?

CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;
CREATE USER MAPPING FOR postgres SERVER loopback OPTIONS (user 'postgres');
CREATE USER MAPPING FOR public SERVER loopback OPTIONS (user 'postgres');
CREATE TABLE t(i int);
CREATE FOREIGN TABLE ft(i int) SERVER loopback OPTIONS (table_name 't');
BEGIN;
SELECT * FROM ft;
DROP USER MAPPING FOR postgres SERVER loopback ;
SELECT * FROM ft;
COMMIT;

+	/* Commit foreign transactions if any */
+	AtEOXact_FdwXact(true);

Don't we need to pass the XACT_EVENT_PARALLEL_PRE_COMMIT or XACT_EVENT_PRE_COMMIT flag? Probably we don't need to do this if postgres_fdw is the only user of this new API. But if we make this new API a generic one, such flags seem necessary because some foreign data wrappers might behave differently depending on those flags.

For the same reason as above, should AtEOXact_FdwXact() also be called after CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT : XACT_EVENT_COMMIT)?

+	/*
+	 * Abort foreign transactions if any.  This needs to be done before marking
+	 * this transaction as not running since FDW's transaction callbacks might
+	 * assume this transaction is still in progress.
+	 */
+	AtEOXact_FdwXact(false);

Same as above.

+/*
+ * This function is called at PREPARE TRANSACTION.  Since we don't support
+ * preparing foreign transactions yet, raise an error if the local transaction
+ * has any foreign transaction.
+ */
+void
+AtPrepare_FdwXact(void)
+{
+	if (FdwXactParticipants != NIL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}

Does this mean that foreign data wrappers supporting prepared transactions (though I'm not sure whether such wrappers actually exist) cannot use the new API? If we want to allow those wrappers to use the new API, AtPrepare_FdwXact() should call the prepare callback and each wrapper should emit an error within the callback if necessary.

+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->server->serverid == serverid &&
+			fdw_part->usermapping->userid == userid)

Isn't this inefficient when starting lots of foreign transactions, because we need to scan all the entries in the list every time?
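To make the concern concrete, here is a minimal sketch of a keyed lookup that avoids rescanning the whole participant list on every registration. It is an illustration only, not code from the patch: `PartTable`, `part_hash()` and `part_table_register()` are hypothetical names, `Oid` is a local stand-in typedef, and the fixed-size open-addressing table stands in for whatever hash facility the backend would actually use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;			/* local stand-in for PostgreSQL's Oid */

#define PART_TABLE_SIZE 64		/* hypothetical capacity, must be a power of 2 */

typedef struct PartSlot
{
	bool		used;
	Oid			serverid;
	Oid			userid;
} PartSlot;

typedef struct PartTable
{
	PartSlot	slots[PART_TABLE_SIZE];
	int			nused;
} PartTable;

static void
part_table_init(PartTable *t)
{
	memset(t, 0, sizeof(*t));
}

/* Mix the two OIDs into one hash value (illustrative, not dynahash) */
static uint32_t
part_hash(Oid serverid, Oid userid)
{
	uint32_t	h = serverid * 2654435761u;

	h ^= userid + 0x9e3779b9u + (h << 6) + (h >> 2);
	return h;
}

/*
 * Register the (serverid, userid) pair.  Returns true if it was newly
 * added, false if it was already registered.  Linear probing starts at
 * the hashed slot, so the expected cost does not grow with the number
 * of participants the way a full list scan does.
 */
static bool
part_table_register(PartTable *t, Oid serverid, Oid userid)
{
	uint32_t	i = part_hash(serverid, userid) & (PART_TABLE_SIZE - 1);

	while (t->slots[i].used)
	{
		if (t->slots[i].serverid == serverid &&
			t->slots[i].userid == userid)
			return false;		/* already registered */
		i = (i + 1) & (PART_TABLE_SIZE - 1);
	}

	/* Sketch only: no resizing, always keep one slot free so probes end */
	assert(t->nused < PART_TABLE_SIZE - 1);

	t->slots[i].used = true;
	t->slots[i].serverid = serverid;
	t->slots[i].userid = userid;
	t->nused++;
	return true;
}
```

Registering the same pair twice returns false the second time without walking the other entries, which is the property the list-based loop lacks.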

+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	bool		found;
+	ConnCacheEntry *entry;
+	ConnCacheKey key;
+
+	/* First time through, initialize connection cache hashtable */
+	if (ConnectionHash == NULL)
+	{
+		HASHCTL		ctl;
+
+		ctl.keysize = sizeof(ConnCacheKey);
+		ctl.entrysize = sizeof(ConnCacheEntry);
+		ConnectionHash = hash_create("postgres_fdw connections", 8,
+									 &ctl,
+									 HASH_ELEM | HASH_BLOBS);

Currently ConnectionHash is created under TopMemoryContext. With the patch, since GetConnectionCacheEntry() can be called in other places, may ConnectionHash be created under a memory context other than TopMemoryContext? If so, is that safe?

-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state ||
-			entry->invalidated)
...
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state)

Why did you get rid of the condition "entry->invalidated"?

I'm reading the 0001 and 0002 patches to pick up the changes for postgres_fdw that are worth applying independently of the 2PC feature. If there are such changes, IMO we can apply them in advance, which would make the patches simpler.

Thank you for reviewing the patches!

+       if (PQresultStatus(res) != PGRES_COMMAND_OK)
+               ereport(ERROR, (errmsg("could not commit transaction on server %s",
+                                                          frstate->server->servername)));

Did you change the code this way because you want to include the server name in the error message? I agree that it's helpful to also report the name of the server that caused an error. OTOH, since this change gets rid of the call to pgfdw_report_error() for the returned PGresult, the reported error message contains less information. If this understanding is right, I don't think that this change is an improvement.

Right. It's better to use do_sql_command() instead.

Instead, if the server name should be included in the error message, pgfdw_report_error() should be changed so that it also reports the server name? If we do that, the server name is reported not only when COMMIT fails but also when other commands fail.

Of course, if this change is not essential, we can skip doing this in the first version.

Yes, I think it's not essential for now. We can improve it later if we want.

- /*
- * Regardless of the event type, we can now mark ourselves as out of the
- * transaction. (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
- * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
- */
- xact_got_connection = false;

With this change, xact_got_connection seems to never be set to false. Doesn't this break pgfdw_subxact_callback() using xact_got_connection?

I think xact_got_connection is set to false in
pgfdw_cleanup_after_transaction() that is called at the end of each
foreign transaction (i.e., in postgresCommitForeignTransaction() and
postgresRollbackForeignTransaction()).

But as you're concerned below, it's reset for each foreign transaction
end rather than the parent's transaction end.

+       /* Also reset cursor numbering for next transaction */
+       cursor_number = 0;

Originally this variable was reset to 0 once per transaction end. But with the patch, it's reset to 0 every time a foreign transaction ends on each connection. Fortunately this change would be harmless in practice, but it does not seem right in theory.

This makes me wonder whether the new FDW API is poor at handling the case where some operations need to be performed once per transaction end.

I think that the problem comes from the fact that FDW needs to use
both SubXactCallback and new FDW API.

If we want to perform some operations at the end of the top
transaction per FDW, not per foreign transaction, we will either still
need to use XactCallback or need to rethink the FDW API design. But
given that we call commit and rollback FDW API for only foreign
servers that actually started a transaction, I’m not sure if there are
such operations in practice. IIUC there is not at least from the
normal (not-sub) transaction termination perspective.

One feature in my mind that may not match with this new API is to perform transaction commits on multiple servers in parallel. That's something like the following. As far as I can recall, another proposed version of 2pc on postgres_fdw patch included that feature. If we want to implement this to increase the performance of transaction commit in the future, I'm afraid that new API will prevent that.

foreach(foreign transactions)
send commit command

foreach(foreign transactions)
wait for reply of commit
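The two loops above can be sketched in C as follows. This is a simulation under stated assumptions, not postgres_fdw code: `RemoteXact`, `remote_send_commit()` and `remote_wait_commit()` are hypothetical stand-ins for a connection and for non-blocking send and blocking receive calls, and a global counter records event order so the "all sends before any wait" shape is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one foreign transaction on one connection */
typedef struct RemoteXact
{
	bool		commit_sent;
	bool		committed;
	int			send_seq;		/* order in which COMMIT was sent */
	int			reply_seq;		/* order in which the reply was consumed */
} RemoteXact;

static int	event_counter = 0;

/* Stub: issue COMMIT without waiting for the remote server's reply */
static void
remote_send_commit(RemoteXact *x)
{
	x->commit_sent = true;
	x->send_seq = ++event_counter;
}

/* Stub: block until the reply for a previously sent COMMIT arrives */
static bool
remote_wait_commit(RemoteXact *x)
{
	x->reply_seq = ++event_counter;
	x->committed = x->commit_sent;
	return x->committed;
}

/*
 * Commit all foreign transactions in two passes: first send COMMIT on
 * every connection, then collect the replies.  The remote servers can
 * work on their commits concurrently during the first pass.  Returns
 * the number of transactions that reported a successful commit.
 */
static size_t
commit_all_two_pass(RemoteXact *xacts, size_t n)
{
	size_t		ncommitted = 0;
	size_t		i;

	for (i = 0; i < n; i++)
		remote_send_commit(&xacts[i]);

	for (i = 0; i < n; i++)
	{
		if (remote_wait_commit(&xacts[i]))
			ncommitted++;
	}

	return ncommitted;
}
```

The point of the sketch is that this shape needs a single caller that sees every participant, whereas a callback invoked once per foreign transaction has to send and wait within the same call.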

On second thought, the new per-transaction commit/rollback callback is essential when users or the resolver process want to resolve a specified foreign transaction, but not essential when backends commit/rollback foreign transactions. That is, even if we add the new per-transaction API for users and the resolver process, backends can still use CallXactCallbacks() when they commit/rollback foreign transactions. Is this understanding right?

IIUC xact_got_connection is used to skip iterating over all cached
connections to find open remote (sub) transactions. This is not
necessary anymore at least from the normal transaction termination
perspective. So maybe we can improve it so that it tracks whether any
of the cached connections opened a subtransaction. That is, we set it
true when we created a savepoint on any connections and set it false
at the end of pgfdw_subxact_callback() if we see that xact_depth of
all cached entry is less than or equal to 1 after iterating over all
entries.
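A minimal sketch of the flag handling described above, assuming a simplified connection cache. `SubConnEntry`, `any_subxact_open`, `conn_enter_subxact()` and `subxact_end()` are hypothetical names for illustration; the real postgres_fdw structures and callback carry much more state and issue SAVEPOINT and RELEASE commands where the comments indicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cache entry: one per remote connection */
typedef struct SubConnEntry
{
	bool		open;			/* is there a live connection? */
	int			xact_depth;		/* 0 = no xact, 1 = main xact, 2+ = subxacts */
} SubConnEntry;

/* True while any cached connection has an open remote subtransaction */
static bool any_subxact_open = false;

/* Bring the remote side up to target_depth, creating savepoints as needed */
static void
conn_enter_subxact(SubConnEntry *e, int target_depth)
{
	while (e->xact_depth < target_depth)
		e->xact_depth++;		/* a real FDW would send SAVEPOINT here */

	if (e->xact_depth > 1)
		any_subxact_open = true;
}

/*
 * Called when the local subtransaction at nesting level cur_depth ends.
 * Pops that level on every connection that reached it, then recomputes
 * the flag from what remains so later calls can return early.
 */
static void
subxact_end(SubConnEntry *entries, size_t n, int cur_depth)
{
	bool		still_open = false;
	size_t		i;

	if (!any_subxact_open)
		return;					/* nothing to scan */

	for (i = 0; i < n; i++)
	{
		SubConnEntry *e = &entries[i];

		if (!e->open)
			continue;

		if (e->xact_depth == cur_depth)
			e->xact_depth--;	/* a real FDW would release or roll back */

		if (e->xact_depth > 1)
			still_open = true;
	}

	any_subxact_open = still_open;
}
```

The flag is cleared only once every entry is back at depth 1 or below, which matches the "less than or equal to 1" condition in the paragraph above.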

OK.

Regarding cursor_number, it essentially needs to be unique at least
within a transaction so we can manage it per transaction or per
connection. But the current postgres_fdw instead ensures uniqueness
across all connections. So it seems to me that this can be fixed by
making each connection have its own cursor_number and resetting it in
pgfdw_cleanup_after_transaction(). I think this can be in a separate
patch.
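As a rough illustration of the per-connection idea, assuming a simplified entry type: `CursorConn`, `conn_next_cursor_number()` and `conn_cleanup_after_transaction()` are hypothetical names, and only the counter handling is shown.

```c
#include <assert.h>

/* Hypothetical, simplified cache entry holding its own cursor counter */
typedef struct CursorConn
{
	unsigned int cursor_number; /* last cursor number used on this connection */
} CursorConn;

/* Hand out the next cursor number, unique within this connection */
static unsigned int
conn_next_cursor_number(CursorConn *conn)
{
	return ++conn->cursor_number;
}

/* Reset numbering when the foreign transaction on this connection ends */
static void
conn_cleanup_after_transaction(CursorConn *conn)
{
	conn->cursor_number = 0;
}
```

With the counter stored per connection, ending the foreign transaction on one connection no longer disturbs the numbering used on another, since cursor names only need to be unique within the remote session that owns them.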

Maybe, so let's work on this later, at least after we confirm that
this change is really necessary.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#219Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Fujii Masao (#218)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Feb 2, 2021 at 5:18 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/01/27 14:08, Masahiko Sawada wrote:

On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

You fixed some issues. But maybe you forgot to attach the latest patches?

Yes, I've attached the updated patches.

Thanks for updating the patch! I tried to review 0001 and 0002 as the self-contained change.

+ * An FDW that implements both commit and rollback APIs can request to register
+ * the foreign transaction by FdwXactRegisterXact() to participate it to a
+ * group of distributed transaction.  The registered foreign transactions are
+ * identified by OIDs of server and user.

I'm afraid that the combination of OIDs of server and user is not unique. IOW, more than one foreign transaction can have the same combination of OIDs of server and user. For example, the following two SELECT queries start different foreign transactions but their user OID is the same. Should the OID of the user mapping be used instead of the OID of the user?

CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;
CREATE USER MAPPING FOR postgres SERVER loopback OPTIONS (user 'postgres');
CREATE USER MAPPING FOR public SERVER loopback OPTIONS (user 'postgres');
CREATE TABLE t(i int);
CREATE FOREIGN TABLE ft(i int) SERVER loopback OPTIONS (table_name 't');
BEGIN;
SELECT * FROM ft;
DROP USER MAPPING FOR postgres SERVER loopback ;
SELECT * FROM ft;
COMMIT;

Good catch. I've considered using user mapping OID or a pair of user
mapping OID and server OID as a key of foreign transactions but I
think it also has a problem if an FDW caches the connection by pair of
server OID and user OID whereas the core identifies them by user
mapping OID. For instance, mysql_fdw manages connections by pair of
server OID and user OID.

For example, let's consider the following execution:

BEGIN;
SET ROLE user_A;
INSERT INTO ft1 VALUES (1);
SET ROLE user_B;
INSERT INTO ft1 VALUES (1);
COMMIT;

Suppose that an FDW identifies the connections by {server OID, user
OID} and the core GTM identifies the transactions by user mapping OID,
and user_A and user_B use the public user mapping to connect server_X.
In the FDW, there are two connections identified by {user_A, server_X}
and {user_B, server_X} respectively, and therefore the FDW opens two
transactions, one on each connection, while GTM has only one FdwXact entry
because the two connections refer to the same user mapping OID. As a
result, at the end of the transaction, GTM ends only one foreign
transaction, leaving another one.
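
To make the mismatch concrete, here is a small self-contained sketch. It is not taken from any FDW: the OID values, ConnKey, add_conn(), add_xact() and count_orphaned() are all made up for illustration. The FDW side keys its connections by {server OID, user OID}, the core side keys its entries by user mapping OID, and both roles resolve to the same public user mapping.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

/* FDW-side connection cache key: {server OID, user OID}. */
typedef struct ConnKey
{
    Oid serverid;
    Oid userid;
} ConnKey;

#define MAX_ENTRIES 16

/* Add a connection key if not already cached; return the new count. */
static size_t
add_conn(ConnKey *conns, size_t n, ConnKey key)
{
    for (size_t i = 0; i < n; i++)
        if (conns[i].serverid == key.serverid && conns[i].userid == key.userid)
            return n;
    assert(n < MAX_ENTRIES);
    conns[n] = key;
    return n + 1;
}

/* Core-side registry keyed by user mapping OID; return the new count. */
static size_t
add_xact(Oid *xacts, size_t n, Oid umid)
{
    for (size_t i = 0; i < n; i++)
        if (xacts[i] == umid)
            return n;
    assert(n < MAX_ENTRIES);
    xacts[n] = umid;
    return n + 1;
}

/*
 * Replay the SET ROLE example: user_A and user_B both write to server_X
 * through the public user mapping.  Returns how many remote transactions
 * the core would never end.  All OID values here are arbitrary.
 */
static size_t
count_orphaned(void)
{
    const Oid server_x = 100, user_a = 10, user_b = 11, public_umid = 500;
    ConnKey conns[MAX_ENTRIES];
    Oid     xacts[MAX_ENTRIES];
    size_t  nconns = 0, nxacts = 0;

    nconns = add_conn(conns, nconns, (ConnKey){server_x, user_a});
    nxacts = add_xact(xacts, nxacts, public_umid);

    nconns = add_conn(conns, nconns, (ConnKey){server_x, user_b});
    nxacts = add_xact(xacts, nxacts, public_umid);

    return nconns - nxacts;
}
```

In this model the FDW ends up with two connections but the core tracks a single entry, so one remote transaction is left without anyone to end it.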

Using user mapping OID seems natural to me but I'm concerned that
changing role in the middle of a transaction is more likely to happen than
dropping the public user mapping, though I'm not sure. We would need to find
a better way.

+       /* Commit foreign transactions if any */
+       AtEOXact_FdwXact(true);

Don't we need to pass the XACT_EVENT_PARALLEL_PRE_COMMIT or XACT_EVENT_PRE_COMMIT flag? Probably we don't need to do this if postgres_fdw is the only user of this new API. But if we make this new API a generic one, such flags seem necessary so that foreign data wrappers can behave differently depending on those flags.

For the same reason as above, shouldn't AtEOXact_FdwXact() also be called after CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT : XACT_EVENT_COMMIT)?

Agreed.

In AtEOXact_FdwXact() we call either CommitForeignTransaction() or
RollbackForeignTransaction() with FDWXACT_FLAG_ONEPHASE flag for each
foreign transaction. So for example in commit case, we will call new
FDW APIs in the following order:

1. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_PRE_COMMIT
flag and FDWXACT_FLAG_ONEPHASE flag for each foreign transaction.
2. Commit locally.
3. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_COMMIT
flag and FDWXACT_FLAG_ONEPHASE flag for each foreign transaction.

In the future when we have a new FDW API to prepare foreign
transaction, the sequence will be:

1. Call PrepareForeignTransaction() for each foreign transaction.
2. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_PRE_COMMIT
flag for each foreign transaction.
3. Commit locally.
4. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_COMMIT
flag for each foreign transaction.

So we expect FDW that wants to support 2PC not to commit foreign
transaction if CommitForeignTransaction() is called with
XACT_EVENT_PARALLEL_PRE_COMMIT flag and no FDWXACT_FLAG_ONEPHASE flag.
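
As a rough sketch of what that expectation could mean inside an FDW's commit callback: the enum and flag names below are simplified stand-ins, not the real XACT_EVENT_* or FDWXACT_FLAG_* definitions. Only the "pre-commit without one-phase must not commit" case comes from the description above. The other three cases are assumptions modeled on postgres_fdw committing the remote transaction at pre-commit time.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the pre-commit and post-commit events. */
typedef enum XactPhase
{
    PHASE_PRE_COMMIT,
    PHASE_COMMIT
} XactPhase;

/* Stand-in for FDWXACT_FLAG_ONEPHASE. */
#define FLAG_ONEPHASE 0x01

/* What the FDW sends to the remote server for this call. */
typedef enum RemoteAction
{
    ACT_NOTHING,
    ACT_COMMIT,
    ACT_COMMIT_PREPARED
} RemoteAction;

/* Decide the remote action for one CommitForeignTransaction() call. */
static RemoteAction
commit_action(XactPhase phase, int flags)
{
    bool onephase = (flags & FLAG_ONEPHASE) != 0;

    assert(phase == PHASE_PRE_COMMIT || phase == PHASE_COMMIT);

    if (phase == PHASE_PRE_COMMIT)
    {
        /* Without one-phase the transaction is already prepared: wait. */
        return onephase ? ACT_COMMIT : ACT_NOTHING;
    }

    /* After the local commit: finish prepared transactions (assumed). */
    return onephase ? ACT_NOTHING : ACT_COMMIT_PREPARED;
}
```

The point is only that one callback can serve both sequences as long as it looks at the event and the one-phase flag together.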

+       /*
+        * Abort foreign transactions if any.  This needs to be done before marking
+        * this transaction as not running since FDW's transaction callbacks might
+        * assume this transaction is still in progress.
+        */
+       AtEOXact_FdwXact(false);

Same as above.

+/*
+ * This function is called at PREPARE TRANSACTION.  Since we don't support
+ * preparing foreign transactions yet, raise an error if the local transaction
+ * has any foreign transaction.
+ */
+void
+AtPrepare_FdwXact(void)
+{
+       if (FdwXactParticipants != NIL)
+               ereport(ERROR,
+                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                                errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}

This means that foreign data wrappers supporting prepared transactions (though I'm not sure if such wrappers actually exist or not) cannot use the new API? If we want to allow those wrappers to use the new API, AtPrepare_FdwXact() should call the prepare callback and each wrapper should emit an error within the callback if necessary.

I think if we support the prepare callback and allow FDWs to prepare
foreign transactions, we have to call CommitForeignTransaction() on
COMMIT PREPARED for foreign transactions that are associated with the
local prepared transaction. But how can we know which foreign
transactions are? Even a client who didn’t do PREPARE TRANSACTION
could do COMMIT PREPARED. We would need to store the information of
which foreign transactions are associated with the local transaction
somewhere. The 0004 patch introduces WAL logging along with prepare
API and we store that information to a WAL record. I think it’s better
at this time to disallow PREPARE TRANSACTION when at least one foreign
transaction is registered via FDW API.

+       foreach(lc, FdwXactParticipants)
+       {
+               FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+               if (fdw_part->server->serverid == serverid &&
+                       fdw_part->usermapping->userid == userid)

Isn't this inefficient when starting lots of foreign transactions because we need to scan all the entries in the list every time?

Agreed. I'll change it to a hash map.
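
For illustration only, this is the kind of lookup that replaces the list scan. In the backend this would presumably use dynahash (hash_create()/hash_search()) as postgres_fdw's connection cache does. The standalone sketch below uses a tiny fixed-size open-addressing table instead, and Slot, hash_key() and lookup_participant() are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

/* One participant slot, keyed by {server OID, user OID}. */
typedef struct Slot
{
    bool used;
    Oid  serverid;
    Oid  userid;
    int  payload;   /* stand-in for the participant state */
} Slot;

#define TABLE_SIZE 64           /* power of two; fixed size, no deletion */
static Slot   table[TABLE_SIZE];
static size_t table_used = 0;

/* Mix the two OIDs into a table index. */
static size_t
hash_key(Oid serverid, Oid userid)
{
    uint32_t h = (serverid * 2654435761u) ^ (userid * 40503u);

    return h & (TABLE_SIZE - 1);
}

/*
 * Find the participant for the key, creating it if absent.  *found tells
 * the caller which happened.  Linear probing; expected O(1) per lookup
 * instead of a scan over every registered participant.
 */
static Slot *
lookup_participant(Oid serverid, Oid userid, bool *found)
{
    size_t i = hash_key(serverid, userid);

    for (;;)
    {
        Slot *slot = &table[i];

        if (!slot->used)
        {
            /* Keep one slot free so that probing always terminates. */
            assert(table_used < TABLE_SIZE - 1);
            slot->used = true;
            slot->serverid = serverid;
            slot->userid = userid;
            slot->payload = 0;
            table_used++;
            *found = false;
            return slot;
        }
        if (slot->serverid == serverid && slot->userid == userid)
        {
            *found = true;
            return slot;
        }
        i = (i + 1) & (TABLE_SIZE - 1);
    }
}
```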

+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+       bool            found;
+       ConnCacheEntry *entry;
+       ConnCacheKey key;
+
+       /* First time through, initialize connection cache hashtable */
+       if (ConnectionHash == NULL)
+       {
+               HASHCTL         ctl;
+
+               ctl.keysize = sizeof(ConnCacheKey);
+               ctl.entrysize = sizeof(ConnCacheEntry);
+               ConnectionHash = hash_create("postgres_fdw connections", 8,
+                                                                        &ctl,
+                                                                        HASH_ELEM | HASH_BLOBS);

Currently ConnectionHash is created under TopMemoryContext. With the patch, since GetConnectionCacheEntry() can be called in other places, ConnectionHash may be created under the memory context other than TopMemoryContext? If so, that's safe?

hash_create() creates a hash map under TopMemoryContext unless
HASH_CONTEXT is specified. So I think ConnectionHash is still created
in the same memory context.

-               if (PQstatus(entry->conn) != CONNECTION_OK ||
-                       PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-                       entry->changing_xact_state ||
-                       entry->invalidated)
...
+       if (PQstatus(entry->conn) != CONNECTION_OK ||
+               PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+               entry->changing_xact_state)

Why did you get rid of the condition "entry->invalidated"?

My bad. I'll fix it.

If we want to perform some operations at the end of the top
transaction per FDW, not per foreign transaction, we will either still
need to use XactCallback or need to rethink the FDW API design. But
given that we call commit and rollback FDW API for only foreign
servers that actually started a transaction, I’m not sure if there are
such operations in practice. IIUC there is not at least from the
normal (not-sub) transaction termination perspective.

One feature in my mind that may not match with this new API is to perform transaction commits on multiple servers in parallel. That's something like the following. As far as I can recall, another proposed version of 2pc on postgres_fdw patch included that feature. If we want to implement this to increase the performance of transaction commit in the future, I'm afraid that new API will prevent that.

foreach(foreign transactions)
    send commit command

foreach(foreign transactions)
    wait for reply of commit

What I'm thinking is to pass a flag, say FDWXACT_ASYNC, to
Commit/RollbackForeignTransaction() and add a new API to wait for the
operation to complete, say CompleteForeignTransaction(). If the
commit/rollback callback in an FDW is called with the FDWXACT_ASYNC flag,
it should send the command and immediately return a handle (e.g., the
socket from PQsocket() in postgres_fdw). The GTM gathers the handles and
polls for events on them. To complete the command, the GTM calls
CompleteForeignTransaction() to wait for the command to complete.
Please refer to the XA specification for details (especially xa_complete()
and the TMASYNC flag). The pseudo-code is something like the following:

foreach (foreign transactions)
    call CommitForeignTransaction(FDWXACT_ASYNC);
    append the returned fd to the array.

while (true)
{
    poll event on fds;
    call CompleteForeignTransaction() for fd owner;
    if (success)
        remove fd from the array;

    if (array is empty)
        break;
}
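
The same loop can be tried out as a standalone POSIX C sketch. Nothing here is from the patch: FakeXact, commit_async(), complete_commit() and commit_all() are hypothetical, and each "foreign server" is simulated by a pipe whose reply is written immediately, where a real FDW would hand back something like the PQsocket() descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

#define NXACTS 3

/* A simulated foreign transaction; the pipe plays the remote server. */
typedef struct FakeXact
{
    int  rfd;        /* descriptor the coordinator polls on */
    int  wfd;        /* "server" end used to inject the reply */
    bool committed;
} FakeXact;

/* Stand-in for CommitForeignTransaction(FDWXACT_ASYNC): send and return. */
static int
commit_async(FakeXact *x)
{
    int  p[2];
    char ok = 'C';

    if (pipe(p) != 0)
        return -1;
    x->rfd = p[0];
    x->wfd = p[1];
    x->committed = false;

    /* Simulate the remote reply arriving some time later. */
    if (write(x->wfd, &ok, 1) != 1)
        return -1;
    return x->rfd;
}

/* Stand-in for CompleteForeignTransaction(): consume the reply. */
static bool
complete_commit(FakeXact *x)
{
    char reply;

    if (read(x->rfd, &reply, 1) != 1 || reply != 'C')
        return false;
    close(x->rfd);
    close(x->wfd);
    x->committed = true;
    return true;
}

/* Send every commit first, then wait for the replies in any order. */
static int
commit_all(FakeXact *xacts, int n)
{
    struct pollfd fds[NXACTS];
    int pending = 0, done = 0;

    assert(n <= NXACTS);
    for (int i = 0; i < n; i++)
    {
        int fd = commit_async(&xacts[i]);

        if (fd < 0)
            return -1;
        fds[i].fd = fd;
        fds[i].events = POLLIN;
        fds[i].revents = 0;
        pending++;
    }

    while (pending > 0)
    {
        if (poll(fds, (nfds_t) n, -1) < 0)
            return -1;
        for (int i = 0; i < n; i++)
        {
            if (fds[i].fd >= 0 && (fds[i].revents & POLLIN))
            {
                if (!complete_commit(&xacts[i]))
                    return -1;
                fds[i].fd = -1;     /* poll() ignores negative fds */
                pending--;
                done++;
            }
        }
    }
    return done;
}
```

Error handling is deliberately minimal (descriptors leak on failure, EINTR is treated as an error). The sketch only shows the send-all-then-wait shape.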

On second thought, a new per-transaction commit/rollback callback is essential when users or the resolver process want to resolve a specified foreign transaction, but not essential when backends commit/rollback foreign transactions. That is, even if we add a new per-transaction API for users and the resolver process, backends can still use CallXactCallbacks() when they commit/rollback foreign transactions. Is this understanding right?

I haven’t tried that but I think that's possible if we can know
whether the commit/rollback callback (e.g., postgresCommitForeignTransaction()
etc. in postgres_fdw) is called via the SQL function
(pg_resolve_foreign_xact()) or by the resolver process, rather than by a
backend. That is, we register the foreign transaction via
FdwXactRegisterXact(), do nothing in
postgresCommit/RollbackForeignTransaction() if these are called by the
backend, and perform COMMIT/ROLLBACK in pgfdw_xact_callback() in an
asynchronous manner. On the other hand, if
postgresCommit/RollbackForeignTransaction() is called via the SQL function
or by the resolver, these functions commit/rollback the transaction.

Regarding cursor_number, it essentially needs to be unique at least
within a transaction so we can manage it per transaction or per
connection. But the current postgres_fdw instead ensures uniqueness
across all connections. So it seems to me that this can be fixed by
giving each connection its own cursor_number and resetting it in
pgfdw_cleanup_after_transaction(). I think this can be in a separate
patch.

Maybe, so let's work on this later, at least after we confirm that
this change is really necessary.

Okay.
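
For reference, the per-connection counter discussed above might look like this. It is a sketch under assumptions, not part of any posted patch: Conn, next_cursor_number(), format_cursor_name() and cleanup_after_transaction() are illustrative names, and the "c%u" cursor name format is assumed here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-connection state holding its own cursor counter. */
typedef struct Conn
{
    unsigned int cursor_number;  /* last number handed out on this connection */
} Conn;

/* Return a cursor number unique within this connection's transaction. */
static unsigned int
next_cursor_number(Conn *conn)
{
    assert(conn != NULL);
    return ++conn->cursor_number;
}

/* Build a cursor name such as "c1" from the next number. */
static void
format_cursor_name(Conn *conn, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "c%u", next_cursor_number(conn));
}

/* Reset at end of transaction, as the cleanup function would do. */
static void
cleanup_after_transaction(Conn *conn)
{
    assert(conn != NULL);
    conn->cursor_number = 0;
}
```

Two connections can then both use "c1" without conflict, since cursor names only need to be unique within one remote session.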

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#220 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#219)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Feb 5, 2021 at 2:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

After more thought, I'm inclined to think it's better to identify
foreign transactions by user mapping OID. The main reason is, I think
FDWs that manage connection caches by pair of user OID and server OID
potentially have a problem with the scenario Fujii-san mentioned. If an
FDW has to use another user mapping (i.e., connection information)
because the currently used user mapping was removed, it would have to
disconnect the previous connection because it has to use the same
connection cache. But at that time it doesn't know whether the
transaction will be committed or aborted.

Also, such an FDW has the same problem that postgres_fdw used to have: a
backend establishes multiple connections with the same connection
information if multiple local users use the public user mapping. Even
from the perspective of foreign transaction management, it makes more
sense that foreign transactions correspond to the connections to
foreign servers, not to the local connection information.

I can see that some FDW implementations such as mysql_fdw and
firebird_fdw identify connections by pair of server OID and user OID
but I think this is because they were modeled on the old postgres_fdw
code. I suspect that there is no use case where an FDW needs to identify
connections in that way. If the core GTM identifies them by user
mapping OID, we would force those FDWs to change their approach, but I
think that change would be the right improvement.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#221 Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Masahiko Sawada (#220)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, Feb 11, 2021 at 6:25 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

Regression is failing, can you please take a look.

https://cirrus-ci.com/task/5522445932167168

t/080_pg_isready.pl ....... ok
# Failed test 'parallel reindexdb for system with --concurrently skips catalogs status (got 1 vs expected 0)'
# at t/090_reindexdb.pl line 191.
Bailout called. Further testing stopped: system pg_ctl failed
FAILED--Further testing stopped: system pg_ctl failed
make[2]: *** [Makefile:57: check] Error 255
make[1]: *** [Makefile:43: check-scripts-recurse] Error 2
make: *** [GNUmakefile:71: check-world-src/bin-recurse] Error 2
=== ./contrib/hstore_plperl/log/initdb.log ===
Running in no-clean mode. Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user
"postgres".
This user must also own the server process.
--

--
Ibrar Ahmed

#222 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Ibrar Ahmed (#221)
10 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, Mar 15, 2021 at 3:55 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:

Regression is failing, can you please take a look.

Thank you!

I've attached the updated version patch set.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v35-0010-Add-regression-tests-for-foreign-twophase-commit.patch (application/octet-stream)
From e46346522c5b3e87181623d9d5bc4ce5dbd37989 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v35 10/10] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 +
 .../test_fdwxact/expected/test_fdwxact.out    | 200 +++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 185 ++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 110 ++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 526 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/022_fdwxact.pl            | 175 ++++++
 src/test/regress/pg_regress.c                 |  13 +-
 src/tools/msvc/Mkvcbuild.pm                   |   3 +-
 14 files changed, 1296 insertions(+), 6 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/022_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 5391f461a2..d581a2a412 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS =1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..ca8a90f3e5
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,200 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..40b774e5d0
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,185 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..c32bea5df2
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,110 @@
+use File::Copy qw/copy move/;
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql, $expected) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+	$expected = 0 unless defined $expected;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	$node->poll_query_until('postgres',
+							"SELECT count(*) = $expected FROM pg_foreign_xacts");
+
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the failure case of PREPARE TRANSACTION. We prepare the distributed
+# transaction with the same identifier.  The second attempt will fail when preparing
+# the local transaction, which is performed after preparing the foreign transaction
+# on srv_2pc_1. Therefore the transaction should rollback the prepared foreign
+# transaction.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into the prepare phase on srv_2pc_1. The transaction fails while
+# preparing the foreign transaction on srv_2pc_1. Then, we try both 'rollback' and
+# 'rollback prepared' on that foreign transaction, and roll back the other foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback .* on srv_2pc_2/, "rollback on another server");
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses the PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..89d67c720f
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,526 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only the
+ * commit and rollback APIs, and the third supports all transaction APIs,
+ * including prepare.
+ *
+ * This module can also inject an error in the prepare callback or the
+ * commit callback via the test_inject_error() SQL function. The injected
+ * error's settings are stored in shared memory so that backend processes
+ * and resolver processes can see them.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xact.h"
+#include "commands/defrem.h"
+#include "access/reloptions.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static void testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo,
+												   List *fdw_private,
+												   int subplan_index,
+												   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactInfo *finfo);
+static void testCommitForeignTransaction(FdwXactInfo *finfo);
+static void testRollbackForeignTransaction(FdwXactInfo *finfo);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+/* Register the foreign transaction */
+static void
+testRegisterFdwXact(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					bool modified)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	RangeTblEntry	*rte;
+	ForeignTable *table;
+	UserMapping	*usermapping;
+	Oid		userid;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
+						mtstate->ps.state);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+	table = GetForeignTable(RelationGetRelid(rel));
+	usermapping = GetUserMapping(userid, table->serverid);
+	FdwXactRegisterXact(usermapping, modified);
+}
+
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static void
+testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo,
+									   List *fdw_private,
+									   int subplan_index,
+									   int eflags)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo,
+						(eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0);
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo, true);
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactInfo *finfo)
+{
+	int elevel;
+
+	if (check_event(finfo->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 finfo->identifier,
+							 finfo->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactInfo *finfo)
+{
+	int elevel;
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (check_event(finfo->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (finfo->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 xid, finfo->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 finfo->identifier,
+								 finfo->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactInfo *finfo)
+{
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (log_api_calls)
+	{
+		if (finfo->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 xid, finfo->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 finfo->identifier,
+								 finfo->server->servername)));
+	}
+}
+
+/*
+ * Check if an event is set at the phase on the server. If there is, set
+ * elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Only "error" and "panic" are supported; anything else maps to ERROR */
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+	else
+		*elevel = ERROR;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index 96442ceb4e..0e5e05e41a 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/022_fdwxact.pl b/src/test/recovery/t/022_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/022_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check that we can commit and roll back the foreign
+# transactions after normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check that we can commit and roll back the
+# foreign transactions after crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up its shared
+# memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index b284cc88c4..5ceba8972a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2350,9 +2350,12 @@ regression_main(int argc, char *argv[],
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2367,7 +2370,9 @@ regression_main(int argc, char *argv[],
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 49614106dc..517fc2caad 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -50,7 +50,8 @@ my @contrib_excludes = (
 	'pgcrypto',         'sepgsql',
 	'brin',             'test_extensions',
 	'test_misc',        'test_pg_dump',
-	'snapshot_too_old', 'unsafe_tests');
+	'snapshot_too_old', 'unsafe_tests',
+	'test_fdwxact');
 
 # Set of variables for frontend modules
 my $frontend_defines = { 'initdb' => 'FRONTEND' };
-- 
2.27.0
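
For readers following the error-injection helpers in the test_fdwxact module above, here is a minimal standalone sketch of the same idea: arm an event for a (server, phase) pair, then match against it case-insensitively. It is not the patch's code. `InjectState`, `NAME_LEN`, `copy_name`, `names_equal_ci`, `inject_error` and `event_matches` are hypothetical names chosen for illustration, and the shared-memory locking that the real module does around these accesses is left out. The one point it tries to make concrete is that a bounded copy into a fixed-size name buffer should always leave the buffer NUL-terminated.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define NAME_LEN 64		/* hypothetical; stands in for TEST_FDWXCT_MAX_NAME_LEN */

/* Hypothetical stand-in for the module's shared state (no locking here). */
typedef struct InjectState
{
	char		elevel[NAME_LEN];
	char		phase[NAME_LEN];
	char		server[NAME_LEN];
} InjectState;

/* Copy src into a NAME_LEN buffer, truncating if needed, always NUL-terminated. */
static void
copy_name(char *dst, const char *src)
{
	assert(dst != NULL && src != NULL);
	strncpy(dst, src, NAME_LEN - 1);
	dst[NAME_LEN - 1] = '\0';
}

/* Case-insensitive string equality using only standard C. */
static bool
names_equal_ci(const char *a, const char *b)
{
	while (*a && *b)
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return false;
		a++;
		b++;
	}
	return *a == *b;		/* equal only if both strings ended together */
}

/* Arm an event: remember which level to raise, in which phase, on which server. */
static void
inject_error(InjectState *st, const char *elevel,
			 const char *phase, const char *server)
{
	copy_name(st->elevel, elevel);
	copy_name(st->phase, phase);
	copy_name(st->server, server);
}

/* Return true if an event is armed for this server and phase. */
static bool
event_matches(const InjectState *st, const char *server, const char *phase)
{
	return names_equal_ci(st->server, server) &&
		names_equal_ci(st->phase, phase);
}
```

A caller would zero an `InjectState`, call `inject_error()` once, and consult `event_matches()` at each phase boundary. Names longer than the buffer are truncated rather than left unterminated.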

v35-0008-postgres_fdw-marks-foreign-transaction-as-modifi.patch (application/octet-stream)
From b604009651b30403d166faa17f86a05d2ea50138 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 2 Nov 2020 14:32:10 +0900
Subject: [PATCH v35 08/10] postgres_fdw marks foreign transaction as modified
 on modification.

This commit enables postgres_fdw to execute the two-phase commit protocol
at transaction commit (without explicitly executing PREPARE TRANSACTION).

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c   | 19 ++++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c |  2 ++
 contrib/postgres_fdw/postgres_fdw.h |  1 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 99ba8b0999..b584e7dcf5 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -61,6 +61,7 @@ typedef struct ConnCacheEntry
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
 	Oid			serverid;		/* foreign server OID used to get server name */
+	bool		modified;		/* true if data on the foreign server is modified */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -297,6 +298,7 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 	entry->changing_xact_state = false;
 	entry->invalidated = false;
 	entry->serverid = server->serverid;
+	entry->modified = false;
 	entry->server_hashvalue =
 		GetSysCacheHashValue1(FOREIGNSERVEROID,
 							  ObjectIdGetDatum(server->serverid));
@@ -311,6 +313,20 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 		 entry->conn, server->servername, user->umid, user->userid);
 }
 
+void
+MarkConnectionModified(UserMapping *user)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
+	if (entry && !entry->modified)
+	{
+		FdwXactRegisterXact(user, true);
+		entry->modified = true;
+	}
+}
+
 /*
  * Connect to remote server using specified server and user mapping properties.
  */
@@ -582,7 +598,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 			 entry->conn);
 
 		/* Register the foreign server to the transaction */
-		FdwXactRegisterXact(user);
+		FdwXactRegisterXact(user, false);
 
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
@@ -591,6 +607,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 		entry->changing_xact_state = true;
 		do_sql_command(entry->conn, sql);
 		entry->xact_depth = 1;
+		entry->modified = false;
 		entry->changing_xact_state = false;
 	}
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 13cc3d07d6..6aae080fca 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2506,6 +2506,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * establish new connection if necessary.
 	 */
 	dmstate->conn = GetConnection(user, false);
+	MarkConnectionModified(user);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -3699,6 +3700,7 @@ create_foreign_modify(EState *estate,
 
 	/* Open connection; report that we'll create a prepared statement. */
 	fmstate->conn = GetConnection(user, true);
+	MarkConnectionModified(user);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 8c72c910c7..fc5a0766f4 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -132,6 +132,7 @@ extern void reset_transmission_modes(int nestlevel);
 /* in connection.c */
 extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
 extern void ReleaseConnection(PGconn *conn);
+extern void MarkConnectionModified(UserMapping *user);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
-- 
2.27.0
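
The life cycle of the per-connection `modified` flag in the patch above can be sketched in isolation as follows. This is a simplified model under stated assumptions, not postgres_fdw code: `ConnEntry`, `begin_remote_xact_sketch`, `mark_connection_modified_sketch` and the `registrations_as_modified` counter are hypothetical, and the counter merely stands in for a call to `FdwXactRegisterXact(user, true)`. It assumes, as in postgres_fdw, that the remote transaction is started only when `xact_depth` is not yet positive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ConnCacheEntry; only the fields the sketch needs. */
typedef struct ConnEntry
{
	int			xact_depth;		/* 0 = no remote transaction open */
	bool		modified;		/* data on the foreign server was modified */
} ConnEntry;

/* Counts simulated FdwXactRegisterXact(user, true) calls. */
static int	registrations_as_modified = 0;

/* Starting a remote transaction resets the flag for the new transaction. */
static void
begin_remote_xact_sketch(ConnEntry *entry)
{
	assert(entry != NULL);
	if (entry->xact_depth <= 0)
	{
		entry->xact_depth = 1;
		entry->modified = false;
	}
}

/*
 * Called from the modify paths.  The first call in a transaction re-registers
 * the participant as modified; later calls are no-ops.
 */
static void
mark_connection_modified_sketch(ConnEntry *entry)
{
	if (entry != NULL && !entry->modified)
	{
		registrations_as_modified++;
		entry->modified = true;
	}
}
```

The effect is that a server that is only read stays registered as unmodified, while any number of writes through the same connection cause exactly one "modified" registration per transaction.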

v35-0007-Prepare-foreign-transactions-at-commit-time.patch (application/octet-stream)
From 4c22c063930f0dd849114bfa11229f20512786c3 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 25 Nov 2020 21:02:29 +0900
Subject: [PATCH v35 07/10] Prepare foreign transactions at commit time

With this commit, a foreign server modified within the transaction is
marked as 'modified'. On the 'modified' servers, foreign transactions
are prepared automatically if foreign_twophase_commit is
'required'. Previously, users needed to run PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED to use the two-phase commit protocol. This commit
enables users to use the two-phase commit protocol transparently. Prepared
foreign transactions are resolved asynchronously by the foreign
transaction resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/fdwxact.c          | 201 ++++++++++++++++--
 src/backend/access/transam/xact.c             |   4 +
 src/backend/utils/misc/guc.c                  |  28 +++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/access/fdwxact.h                  |   9 +
 src/include/foreign/fdwapi.h                  |   2 +-
 6 files changed, 222 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index 5d1fcd80ed..e0bf3bb969 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -20,6 +20,23 @@
  *
  * FOREIGN TRANSACTION RESOLUTION
  *
+ * A transaction involving multiple foreign transactions uses the two-phase commit
+ * protocol to commit the distributed transaction, if enabled.  The basic strategy
+ * is to prepare all of the remote transactions before committing locally and to
+ * commit them after committing locally.
+ *
+ * At pre-commit of the local transaction, we prepare the transactions on all foreign
+ * servers after logging the foreign transaction information.  The outcome of the
+ * distributed transaction is determined by the outcome of the corresponding local
+ * transaction.  Once the local transaction has committed successfully, all
+ * transactions on foreign servers must be committed.  If an error occurs before
+ * the local transaction commits, all of them must be aborted.  After
+ * committing or rolling back locally, we leave the foreign transactions as in-doubt
+ * transactions and notify the resolver process.  The resolver process asynchronously
+ * resolves these foreign transactions according to the outcome of the corresponding
+ * local transaction.  The user can also use the pg_resolve_foreign_xact() SQL
+ * function to resolve a foreign transaction manually.
+ *
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API regardless of data on the foreign server having
  * been modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback
@@ -96,8 +113,10 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
+#include "storage/pmsignal.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
+#include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -116,6 +135,10 @@
 #define ServerSupportTwophaseCommit(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
 
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
 /*
  * Name of foreign prepared transaction file is 8 bytes xid and
  * user mapping OID separated by '_'.
@@ -156,6 +179,9 @@ typedef struct FdwXactParticipant
 	 */
 	FdwXact		fdwxact;
 
+	/* true if data on the server was modified */
+	bool		modified;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
@@ -170,18 +196,24 @@ typedef struct FdwXactParticipant
  * PreparedAllParticipants is true if the all foreign transaction participants
  * in FdwXactParticipants are prepared. This can be true in PREPARE TRANSACTION
  * case.
+ *
+ * ForeignTwophaseCommitIsRequired is true if the current transaction needs to
+ * be committed using two-phase commit protocol.
  */
 static HTAB *FdwXactParticipants = NULL;
 static bool PreparedAllParticipants = false;
+static bool ForeignTwophaseCommitIsRequired = false;
 
 /* Keep track of registering process exit call back. */
 static bool fdwXactExitRegistered = false;
 
+
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
 int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
 
-static void FdwXactPrepareForeignTransactions(TransactionId xid);
+static void FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all);
 static char *getFdwXactIdentifier(FdwXactParticipant *fdw_part, TransactionId xid);
 static FdwXact FdwXactInsertEntry(TransactionId xid, FdwXactParticipant *fdw_part,
 								  char *identifier);
@@ -200,6 +232,7 @@ static char *ProcessFdwXactBuffer(TransactionId xid, Oid umid,
 								  XLogRecPtr insert_start_lsn, bool fromdisk);
 static char *ReadFdwXactFile(TransactionId xid, Oid umid);
 static void RemoveFdwXactFile(TransactionId xid, Oid umid, bool giveWarning);
+static bool checkForeignTwophaseCommitRequired(bool local_modified);
 
 static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid umid, Oid serverid,
 							  Oid owner, char *identifier);
@@ -274,7 +307,7 @@ FdwXactShmemInit(void)
  * mapping OID as a participant of the transaction.
  */
 void
-FdwXactRegisterXact(UserMapping *usermapping)
+FdwXactRegisterXact(UserMapping *usermapping, bool modified)
 {
 	FdwXactParticipant	*fdw_part;
 	FdwRoutine	*routine;
@@ -306,6 +339,7 @@ FdwXactRegisterXact(UserMapping *usermapping)
 
 	key = usermapping->umid;
 	fdw_part = hash_search(FdwXactParticipants, (void *) &key, HASH_ENTER, &found);
+	fdw_part->modified = found ? (fdw_part->modified || modified) : modified;
 
 	/* Already registered */
 	if (found)
@@ -368,14 +402,23 @@ RemoveFdwParticipant(FdwXactPartKey key)
 }
 
 /*
- * Pre-commit processing for foreign transactions. We commit those foreign
- * transactions with one-phase.
+ * Pre-commit processing for foreign transactions.
+ *
+ * Prepare all foreign transactions if foreign two-phase commit is required.
+ * The behavior depends on the value of foreign_twophase_commit.  When it is
+ * 'required', we strictly require the FDWs of all foreign servers to support
+ * the two-phase commit protocol and ask them to prepare their foreign
+ * transactions.  When it is 'disabled', we use one-phase commit, and these
+ * foreign transactions are committed at the end of the transaction.
+ * If we fail to prepare any of them, we switch to aborting.
  */
 void
 PreCommit_FdwXact(bool is_parallel_worker)
 {
 	HASH_SEQ_STATUS scan;
 	FdwXactParticipant *fdw_part;
+	TransactionId xid;
+	bool		local_modified;
 
 	/*
 	 * If there is no foreign server involved or all foreign transactions
@@ -386,6 +429,40 @@ PreCommit_FdwXact(bool is_parallel_worker)
 
 	Assert(!RecoveryInProgress());
 
+	/*
+	 * Check if the current transaction did writes.  We need to include the
+	 * local node as a participant of the distributed transaction and to regard
+	 * it as modified if the current transaction has performed WAL logging and
+	 * has been assigned an xid.  The transaction can end up not writing any WAL,
+	 * even if it has an xid, if it only wrote to temporary and/or unlogged
+	 * tables.  It can end up having written WAL without an xid if it did HOT
+	 * pruning.
+	 */
+	xid = GetTopTransactionIdIfAny();
+	local_modified = (TransactionIdIsValid(xid) && (XactLastRecEnd != 0));
+
+	/*
+	 * Check if we need to use foreign twophase commit. Note that we don't
+	 * support foreign twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired(local_modified))
+	{
+		/*
+		 * Two-phase commit is required.  Assign a transaction id to the
+		 * current transaction if not yet because the local transaction is
+		 * necessary to determine the result of the distributed transaction.
+		 * Then we prepare foreign transactions on foreign servers that support
+		 * two-phase commit.  Note that we keep FdwXactParticipants until the
+		 * end of the transaction.
+		 */
+		if (!TransactionIdIsValid(xid))
+			xid = GetTopTransactionId();
+		FdwXactPrepareForeignTransactions(xid, false);
+		ForeignTwophaseCommitIsRequired = true;
+
+		return;
+	}
+
 	/* Commit all foreign transactions in the participant list */
 	hash_seq_init(&scan, FdwXactParticipants);
 	while ((fdw_part = (FdwXactParticipant *) hash_seq_search(&scan)))
@@ -458,6 +535,23 @@ AtEOXact_FdwXact(bool is_commit, bool is_parallel_worker)
 		FdwXactLaunchOrWakeupResolver();
 
 	PreparedAllParticipants = false;
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(TransactionId xid, Oid umid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact_idx(xid, umid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
 }
 
 /*
@@ -497,7 +591,7 @@ AtPrepare_FdwXact(void)
 	 * prepare all foreign transactions.
 	 */
 	xid = GetTopTransactionId();
-	FdwXactPrepareForeignTransactions(xid);
+	FdwXactPrepareForeignTransactions(xid, true);
 
 	/*
 	 * Remember we already prepared all participants.  We keep FdwXactParticipants
@@ -507,22 +601,6 @@ AtPrepare_FdwXact(void)
 	PreparedAllParticipants = true;
 }
 
-/*
- * Return true if there is at least one prepared foreign transaction
- * which matches given arguments.
- */
-bool
-FdwXactExists(TransactionId xid, Oid umid)
-{
-	int			idx;
-
-	LWLockAcquire(FdwXactLock, LW_SHARED);
-	idx = get_fdwxact_idx(xid, umid);
-	LWLockRelease(FdwXactLock);
-
-	return (idx >= 0);
-}
-
 /*
  * We must fsync the foreign transaction state file that is valid or generated
  * during redo and has a inserted LSN <= the checkpoint's redo horizon.
@@ -607,11 +685,85 @@ CheckPointFdwXacts(XLogRecPtr redo_horizon)
 							   serialized_fdwxacts)));
 }
 
+/* Return true if the current transaction needs to use two-phase commit */
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+/*
+ * Return true if the current transaction modifies data on two or more servers,
+ * counting the servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+{
+	FdwXactParticipant *fdw_part;
+	HASH_SEQ_STATUS scan;
+	bool	have_no_twophase = false;
+	int		nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	hash_seq_init(&scan, FdwXactParticipants);
+	while ((fdw_part = (FdwXactParticipant *) hash_seq_search(&scan)))
+	{
+		if (!fdw_part->modified)
+			continue;
+
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			have_no_twophase = true;
+
+		nserverswritten++;
+	}
+
+	/* Did we modify the local non-temporary data? */
+	if (local_modified)
+		nserverswritten++;
+
+	/*
+	 * Two-phase commit is not required if the number of servers performing
+	 * writes is less than 2.
+	 */
+	if (nserverswritten < 2)
+		return false;
+
+	Assert(foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED);
+
+	/* Two-phase commit is required. Check parameters */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	if (have_no_twophase)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+				 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+	return true;
+}
+
 /*
- * Insert FdwXact entries and prepare foreign transactions.
+ * Insert FdwXact entries and prepare foreign transactions.  If prepare_all is
+ * true, we prepare all foreign transactions regardless of whether writes
+ * happened on the server.
+ *
+ * We can still change to rollback here on failure.  If any error occurs, we
+ * roll back the non-prepared foreign transactions.
  */
 static void
-FdwXactPrepareForeignTransactions(TransactionId xid)
+FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all)
 {
 	FdwXactParticipant *fdw_part;
 	HASH_SEQ_STATUS scan;
@@ -630,6 +782,9 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 
 		CHECK_FOR_INTERRUPTS();
 
+		if (!prepare_all && !fdw_part->modified)
+			continue;
+
 		/* Get prepared transaction identifier */
 		identifier = getFdwXactIdentifier(fdw_part, xid);
 		Assert(identifier);
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index cf4d5f2574..c064fc936f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -22,6 +22,7 @@
 
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1456,6 +1457,9 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	if (FdwXactIsForeignTwophaseCommitRequired())
+		FdwXactLaunchOrWakeupResolver();
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 7d2320834b..517ebdf5ca 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -501,6 +501,24 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -4713,6 +4731,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 3e2c013680..a0fcc053fb 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -747,6 +747,8 @@
 							# retrying to resolve
 							# foreign transactions
 							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
 
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 9d6d66ac6f..58ba7cf848 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -22,6 +22,14 @@
 											 * without preparation */
 #define FDWXACT_FLAG_PARALLEL_WORKER	0x02	/* is parallel worker? */
 
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
 /* Enum to track the status of foreign transaction */
 typedef enum
 {
@@ -103,6 +111,7 @@ extern int	max_prepared_foreign_xacts;
 extern int	max_foreign_xact_resolvers;
 extern int	foreign_xact_resolution_retry_interval;
 extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
 
 /* Function declarations */
 extern void PreCommit_FdwXact(bool is_parallel_worker);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 4bb6f3738b..26e439e0cc 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -284,7 +284,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
 /* Functions in fdwxact/fdwxact.c */
-extern void FdwXactRegisterXact(UserMapping *usermapping);
+extern void FdwXactRegisterXact(UserMapping *usermapping, bool modified);
 extern void FdwXactUnregisterXact(UserMapping *usermapping);
 
 #endif							/* FDWAPI_H */
-- 
2.27.0
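
The decision made by checkForeignTwophaseCommitRequired() in the patch above boils down to counting writers. A minimal standalone sketch of that rule follows. It is an illustration under stated assumptions, not the patch's implementation: `Participant`, `TwophaseDecision` and `decide_twophase` are hypothetical names, the hash table is replaced by a plain array, and the checks on max_prepared_foreign_transactions and max_foreign_transaction_resolvers (which raise errors in the patch) are omitted. Where the patch raises an error for a modified server without two-phase support, the sketch returns a distinct enum value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for FdwXactParticipant; only what the check reads. */
typedef struct Participant
{
	bool		modified;			/* data on this server was written */
	bool		supports_twophase;	/* FDW provides a prepare callback */
} Participant;

typedef enum TwophaseDecision
{
	TWOPHASE_NOT_NEEDED,		/* one-phase commit is sufficient */
	TWOPHASE_REQUIRED,			/* prepare the modified participants */
	TWOPHASE_UNSUPPORTED		/* a modified server cannot prepare */
} TwophaseDecision;

/*
 * requested      - foreign_twophase_commit is set to 'required'
 * local_modified - the local transaction has an xid and wrote WAL
 */
static TwophaseDecision
decide_twophase(bool requested, bool local_modified,
				const Participant *parts, size_t nparts)
{
	int			nserverswritten = local_modified ? 1 : 0;
	bool		have_no_twophase = false;

	assert(parts != NULL || nparts == 0);

	if (!requested)
		return TWOPHASE_NOT_NEEDED;

	for (size_t i = 0; i < nparts; i++)
	{
		if (!parts[i].modified)
			continue;			/* read-only servers do not count */
		if (!parts[i].supports_twophase)
			have_no_twophase = true;
		nserverswritten++;
	}

	/* Fewer than two writers: atomicity across servers is not at stake. */
	if (nserverswritten < 2)
		return TWOPHASE_NOT_NEEDED;

	return have_no_twophase ? TWOPHASE_UNSUPPORTED : TWOPHASE_REQUIRED;
}
```

Note that a server without two-phase support only matters if it was modified and there are at least two writers in total. A read-only participant never forces two-phase commit and never blocks it.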

v35-0009-Documentation-update.patch (application/octet-stream)
From cc0cb59ef48fc8aee2e963eab19127c778f1a9e3 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v35 09/10] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 ++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 158 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 254 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    | 147 +++++++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 src/backend/access/transam/README.fdwxact | 134 ++++++++++++
 10 files changed, 1022 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml
 create mode 100644 src/backend/access/transam/README.fdwxact

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index db29905e91..a5e418b6b2 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9312,6 +9312,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -11178,6 +11183,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction has been
+          prepared to commit or is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction has been
+          prepared to abort or is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>locker_pid</structfield></entry>
+      <entry><type>int</type></entry>
+      <entry></entry>
+      <entry>
+       Process ID of the process that is currently processing this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e81141e45c..9fea5c2231 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9372,6 +9372,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether distributed transaction commits ensure that all
+         involved changes on foreign servers are committed. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used to
+         commit or roll back distributed transactions. When set to
+         <literal>required</literal>, distributed transactions strictly require
+         that all written servers can use the two-phase commit protocol.  That is,
+         the distributed transaction cannot commit if even one server does not
+         support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, distributed transaction commit
+         waits for all involved foreign transactions to be committed before the
+         command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When <literal>disabled</literal>, there is a risk of database
+          inconsistency if one or more foreign servers crash while a
+          distributed transaction is being committed.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>int</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning the resolver stays connected
+         to its database until stopped manually by <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..bae3ee0f2a
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers. The global transaction manager is responsible for
+  managing transactions on foreign servers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the other transactions were rolled back. This could leave
+   the data in an inconsistent state from the federated database's point of view.
+   Atomic commit of a distributed transaction is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all the changes on foreign servers are either committed or rolled back using
+   the transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To achieve commit among all foreign servers automatically,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit sequence of a distributed transaction proceeds
+    with the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers including the local server itself and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the prepare on all foreign servers is
+       successful, the commit proceeds to the next step.  If there is any failure in the
+       prepare phase, the server will roll back all the transactions on both
+       local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit the local transaction. The server commits the transaction locally.
+       If any failure happens in this step, the server switches to rollback and
+       rolls back all transactions on both local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    The above sequence is executed transparently to the users at transaction commit.
+    The transaction returns acknowledgement of the successful commit of the
+    distributed transaction to the client after step 2.  After that, all
+    prepared transactions are resolved asynchronously by a foreign transaction
+    resolver process.
+   </para>
+
+   <para>
+    When the user executes <command>PREPARE TRANSACTION</command>, the transaction
+    prepares the local transaction as well as all involved transactions on the
+    foreign servers. Likewise, when <command>COMMIT PREPARED</command> or
+    <command>ROLLBACK PREPARED</command> is executed, all prepared transactions are
+    resolved asynchronously after committing or rolling back the local transaction.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    A distributed transaction can be in the <firstterm>in-doubt</firstterm> state
+    from the time all involved transactions are prepared until all involved
+    transactions are resolved.  During that time, a transaction reading from the
+    foreign servers might see different results.  If the local node
+    crashes while preparing transactions, the distributed transaction becomes
+    in-doubt.  The information about the involved foreign transactions is
+    recovered during crash recovery and these are resolved in the background.
+   </para>
+
+   <para>
+    The foreign transaction resolver processes automatically resolve the
+    transactions associated with the in-doubt distributed transaction. Alternatively, you can
+    use the <function>pg_resolve_foreign_xact</function> function to resolve them
+    manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving in-doubt distributed transactions. They commit or
+    roll back prepared transactions on all foreign servers involved with the
+    distributed transaction according to the result of the corresponding local
+    transaction.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolutions
+    on the database to which it is connected. On failure during resolution, it
+    retries the resolution at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     the database cannot be dropped without an immediate shutdown. You can call the
+     <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be enabled.  Additionally,
+    <varname>max_worker_processes</varname> may need to be adjusted
+    to accommodate foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 2e73d296d2..930d9bea68 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1504,6 +1504,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare the foreign transaction. If an FDW wishes its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can optionally provide
+     the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactInfo *finfo);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactInfo *finfo);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase or at the post-commit phase if
+    two-phase commit is required. If <literal>frstate-&gt;flag</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction
+    can be committed in one phase; otherwise this function must commit the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     Note that, except when the <function>pg_resolve_fdwxact</function>
+     SQL function is called, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactInfo *finfo);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally. The foreign
+    transactions are rolled back when the user requests a rollback or when
+    any error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs while rolling back the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>frstate-&gt;flag</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>frstate-&gt;fdwxact_id</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>frstate-&gt;fdwxact_id</literal>
+     might not exist on the foreign servers. This can happen when, for instance,
+     there is a failure while preparing the foreign transaction. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that, except when the <function>pg_resolve_fdwxact</function>
+     SQL function is called, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    storing its length in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal
+    less than <symbol>NAMEDATALEN</symbol> bytes long and should not be the same
+    as any other concurrent prepared transaction identifier. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;server oid&gt;_&lt;user oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Therefore,
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -1983,4 +2094,147 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If an FDW's server supports transactions, it is usually worthwhile for the
+    FDW to manage transactions opened on the foreign server. The FDW callback
+    functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the working of the
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactInfo</literal> can be used to get
+    information about the foreign server being processed, such as the server name, OID of
+    the server, user and user mapping. The <literal>flags</literal> field contains flag
+    bits describing the foreign transaction state for transaction management.
+   </para>
+
+   <sect2 id="fdw-transaction-registration">
+    <title>Foreign Transaction Registration and Unregistration</title>
+    <para>
+     A foreign transaction needs to be registered with the
+     <productname>PostgreSQL</productname> global transaction manager.
+     Registration and unregistration are done by calling
+     <function>FdwXactRegisterXact</function> and
+     <function>FdwXactUnregisterXact</function> respectively.
+     The FDW can pass a boolean <literal>modified</literal> along with
+     the user mapping to <function>FdwXactRegisterXact</function>,
+     indicating that writes are going to happen on the foreign server.  Such foreign
+     servers are taken into account when deciding whether the two-phase commit
+     protocol is required.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-commit-rollback">
+    <title>Commit and Rollback Single Foreign Transaction</title>
+    <para>
+     The FDW callback functions <function>CommitForeignTransaction</function>
+     and <function>RollbackForeignTransaction</function> can be used to commit
+     and roll back the foreign transaction. During transaction commit, the core
+     transaction manager calls <function>CommitForeignTransaction</function> function
+     in the pre-commit phase and calls
+     <function>RollbackForeignTransaction</function> function in the post-rollback
+     phase.
+    </para>
+   </sect2>
+
+   <sect2 id="fdw-transaction-distributed-transaction-commit">
+    <title>Atomic Commit and Rollback Distributed Transaction</title>
+    <para>
+     In addition to the simple commit and rollback of foreign transactions described in
+     <xref linkend="fdw-transaction-commit-rollback"/>, the
+     <productname>PostgreSQL</productname> global transaction manager enables
+     distributed transactions to atomically commit and roll back among all foreign
+     servers, which is known as atomic commit in the literature. To achieve atomic
+     commit, <productname>PostgreSQL</productname> employs the two-phase commit
+     protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+     to support the two-phase commit protocol is required to have the FDW callback
+     function <function>PrepareForeignTransaction</function> and optionally
+     <function>GetPrepareId</function>, in addition to
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>
+     (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+    </para>
+
+    <para>
+     An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may be using
+    different Foreign Data Wrappers.
+    </para>
+
+    <para>
+     When the core executor accesses the foreign servers, each foreign server whose FDW
+     supports the transaction management callback routines is registered as a participant.
+     During registration, <function>GetPrepareId</function> is called, if provided, to
+     generate a unique transaction identifier.
+    </para>
+
+    <para>
+     During the pre-commit phase of the local transaction, the foreign transaction manager
+     persists the foreign transaction information to disk and WAL, and then
+     prepares all foreign transactions by calling
+     <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+     is required. Two-phase commit is required when the transaction modified data
+     on more than one server, including the local server itself, and the user requests
+     foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+    </para>
+
+    <para>
+     <productname>PostgreSQL</productname> can commit locally and go to the next
+     step if and only if all foreign transactions are prepared successfully.
+     If any failure happens or the user requests cancellation during preparation,
+     the distributed transaction manager switches to rollback and calls
+     <function>RollbackForeignTransaction</function>.
+    </para>
+
+    <para>
+     When switching to rollback due to a failure, it calls
+     <function>RollbackForeignTransaction</function> with
+     <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions which are not
+     closed yet, and calls <function>RollbackForeignTransaction</function> without
+     that flag for foreign transactions which are already prepared.  For foreign
+     transactions which are being prepared, it does both because it cannot be sure that
+     the preparation has completed on the foreign server. Therefore,
+     <function>RollbackForeignTransaction</function> needs to tolerate the undefined
+     object error.
+    </para>
+
+    <para>
+     Note that when <literal>(frstate-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+     is true, both the <literal>CommitForeignTransaction</literal> function and
+     the <literal>RollbackForeignTransaction</literal> function should commit and
+     roll back directly, rather than processing prepared transactions. This can
+     happen when two-phase commit is not required or the foreign server is not
+     modified within the transaction.
+    </para>
+
+    <para>
+     Once all foreign transactions are prepared, the core transaction manager commits
+     locally. After that, the transaction commit waits for all prepared foreign
+     transactions to be committed before completion. After all prepared foreign
+     transactions are resolved, the transaction commit completes.
+    </para>
+
+    <para>
+     One foreign transaction resolver process is responsible for foreign
+     transaction resolution on a database. The foreign transaction resolver process
+     calls either <function>CommitForeignTransaction</function> or
+     <function>RollbackForeignTransaction</function> to resolve the foreign
+     transaction identified by <literal>frstate-&gt;fdwxact_id</literal>. If it fails
+     to resolve, the resolver process exits with an error message. The foreign
+     transaction launcher will launch the resolver process again at the
+     <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index db1d369743..a2c0377e75 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1ab31a9056..2de4c8384d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26970,6 +26970,153 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-data-sanity">
+   <title>Data Sanity Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-data-sanity-table"/>
+    provide ways to check the sanity of data files in the cluster.
+   </para>
+
+   <table id="functions-data-sanity-table">
+    <title>Data Sanity Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_relation_check_pages</primary>
+        </indexterm>
+        <function>pg_relation_check_pages</function> ( <parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type> ] )
+        <returnvalue>setof record</returnvalue>
+        ( <parameter>path</parameter> <type>text</type>,
+        <parameter>failed_block_num</parameter> <type>bigint</type> )
+       </para>
+       <para>
+        Checks the pages of the specified relation to see if they are valid
+        enough to safely be loaded into the server's shared buffers.  If
+        given, <parameter>fork</parameter> specifies that only the pages of
+        the given fork are to be verified.  <parameter>fork</parameter> can
+        be <literal>main</literal> for the main data
+        fork, <literal>fsm</literal> for the free space
+        map, <literal>vm</literal> for the visibility map,
+        or <literal>init</literal> for the initialization fork.  The
+        default of <literal>NULL</literal> means that all forks of the
+        relation should be checked.  The function returns a list of block
+        numbers that appear corrupted along with the path names of their
+        files.  Use of this function is restricted to superusers by
+        default, but access may be granted to others
+        using <command>GRANT</command>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction that is currently
+        being processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+        This function is useful for removing a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+    controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful to call before
+    dropping the database to which the foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c602ee4427..5418c6ef88 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1072,6 +1072,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1301,6 +1313,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1594,6 +1618,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1907,6 +1936,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 730d5fdc34..a5c5619072 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -171,6 +171,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 3234adb639..83f30c5045 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
diff --git a/src/backend/access/transam/README.fdwxact b/src/backend/access/transam/README.fdwxact
new file mode 100644
index 0000000000..8da9030689
--- /dev/null
+++ b/src/backend/access/transam/README.fdwxact
@@ -0,0 +1,134 @@
+src/backend/access/transam/README.fdwxact
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature lets us commit a transaction on all of the involved
+foreign servers or on none of them. This ensures that the data is always left
+in a consistent state across the federated database.
+
+
+Commit Sequence of Global Transactions
+--------------------------------------
+
+We employ the two-phase commit protocol to commit atomically across all foreign
+servers. The sequence of a distributed transaction commit consists
+of the following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are added
+to the list of FdwXactParticipant entries, which is maintained by PostgreSQL's
+global transaction manager (GTM), as distributed transaction participants. The
+registered foreign transactions are tracked until the end of the transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We write a WAL record indicating that the foreign server is involved
+with the current transaction before issuing PREPARE on any foreign transaction.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+The two-phase commit is required only if the transaction modified two or more
+servers including the local node.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If any of them fails, we switch to
+rollback; at that point some participants might be prepared whereas
+others are not. The prepared foreign transactions are resolved
+asynchronously by the resolver process, or manually using
+pg_resolve_foreign_xact(), and the unprepared ones are ended
+in one phase by calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction, but
+this resolution step (commit or rollback) is done by the foreign transaction
+resolver process.
+
+
+Identifying Foreign Transactions In GTM
+---------------------------------------
+
+There are two ways to identify foreign transaction participants (as well as
+FdwXact entries): using {server OID, user OID} and using user mapping OID. The same
+is true for how FDWs identify their connections (and the transactions on them) to
+the foreign server. We need to consider the case where GTM and an FDW identify
+transactions in different ways, because a problem might occur
+when the user modifies the same foreign server under different roles within one
+transaction. For example, consider the following execution:
+
+BEGIN;
+SET ROLE user_A;
+INSERT INTO ft1 VALUES (1);
+SET ROLE user_B;
+INSERT INTO ft1 VALUES (1);
+COMMIT;
+
+Suppose that an FDW identifies the connection by {server OID, user OID}
+and GTM identifies the transactions by user mapping OID, and user_A and user_B use
+the public user mapping to connect to server_X. In the FDW, there are two
+connections, {user_A, server_X} and {user_B, server_X}, and it therefore opens a
+transaction on each connection, while GTM has only one FdwXact entry because the two
+connections refer to the same user mapping OID. As a result, at the end of the
+transaction, GTM ends only one foreign transaction, leaving the other one open.
+
+On the other hand, suppose that an FDW identifies the connection by user mapping OID
+and GTM does so by {server OID, user OID}. The FDW uses only one connection and opens
+one transaction since both users refer to the same user mapping OID (we expect FDWs
+not to register the foreign transaction when not starting a new transaction on the
+foreign server). Since GTM also has one entry it can end the foreign transaction
+properly. The downside would be that the user OID of FdwXact (i.e., FdwXact->userid)
+is the user who registered the foreign transaction for the first time, not
+necessarily the user who executed COMMIT.  For example in the above case, FdwXact->userid
+will be user_A, not user_B. But this is not a big problem in practice.
+
+Therefore, in fdwxact.c, we identify the foreign transaction by
+{server OID, user OID}.
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_PREPARING
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared. The status then changes to
+FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING before committing or
+aborting, respectively. The FdwXact entry is removed with WAL logging once resolved.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. The initial
+status for those entries is FDWXACT_STATUS_PREPARED if they are recovered
+from WAL. Because we write WAL only when preparing the foreign transaction, we
+cannot know the exact fate of the foreign transaction from recovery.
+
+The foreign transaction status transition is illustrated by the following
+graph describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                      INVALID                       |
+ +----------------------------------------------------+
+    |                      |                       |
+    |                      v                       |
+    |           +---------------------+            |
+   (*1)         |      PREPARING      |           (*1)
+    |           +---------------------+            |
+    |                      |                       |
+    v                      v                       v
+ +----------------------------------------------------+
+ |                      PREPARED                      |
+ +----------------------------------------------------+
+           |                               |
+           v                               v
+ +--------------------+          +--------------------+
+ |     COMMITTING     |          |      ABORTING      |
+ +--------------------+          +--------------------+
+           |                               |
+           v                               v
+ +----------------------------------------------------+
+ |                        END                         |
+ +----------------------------------------------------+
+
+(*1) Paths for recovered FdwXact entries
-- 
2.27.0

Attachment: v35-0005-Add-GetPrepareId-API.patch (application/octet-stream)
From a83630b966029b7a6ef7e63820fb7e4c08475e33 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 4 Nov 2020 14:41:53 +0900
Subject: [PATCH v35 05/10] Add GetPrepareId API

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/fdwxact.c | 52 ++++++++++++++++++++++++----
 src/include/foreign/fdwapi.h         |  4 +++
 2 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index 60d726af62..c8a75bcfbc 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -157,6 +157,7 @@ typedef struct FdwXactParticipant
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
 	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
 } FdwXactParticipant;
 
 /*
@@ -328,6 +329,7 @@ FdwXactRegisterXact(UserMapping *usermapping)
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdw_part->get_prepareid_fn = routine->GetPrepareId;
 
 	MemoryContextSwitchTo(old_ctx);
 	pfree(routine);
@@ -662,9 +664,10 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 }
 
 /*
- * Return a null-terminated foreign transaction identifier.  We generate an
- * unique identifier with in the form of
- * "fx_<random number>_<xid>_<umid> whose length is less than FDWXACT_ID_MAX_LEN.
+ * Return a null-terminated foreign transaction identifier.  If the given FDW
+ * supports the GetPrepareId callback we return the identifier returned from it.
+ * Otherwise we generate a unique identifier in the form of
+ * "fx_<random number>_<xid>_<umid>" whose length is less than FDWXACT_ID_MAX_LEN.
  *
  * Returned string value is used to identify foreign transaction. The
  * identifier should not be same as any other concurrent prepared transaction
@@ -678,12 +681,47 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 static char *
 getFdwXactIdentifier(FdwXactParticipant *fdw_part, TransactionId xid)
 {
-	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+	char *id;
+	int	id_len;
 
-	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u", Abs(random()),
-			 xid, fdw_part->key);
+	/*
+	 * If FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u", Abs(random()),
+				 xid, fdw_part->key);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		/* Report a truncated copy; the callback's buffer is left untouched */
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%.*s\" is too long",
+						FDWXACT_ID_MAX_LEN, id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
 
-	return pstrdup(buf);
+	/* Copy exactly id_len bytes; the callback's buffer may be unterminated */
+	return pnstrdup(id, id_len);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index e6e294e3ec..4bb6f3738b 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -179,9 +179,12 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+
 typedef void (*PrepareForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*CommitForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*RollbackForeignTransaction_function) (FdwXactInfo *finfo);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -266,6 +269,7 @@ typedef struct FdwRoutine
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
 	PrepareForeignTransaction_function PrepareForeignTransaction;
+	GetPrepareId_function GetPrepareId;
 } FdwRoutine;
 
 
-- 
2.27.0
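
For FDW authors, a callback matching the GetPrepareId_function signature added above might look roughly like the following. This is a standalone sketch rather than code from the patch: TransactionId and Oid are re-declared as plain unsigned integers, malloc stands in for palloc, SKETCH_ID_MAX_LEN is an assumed value (the real FDWXACT_ID_MAX_LEN is not shown in this hunk), and the "sketch_" identifier format is invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for the PostgreSQL typedefs, so the sketch compiles on its own. */
typedef unsigned int TransactionId;
typedef unsigned int Oid;

/* Assumed limit; the real FDWXACT_ID_MAX_LEN is defined elsewhere in the patch. */
#define SKETCH_ID_MAX_LEN 200

/*
 * Build an identifier from the transaction ID, server OID and user OID.
 * Returns a heap-allocated, null-terminated string and stores its length
 * (excluding the terminator) in *prep_id_len, or returns NULL on failure.
 */
char *
sketch_get_prepare_id(TransactionId xid, Oid serverid, Oid userid,
					  int *prep_id_len)
{
	char	   *id;
	int			n;

	assert(prep_id_len != NULL);

	id = malloc(SKETCH_ID_MAX_LEN);
	if (id == NULL)
		return NULL;

	n = snprintf(id, SKETCH_ID_MAX_LEN, "sketch_%u_%u_%u",
				 xid, serverid, userid);
	if (n < 0 || n >= SKETCH_ID_MAX_LEN)
	{
		/* Formatting failed or the identifier would have been truncated. */
		free(id);
		return NULL;
	}

	*prep_id_len = n;
	return id;
}
```

In the patch, a NULL return from the callback makes getFdwXactIdentifier() raise "foreign transaction identifier is not provided", so a real callback would more likely raise its own error than return NULL.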

Attachment: v35-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch (application/octet-stream)
From 24682d0c2f8649c3e8ded28b6fc265413efdf4b0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sat, 29 Aug 2020 00:14:36 +0900
Subject: [PATCH v35 02/10] postgres_fdw supports commit and rollback APIs.

This commit implements both the CommitForeignTransaction and
RollbackForeignTransaction APIs in postgres_fdw. Note that since
PREPARE TRANSACTION is still not supported, this commit doesn't add
any new user-visible functionality.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 465 +++++++++---------
 .../postgres_fdw/expected/postgres_fdw.out    |   2 +-
 contrib/postgres_fdw/postgres_fdw.c           |   4 +
 contrib/postgres_fdw/postgres_fdw.h           |   3 +
 4 files changed, 235 insertions(+), 239 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index ee0b4acf0b..42166f092a 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
 #include "funcapi.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -90,8 +91,7 @@ static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
 static void do_sql_command(PGconn *conn, const char *sql);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, UserMapping *user);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -104,6 +104,8 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 static bool disconnect_cached_connections(Oid serverid);
 
 /*
@@ -119,53 +121,14 @@ static bool disconnect_cached_connections(Oid serverid);
 PGconn *
 GetConnection(UserMapping *user, bool will_prep_stmt)
 {
-	bool		found;
 	bool		retry = false;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
 	MemoryContext ccxt = CurrentMemoryContext;
 
-	/* First time through, initialize connection cache hashtable */
-	if (ConnectionHash == NULL)
-	{
-		HASHCTL		ctl;
-
-		ctl.keysize = sizeof(ConnCacheKey);
-		ctl.entrysize = sizeof(ConnCacheEntry);
-		ConnectionHash = hash_create("postgres_fdw connections", 8,
-									 &ctl,
-									 HASH_ELEM | HASH_BLOBS);
-
-		/*
-		 * Register some callback functions that manage connection cleanup.
-		 * This should be done just once in each backend.
-		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
-		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
-		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
-									  pgfdw_inval_callback, (Datum) 0);
-		CacheRegisterSyscacheCallback(USERMAPPINGOID,
-									  pgfdw_inval_callback, (Datum) 0);
-	}
-
 	/* Set flag that we did GetConnection during the current transaction */
 	xact_got_connection = true;
 
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
-	/*
-	 * Find or create cached entry for requested connection.
-	 */
-	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
-	if (!found)
-	{
-		/*
-		 * We need only clear "conn" here; remaining fields will be filled
-		 * later when "conn" is set.
-		 */
-		entry->conn = NULL;
-	}
+	entry = GetConnectionCacheEntry(user->umid);
 
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
@@ -197,7 +160,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	PG_TRY();
 	{
 		/* Start a new transaction or subtransaction if needed. */
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 	PG_CATCH();
 	{
@@ -258,7 +221,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 		if (entry->conn == NULL)
 			make_new_connection(entry, user);
 
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 
 	/* Remember if caller will prepare statements */
@@ -267,6 +230,53 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	return entry->conn;
 }
 
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	bool		found;
+	ConnCacheEntry *entry;
+	ConnCacheKey key;
+
+	/* First time through, initialize connection cache hashtable */
+	if (ConnectionHash == NULL)
+	{
+		HASHCTL		ctl;
+
+		ctl.keysize = sizeof(ConnCacheKey);
+		ctl.entrysize = sizeof(ConnCacheEntry);
+		ConnectionHash = hash_create("postgres_fdw connections", 8,
+									 &ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+		/*
+		 * Register some callback functions that manage connection cleanup.
+		 * This should be done just once in each backend.
+		 */
+		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
+		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
+									  pgfdw_inval_callback, (Datum) 0);
+		CacheRegisterSyscacheCallback(USERMAPPINGOID,
+									  pgfdw_inval_callback, (Datum) 0);
+	}
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
+
+	/*
+	 * Find or create cached entry for requested connection.
+	 */
+	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+	if (!found)
+	{
+		/*
+		 * We need only clear "conn" here; remaining fields will be filled
+		 * later when "conn" is set.
+		 */
+		entry->conn = NULL;
+	}
+	return entry;
+}
+
 /*
  * Reset all transient state fields in the cached connection entry and
  * establish new connection to the remote server.
@@ -557,7 +567,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -569,6 +579,9 @@ begin_remote_xact(ConnCacheEntry *entry)
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
+		/* Register the foreign server to the transaction */
+		FdwXactRegisterXact(user);
+
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
 		else
@@ -784,199 +797,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- *
- * This runs just late enough that it must not enter user-defined code
- * locally.  (Entering such code on the remote side is fine.  Its remote
- * COMMIT TRANSACTION may run deferred triggers.)
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state or it is marked as
-		 * invalid, then discard it to recover. Next GetConnection will open a
-		 * new connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state ||
-			entry->invalidated)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -1599,3 +1419,172 @@ disconnect_cached_connections(Oid serverid)
 
 	return result;
 }
+
+void
+postgresCommitForeignTransaction(FdwXactInfo *finfo)
+{
+	ConnCacheEntry *entry;
+	PGresult   *res;
+
+	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+
+	Assert(entry->conn);
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	do_sql_command(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+void
+postgresRollbackForeignTransaction(FdwXactInfo *finfo)
+{
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	/*
+	 * In simple rollback case, we must have a connection to the foreign server
+	 * because the foreign transaction is not closed yet. We get the connection
+	 * entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry if the transaction failed before a
+	 * connection was established.
+	 */
+	if (!entry->conn)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection has not started a transaction or is already
+	 * unsalvageable, only do the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+
+	return;
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state ||
+		entry->invalidated)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	/*
+	 * Regardless of how the transaction ended, we can now mark ourselves as
+	 * out of the transaction.
+	 */
+	xact_got_connection = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b81c..42a77e6725 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9019,7 +9019,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
+ERROR:  cannot PREPARE a transaction that has operated on foreign tables
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b48575c5..6ce6b0b5ba 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -583,6 +583,10 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for foreign transactions */
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 1f67b4d9fd..c44d37f280 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -137,6 +138,8 @@ extern PGresult *pgfdw_get_result(PGconn *conn, const char *query);
 extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresCommitForeignTransaction(FdwXactInfo *finfo);
+extern void postgresRollbackForeignTransaction(FdwXactInfo *finfo);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
-- 
2.27.0
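
For readers following the rollback path in the patch above, here is a minimal standalone sketch of the cleanup decision chain in postgresRollbackForeignTransaction(): cancel a running query, abort the remote transaction, then deallocate prepared statements, stopping at the first failure. It is not the patch's code. SketchConn and sketch_rollback are hypothetical names, and the boolean fields stand in for the results of pgfdw_cancel_query() and pgfdw_exec_cleanup_query(), which need a live libpq connection. The sketch also leaves out the flag resets done by pgfdw_cleanup_after_transaction().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ConnCacheEntry fields the rollback path uses. */
typedef struct SketchConn
{
	bool		query_active;	/* models PQtransactionStatus() == PQTRANS_ACTIVE */
	bool		cancel_ok;		/* models pgfdw_cancel_query() succeeding */
	bool		abort_ok;		/* models "ABORT TRANSACTION" succeeding */
	bool		dealloc_ok;		/* models "DEALLOCATE ALL" succeeding */
	bool		have_prep_stmt;
	bool		have_error;
	bool		changing_xact_state;
} SketchConn;

/*
 * Returns true if the connection is still usable after rollback cleanup,
 * false if it must be discarded.
 */
bool
sketch_rollback(SketchConn *c)
{
	bool		abort_cleanup_failure = false;

	/* An earlier state change never completed: don't touch the connection. */
	if (c->changing_xact_state)
		return false;

	c->changing_xact_state = true;
	c->have_error = true;		/* assume we lost track of prepared statements */

	if (c->query_active && !c->cancel_ok)
		abort_cleanup_failure = true;	/* unable to cancel running query */
	else if (!c->abort_ok)
		abort_cleanup_failure = true;	/* unable to abort remote transaction */
	else if (c->have_prep_stmt && c->have_error && !c->dealloc_ok)
		abort_cleanup_failure = true;	/* trouble clearing prepared statements */

	/* Disarm changing_xact_state only if every step worked. */
	c->changing_xact_state = abort_cleanup_failure;
	return !abort_cleanup_failure;
}
```

The point of the chain is that changing_xact_state stays set after any failed step, which is what later makes pgfdw_cleanup_after_transaction() discard the connection.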

v35-0004-postgres_fdw-supports-prepare-API.patch (application/octet-stream)
From 333d6198590d9ef7e787b5b2ca8bb4a49bc8755e Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:00:21 +0900
Subject: [PATCH v35 04/10] postgres_fdw supports prepare API.

This commit implements the PrepareForeignTransaction API in postgres_fdw,
enabling foreign transactions to be committed and rolled back using the
two-phase commit protocol.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 136 +++++++++++++++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  13 --
 contrib/postgres_fdw/postgres_fdw.c           |   1 +
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   7 -
 5 files changed, 134 insertions(+), 24 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 42166f092a..99ba8b0999 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -105,6 +105,8 @@ static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
 static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+									char *fdwxact_id, bool is_commit);
 static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 static bool disconnect_cached_connections(Oid serverid);
 
@@ -1424,12 +1426,19 @@ void
 postgresCommitForeignTransaction(FdwXactInfo *finfo)
 {
 	ConnCacheEntry *entry;
+	bool		is_onephase = (finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	PGresult   *res;
 
-	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
 
+	if (!is_onephase)
+	{
+		/* COMMIT PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, finfo->usermapping, finfo->identifier,
+								true);
+		return;
+	}
+
 	Assert(entry->conn);
 
 	/*
@@ -1471,16 +1480,24 @@ void
 postgresRollbackForeignTransaction(FdwXactInfo *finfo)
 {
 	ConnCacheEntry *entry = NULL;
+	bool is_onephase = (finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	bool abort_cleanup_failure = false;
 
-	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	/*
 	 * In simple rollback case, we must have a connection to the foreign server
 	 * because the foreign transaction is not closed yet. We get the connection
 	 * entry from the cache.
 	 */
 	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+
+	if (!is_onephase)
+	{
+		/* ROLLBACK PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, finfo->usermapping, finfo->identifier,
+								false);
+		return;
+	}
+
 	Assert(entry);
 
 	/*
@@ -1557,6 +1574,46 @@ postgresRollbackForeignTransaction(FdwXactInfo *finfo)
 	return;
 }
 
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactInfo *finfo)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+	Assert(entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", finfo->identifier);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   finfo->server->servername, finfo->identifier)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 finfo->server->servername, finfo->identifier);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
 /* Cleanup at main-transaction end */
 static void
 pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
@@ -1588,3 +1645,74 @@ pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
 	/* Also reset cursor numbering for next transaction */
 	cursor_number = 0;
 }
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+						char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	/*
+	 * Check the connection status in case the previous attempt
+	 * failed.
+	 */
+	if (entry->conn && PQstatus(entry->conn) != CONNECTION_OK)
+		disconnect_pg_server(entry);
+
+	/*
+	 * In two-phase commit case, since the transaction is about to be
+	 * resolved by a different process than the process who prepared it,
+	 * we might not have a connection yet.
+	 */
+	if (!entry->conn)
+		make_new_connection(entry, usermapping);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	/*
+	 * Once the transaction is prepared, no further transaction callback is
+	 * called even if an error occurs while resolving it.  Therefore, we
+	 * don't need to set changing_xact_state here.  On failure, a new connection
+	 * will be established either when a new transaction is started or when
+	 * checking the connection status above.
+	 */
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback", fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 026427f004..69dd0eba68 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9009,19 +9009,6 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
- count 
--------
-   822
-(1 row)
-
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
-ROLLBACK;
-WARNING:  there is no transaction in progress
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 6ce6b0b5ba..13cc3d07d6 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -586,6 +586,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for foreign transactions */
 	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
 	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
 
 	PG_RETURN_POINTER(routine);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index c44d37f280..8c72c910c7 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -140,6 +140,7 @@ extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
 extern void postgresCommitForeignTransaction(FdwXactInfo *finfo);
 extern void postgresRollbackForeignTransaction(FdwXactInfo *finfo);
+extern void postgresPrepareForeignTransaction(FdwXactInfo *finfo);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea44a..833813ec66 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2670,13 +2670,6 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ROLLBACK;
-
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
-- 
2.27.0
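
The error handling in pgfdw_end_prepared_xact() above accepts one specific failure: the prepared transaction no longer existing on the foreign server. A minimal standalone sketch of that check follows. It is an illustration, not the patch's code. The SKETCH_ macros mirror PostgreSQL's PGSIXBIT/MAKE_SQLSTATE packing from utils/elog.h, and the two SQLSTATE values are the standard codes 42704 (undefined_object) and 08006 (connection_failure). The function names are hypothetical, and the length check is an extra guard that the patch does not have (it only tests the pointer for NULL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same six-bit packing scheme as PGSIXBIT/MAKE_SQLSTATE in utils/elog.h. */
#define SKETCH_SIXBIT(ch)	(((ch) - '0') & 0x3F)
#define SKETCH_MAKE_SQLSTATE(c1, c2, c3, c4, c5) \
	(SKETCH_SIXBIT(c1) + (SKETCH_SIXBIT(c2) << 6) + (SKETCH_SIXBIT(c3) << 12) + \
	 (SKETCH_SIXBIT(c4) << 18) + (SKETCH_SIXBIT(c5) << 24))

#define SKETCH_UNDEFINED_OBJECT		SKETCH_MAKE_SQLSTATE('4', '2', '7', '0', '4')
#define SKETCH_CONNECTION_FAILURE	SKETCH_MAKE_SQLSTATE('0', '8', '0', '0', '6')

/*
 * Pack the five-character SQLSTATE reported by the remote server.  A missing
 * (or, in this sketch only, malformed) code is treated as a connection failure.
 */
int
sketch_pack_sqlstate(const char *diag_sqlstate)
{
	if (diag_sqlstate == NULL || strlen(diag_sqlstate) != 5)
		return SKETCH_CONNECTION_FAILURE;

	return SKETCH_MAKE_SQLSTATE(diag_sqlstate[0], diag_sqlstate[1],
								diag_sqlstate[2], diag_sqlstate[3],
								diag_sqlstate[4]);
}

/*
 * True if a failed COMMIT PREPARED / ROLLBACK PREPARED can be ignored because
 * the prepared transaction was already resolved on the foreign server.
 */
bool
sketch_resolution_error_is_ignorable(const char *diag_sqlstate)
{
	return sketch_pack_sqlstate(diag_sqlstate) == SKETCH_UNDEFINED_OBJECT;
}
```

Accepting only undefined_object keeps resolution idempotent: a retry after a crash between the remote COMMIT PREPARED and the local bookkeeping succeeds quietly, while any other error is still reported.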

v35-0006-Introduce-foreign-transaction-launcher-and-resol.patch (application/octet-stream)
From e95ef54c970503bcdf59f35d0eb0eb02a432495d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:09:41 +0900
Subject: [PATCH v35 06/10] Introduce foreign transaction launcher and resolver
 processes.

This commit introduces new background processes: the foreign
transaction launcher and resolvers. With this change, users no longer
need to use pg_resolve_foreign_xact() to resolve foreign transactions
prepared by PREPARE TRANSACTION and left by COMMIT/ROLLBACK
TRANSACTION. These foreign transactions are resolved in the background
by a foreign transaction resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/Makefile           |   2 +
 src/backend/access/transam/fdwxact.c          |  41 +-
 src/backend/access/transam/fdwxact_launcher.c | 558 ++++++++++++++++++
 src/backend/access/transam/fdwxact_resolver.c | 337 +++++++++++
 src/backend/access/transam/twophase.c         |  16 +
 src/backend/postmaster/bgworker.c             |   8 +
 src/backend/postmaster/pgstat.c               |   6 +
 src/backend/postmaster/postmaster.c           |  13 +-
 src/backend/storage/ipc/ipci.c                |   3 +
 src/backend/storage/lmgr/lwlocknames.txt      |   1 +
 src/backend/tcop/postgres.c                   |  14 +
 src/backend/utils/misc/guc.c                  |  37 ++
 src/backend/utils/misc/postgresql.conf.sample |  12 +
 src/include/access/fdwxact.h                  |   5 +
 src/include/access/fdwxact_launcher.h         |  28 +
 src/include/access/fdwxact_resolver.h         |  23 +
 src/include/access/resolver_internal.h        |  61 ++
 src/include/catalog/pg_proc.dat               |   5 +
 src/include/pgstat.h                          |   2 +
 src/include/utils/guc_tables.h                |   2 +
 20 files changed, 1162 insertions(+), 12 deletions(-)
 create mode 100644 src/backend/access/transam/fdwxact_launcher.c
 create mode 100644 src/backend/access/transam/fdwxact_resolver.c
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/resolver_internal.h

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index b05a88549d..26a5ee589c 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -16,6 +16,8 @@ OBJS = \
 	clog.o \
 	commit_ts.o \
 	fdwxact.o \
+	fdwxact_launcher.o \
+	fdwxact_resolver.o \
 	generic_xlog.o \
 	multixact.o \
 	parallel.o \
diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index c8a75bcfbc..5d1fcd80ed 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -24,9 +24,9 @@
  * PrepareForeignTransaction() API regardless of data on the foreign server having
  * been modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback
  * only the local transaction but not do anything for involved foreign transactions.
- * To resolve these foreign transactions the user needs to use pg_resolve_foreign_xact()
- * SQL function that resolve a foreign transaction according to the result of the
- * corresponding local transaction.
+ * The prepared foreign transactions are resolved asynchronously by a resolver process.
+ * Users can also call the pg_resolve_foreign_xact() SQL function to resolve a foreign
+ * transaction manually.
  *
  * LOCKING
  *
@@ -77,9 +77,12 @@
 #include <unistd.h>
 
 #include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/twophase.h"
+#include "access/resolver_internal.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
@@ -176,13 +179,14 @@ static bool fdwXactExitRegistered = false;
 
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
 
 static void FdwXactPrepareForeignTransactions(TransactionId xid);
 static char *getFdwXactIdentifier(FdwXactParticipant *fdw_part, TransactionId xid);
 static FdwXact FdwXactInsertEntry(TransactionId xid, FdwXactParticipant *fdw_part,
 								  char *identifier);
 static void AtProcExit_FdwXact(int code, Datum arg);
-static void ForgetAllFdwXactParticipants(void);
+static int ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool is_commit, bool is_parallel_worker);
 static void RemoveFdwParticipant(FdwXactPartKey umid);
@@ -450,7 +454,9 @@ AtEOXact_FdwXact(bool is_commit, bool is_parallel_worker)
 		}
 	}
 
-	ForgetAllFdwXactParticipants();
+	if (ForgetAllFdwXactParticipants() > 0)
+		FdwXactLaunchOrWakeupResolver();
+
 	PreparedAllParticipants = false;
 }
 
@@ -935,7 +941,8 @@ remove_fdwxact(FdwXact fdwxact)
 static void
 AtProcExit_FdwXact(int code, Datum arg)
 {
-	ForgetAllFdwXactParticipants();
+	if (ForgetAllFdwXactParticipants() > 0)
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
@@ -944,7 +951,7 @@ AtProcExit_FdwXact(int code, Datum arg)
  * transaction to prevent the local transaction id of such unresolved foreign
  * transaction from begin truncated.
  */
-static void
+static int
 ForgetAllFdwXactParticipants(void)
 {
 	FdwXactParticipant *fdw_part;
@@ -952,7 +959,7 @@ ForgetAllFdwXactParticipants(void)
 	int	nremaining = 0;
 
 	if (!HasFdwXactParticipant())
-		return;
+		return nremaining;
 
 	hash_seq_init(&scan, FdwXactParticipants);
 	while ((fdw_part = (FdwXactParticipant *) hash_seq_search(&scan)))
@@ -984,6 +991,7 @@ ForgetAllFdwXactParticipants(void)
 	}
 
 	Assert(!HasFdwXactParticipant());
+	return nremaining;
 }
 
 /*
@@ -1017,6 +1025,23 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool is_commit,
 	}
 }
 
+/*
+ * Resolve foreign transactions at the given indexes.
+ *
+ * The caller must hold the given foreign transactions in advance to prevent
+ * concurrent updates.
+ */
+void
+ResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
+{
+	for (int i = 0; i < nfdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[fdwxact_idxs[i]];
+
+		CHECK_FOR_INTERRUPTS();
+		ResolveOneFdwXact(fdwxact);
+	}
+}
 
 /* Commit or rollback one prepared foreign transaction */
 static void
diff --git a/src/backend/access/transam/fdwxact_launcher.c b/src/backend/access/transam/fdwxact_launcher.c
new file mode 100644
index 0000000000..35504f5bdf
--- /dev/null
+++ b/src/backend/access/transam/fdwxact_launcher.c
@@ -0,0 +1,558 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/transam/fdwxact_launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "pgstat.h"
+#include "funcapi.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "nodes/pg_list.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/interrupt.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+static volatile sig_atomic_t got_SIGUSR2 = false;
+
+static void FdwXactLauncherOnExit(int code, Datum arg);
+static void FdwXactLaunchResolver(Oid dbid);
+static bool FdwXactRelaunchResolvers(void);
+
+/* Signal handler */
+static void FdwXactLaunchHandler(SIGNAL_ARGS);
+
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactRequestToLaunchResolver(void)
+{
+	if (FdwXactResolverCtl->launcher_pid != InvalidPid)
+		kill(FdwXactResolverCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactLauncherShmemInit */
+Size
+FdwXactLauncherShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactResolverCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactLauncherShmemInit(void)
+{
+	bool		found;
+
+	FdwXactResolverCtl = ShmemInitStruct("Foreign Transaction Launcher Data",
+										 FdwXactLauncherShmemSize(),
+										 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactResolverCtl, 0, FdwXactLauncherShmemSize());
+		SHMQueueInit(&(FdwXactResolverCtl->fdwxact_queue));
+		FdwXactResolverCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactResolverCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+FdwXactLauncherOnExit(int code, Datum arg)
+{
+	FdwXactResolverCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+FdwXactLaunchHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(FdwXactLauncherOnExit, (Datum) 0);
+
+	Assert(FdwXactResolverCtl->launcher_pid == InvalidPid);
+	FdwXactResolverCtl->launcher_pid = MyProcPid;
+	FdwXactResolverCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, SignalHandlerForConfigReload);
+	pqsignal(SIGUSR2, FdwXactLaunchHandler);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit start retries to once per
+		 * foreign_xact_resolution_retry_interval, but always attempt to
+		 * start when requested.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = FdwXactRelaunchResolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * started. This usually means the resolver crashed, so we should
+			 * retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (ConfigReloadPending)
+		{
+			ConfigReloadPending = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+
+	/*
+	 * Looking for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactResolverCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. Since the resolver will soon find
+		 * the FdwXact entry we inserted, we don't need to do anything.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactRequestToLaunchResolver();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+FdwXactLaunchResolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactResolverCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactResolverCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+		resolver->in_use = false;
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on each database that has
+ * at least one FdwXact entry to resolve but no resolver running on it.
+ */
+static bool
+FdwXactRelaunchResolvers(void)
+{
+	HTAB	   *fdwxact_dbs;
+	HTAB	   *resolver_dbs;
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+
+	/*
+	 * Create a hash map for the databases that have at least one foreign
+	 * transaction to resolve.
+	 */
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * We need to launch resolver process if the foreign transaction
+		 * is not held by anyone and is not a part of the local prepared
+		 * transaction.
+		 */
+		if (fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->data.xid))
+			hash_search(fdwxact_dbs, &(fdwxact->data.dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no foreign transaction to resolve, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	/* Create a hash map for databases on which a resolver is running */
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactResolverCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * Find databases on which no resolver is running and launch new
+	 * resolver process on them.
+	 */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			FdwXactLaunchResolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactResolverCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactResolverCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/transam/fdwxact_resolver.c b/src/backend/access/transam/fdwxact_resolver.c
new file mode 100644
index 0000000000..23faaf4602
--- /dev/null
+++ b/src/backend/access/transam/fdwxact_resolver.c
@@ -0,0 +1,337 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.c
+ *
+ * The foreign transaction resolver background worker resolves foreign
+ * transactions that participate in a distributed transaction. A resolver
+ * process is started by the foreign transaction launcher for each database.
+ *
+ * A resolver process keeps resolving the foreign transactions on its
+ * database for which backend processes are waiting for resolution.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/transam/fdwxact_resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/interrupt.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int	foreign_xact_resolution_retry_interval;
+int	foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactResolverCtlData *FdwXactResolverCtl;
+
+static void FdwXactResolverLoop(void);
+static long FdwXactResolverComputeSleepTime(TimestampTz now,
+											TimestampTz targetTime);
+static void FdwXactResolverCheckTimeout(TimestampTz now);
+
+static void FdwXactResolverOnExit(int code, Datum arg);
+static void FdwXactResolverDetach(void);
+static void FdwXactResolverAttach(int slot);
+static void HoldInDoubtFdwXacts(void);
+
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * held_fdwxacts has indexes of FdwXact entries which the resolver marked
+ * as in-processing. These marks are cleared on process exit.
+ */
+static int *held_fdwxacts = NULL;
+static int	nheld;
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+FdwXactResolverDetach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info.
+ */
+static void
+FdwXactResolverOnExit(int code, Datum arg)
+{
+	FdwXactResolverDetach();
+
+	/* Release the held foreign transaction entries */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < nheld; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[held_fdwxacts[i]];
+		fdwxact->locking_backend = InvalidBackendId;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+FdwXactResolverAttach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactResolverCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(FdwXactResolverOnExit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	FdwXactResolverAttach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, SignalHandlerForConfigReload);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	held_fdwxacts = palloc(sizeof(int) * max_prepared_foreign_xacts);
+	nheld = 0;
+
+	/* Initialize stats to a sanish value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FdwXactResolverLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FdwXactResolverLoop(void)
+{
+	MemoryContext resolver_ctx;
+
+	resolver_ctx = AllocSetContextCreate(TopMemoryContext,
+										 "Foreign Transaction Resolver",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz resolutionTs = -1;
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		MemoryContextSwitchTo(resolver_ctx);
+
+		if (ConfigReloadPending)
+		{
+			ConfigReloadPending = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		now = GetCurrentTimestamp();
+
+		/* Hold in-doubt foreign transaction to resolve */
+		HoldInDoubtFdwXacts();
+
+		if (nheld > 0)
+		{
+			/* Resolve in-doubt transactions */
+			StartTransactionCommand();
+			ResolveFdwXacts(held_fdwxacts, nheld);
+			CommitTransactionCommand();
+			last_resolution_time = now;
+		}
+
+		FdwXactResolverCheckTimeout(now);
+
+		sleep_time = FdwXactResolverComputeSleepTime(now, resolutionTs);
+
+		MemoryContextResetAndDeleteChildren(resolver_ctx);
+		MemoryContextSwitchTo(TopMemoryContext);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether there have been foreign transactions by the backend within
+ * foreign_xact_resolver_timeout and shutdown if not.
+ */
+static void
+FdwXactResolverCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/* Reached timeout, exit */
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+					get_database_name(MyDatabaseId))));
+	CommitTransactionCommand();
+	FdwXactResolverDetach();
+	proc_exit(0);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the timeout or the next resolution time given by nextResolutionTs.
+ */
+static long
+FdwXactResolverComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		TimestampDifference(now, timeout,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	if (nextResolutionTs > 0)
+	{
+		long		sec_to_timeout;
+		int			microsec_to_timeout;
+
+		TimestampDifference(now, nextResolutionTs,
+							&sec_to_timeout, &microsec_to_timeout);
+
+		sleeptime = Min(sleeptime,
+						sec_to_timeout * 1000 + microsec_to_timeout / 1000);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Lock foreign transactions that are not held by anyone.
+ */
+static void
+HoldInDoubtFdwXacts(void)
+{
+	nheld = 0;
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->valid &&
+			fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->data.xid))
+		{
+			held_fdwxacts[nheld++] = i;
+			fdwxact->locking_backend = MyBackendId;
+		}
+	}
+	LWLockRelease(FdwXactLock);
+}
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 81facfb09c..4d69303cd2 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,8 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -2293,6 +2295,13 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * If the prepared transaction was a part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExists(xid, InvalidOid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
@@ -2352,6 +2361,13 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * If the prepared transaction was a part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExists(xid, InvalidOid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 6fdea3fc2d..5ac6cc9497 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,8 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 701ccb3a03..dd3397f998 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3910,6 +3910,12 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_CHECKPOINTER_MAIN:
 			event_name = "CheckpointerMain";
 			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
 		case WAIT_EVENT_LOGICAL_APPLY_MAIN:
 			event_name = "LogicalApplyMain";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9568dafbe2..4c3ee05f8f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,7 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -910,6 +911,9 @@ PostmasterMain(int argc, char *argv[])
 	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers <= 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
 
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
@@ -975,12 +979,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and foreign transaction launcher.  Since
+	 * it registers a background worker, it needs to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call it
+	 * before any modules had chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 6f14a950bf..5559080f5f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -17,6 +17,7 @@
 #include "access/clog.h"
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -151,6 +152,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
 		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactLauncherShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -270,6 +272,7 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 	FdwXactShmemInit();
+	FdwXactLauncherShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 4124321640..a297c746cd 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -54,3 +54,4 @@ XactTruncationLock					44
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
 FdwXactLock							48
+FdwXactResolverLock					49
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index bb5ccb4578..922eedeaae 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3100,6 +3102,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4d97316fb1..7d2320834b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -763,6 +763,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS */
 	gettext_noop("Version and Platform Compatibility"),
 	/* COMPAT_OPTIONS_PREVIOUS */
@@ -2481,6 +2485,39 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 169d8e0d87..3e2c013680 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -736,6 +736,18 @@
 #max_pred_locks_per_page = 2            # min 0
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
 #------------------------------------------------------------------------------
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 61566aef5a..9d6d66ac6f 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -100,6 +100,9 @@ typedef struct FdwXactInfo
 
 /* GUC parameters */
 extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
 
 /* Function declarations */
 extern void PreCommit_FdwXact(bool is_parallel_worker);
@@ -108,6 +111,8 @@ extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
 extern void AtPrepare_FdwXact(void);
 extern bool FdwXactExists(TransactionId xid, Oid umid);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void ResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts);
 extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
 extern void RecreateFdwXactFile(TransactionId xid, Oid umid, void *content,
 								int len);
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..191823f53f
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactRequestToLaunchResolver(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactLauncherShmemSize(void);
+extern void FdwXactLauncherShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..e69c567967
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..42f17120b0
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,61 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactResolverCtlData struct for the whole database cluster */
+typedef struct FdwXactResolverCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactResolverCtlData;
+#define SizeOfFdwXactResolverCtlData \
+	(offsetof(FdwXactResolverCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactResolverCtlData *FdwXactResolverCtl;
+extern FdwXactResolver *MyFdwXactResolver;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index eef837baa0..4ba43d95c5 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6213,6 +6213,11 @@
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
 
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
+
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
   proargtypes => 'pg_lsn pg_lsn', prosrc => 'pg_wal_lsn_diff' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index b50daaeb79..47972e24fc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -917,6 +917,8 @@ typedef enum
 	WAIT_EVENT_BGWRITER_HIBERNATE,
 	WAIT_EVENT_BGWRITER_MAIN,
 	WAIT_EVENT_CHECKPOINTER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
 	WAIT_EVENT_LOGICAL_APPLY_MAIN,
 	WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
 	WAIT_EVENT_PGSTAT_MAIN,
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index b9b5c1adda..94e593ac77 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -96,6 +96,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
-- 
2.27.0
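
The SizeOfFdwXactResolverCtlData macro in resolver_internal.h above covers the struct header plus a single FdwXactResolver slot. As a minimal standalone sketch of the usual way to size a struct that ends in a flexible array member for an arbitrary number of slots (DemoSlot, DemoCtl and demo_ctl_size() are hypothetical names, not part of the patch, and the patch's actual shared-memory sizing code is not shown in this hunk):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-resolver slot. */
typedef struct DemoSlot
{
	int			pid;
	uint32_t	dbid;
	int			in_use;
} DemoSlot;

/* Hypothetical control struct ending in a flexible array member. */
typedef struct DemoCtl
{
	int			launcher_pid;
	DemoSlot	slots[];		/* C99 flexible array member */
} DemoCtl;

/* Bytes needed for the header plus nslots trailing slots. */
size_t
demo_ctl_size(size_t nslots)
{
	/* Guard against overflow in the multiplication below. */
	assert(nslots <= (SIZE_MAX - offsetof(DemoCtl, slots)) / sizeof(DemoSlot));

	return offsetof(DemoCtl, slots) + nslots * sizeof(DemoSlot);
}
```

Inside the backend the same arithmetic would normally go through add_size() and mul_size(), as FdwXactShmemSize() does in the next patch, rather than a bare assert.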

v35-0003-Add-PrepareForeignTransaction-API.patch (application/octet-stream)
From f2e3df1f82adeac2340cdbbf8863977239bb1fb1 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 15 Mar 2021 12:05:53 +0900
Subject: [PATCH v35 03/10] Add PrepareForeignTransaction API.

This commit adds a new FDW API, PrepareForeignTransaction. Using this
API, the transactions initiated on the foreign server are prepared at
PREPARE TRANSACTION time.  The information of prepared foreign
transactions involved with the distributed transaction is crash-safe.
However, these transactions are neither committed nor aborted at
COMMIT/ROLLBACK PREPARED time.  To resolve these transactions, this
commit also adds the pg_resolve_foreign_xact() SQL function.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +-
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   59 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/fdwxact.c          | 1734 ++++++++++++++++-
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   28 +
 src/backend/access/transam/xact.c             |    4 +-
 src/backend/access/transam/xlog.c             |   41 +-
 src/backend/catalog/dependency.c              |    5 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/foreigncmds.c            |   34 +-
 src/backend/foreign/foreign.c                 |    6 +
 src/backend/postmaster/pgstat.c               |    9 +
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    3 +
 src/backend/storage/ipc/procarray.c           |   39 +-
 src/backend/storage/lmgr/lwlocknames.txt      |    1 +
 src/backend/utils/misc/guc.c                  |   11 +
 src/backend/utils/misc/postgresql.conf.sample |    2 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |   84 +
 src/include/access/fdwxact_xlog.h             |   50 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   18 +
 src/include/commands/defrem.h                 |    1 +
 src/include/foreign/fdwapi.h                  |    2 +
 src/include/pgstat.h                          |    3 +
 src/include/storage/procarray.h               |    1 +
 src/test/regress/expected/rules.out           |    7 +
 36 files changed, 2132 insertions(+), 34 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/include/access/fdwxact_xlog.h
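
The commit message above says that prepared foreign transactions are left untouched at COMMIT/ROLLBACK PREPARED and are resolved later according to the outcome of the corresponding local transaction. A minimal sketch of that decision follows, under stated assumptions: DemoLocalFate, DemoAction and demo_resolve_action() are hypothetical names, and skipping a foreign transaction whose local transaction is still in progress is an assumption that this excerpt of the patch does not confirm.

```c
#include <assert.h>

/* Hypothetical outcome of the local (coordinator) transaction. */
typedef enum DemoLocalFate
{
	DEMO_LOCAL_COMMITTED,
	DEMO_LOCAL_ABORTED,
	DEMO_LOCAL_IN_PROGRESS
} DemoLocalFate;

/* Hypothetical action to take on the prepared foreign transaction. */
typedef enum DemoAction
{
	DEMO_COMMIT_PREPARED,
	DEMO_ROLLBACK_PREPARED,
	DEMO_SKIP
} DemoAction;

/* Map the local transaction's fate to the action on the foreign side. */
DemoAction
demo_resolve_action(DemoLocalFate fate)
{
	switch (fate)
	{
		case DEMO_LOCAL_COMMITTED:
			return DEMO_COMMIT_PREPARED;
		case DEMO_LOCAL_ABORTED:
			return DEMO_ROLLBACK_PREPARED;
		case DEMO_LOCAL_IN_PROGRESS:
			break;
	}

	/* Assumption: an undecided local transaction is left for a later pass. */
	assert(fate == DEMO_LOCAL_IN_PROGRESS);
	return DEMO_SKIP;
}
```

The point of the sketch is only that the foreign side never decides on its own; the local transaction's fate is the single source of truth.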

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 42a77e6725..026427f004 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9019,7 +9019,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on foreign tables
+ERROR:  cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..d7012dab80
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,59 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		PostgreSQL global transaction manager for foreign server.
+ *
+ * This module describes the WAL records for the foreign transaction manager.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec;
+
+		appendStringInfo(buf, "xid: %u, dbid: %u, umid: %u, serverid: %u, owner: %u, identifier: %s",
+						 fdwxact_insert->xid,
+						 fdwxact_insert->dbid,
+						 fdwxact_insert->umid,
+						 fdwxact_insert->serverid,
+						 fdwxact_insert->owner,
+						 fdwxact_insert->identifier);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "xid: %u, umid: %u",
+						 fdwxact_remove->xid,
+						 fdwxact_remove->umid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..e4ae79e599 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index 7da90eae13..60d726af62 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -13,6 +13,57 @@
  * transaction manager calls corresponding FDW API to end the foreign
  * tranasctions.
  *
+ * To achieve commit among all foreign servers atomically, the global transaction
+ * manager supports two-phase commit protocol, which is a type of atomic commitment
+ * protocol (ACP). Two-phase commit protocol is crash-safe.  We WAL-log the foreign
+ * transaction information.
+ *
+ * FOREIGN TRANSACTION RESOLUTION
+ *
+ * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
+ * PrepareForeignTransaction() API regardless of data on the foreign server having
+ * been modified.  At COMMIT PREPARED and ROLLBACK PREPARED, we commit or rollback
+ * only the local transaction and do nothing for the involved foreign transactions.
+ * To resolve these foreign transactions the user needs to use the pg_resolve_foreign_xact()
+ * SQL function, which resolves a foreign transaction according to the result of the
+ * corresponding local transaction.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is updated. To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of a foreign
+ * transaction follows a locking model based on the following linked concepts:
+ *
+ * * A process that is going to work on the foreign transaction needs to set
+ *	 locking_backend of the FdwXact entry, which prevents the entry from being
+ *	 updated and removed by concurrent processes.
+ * * All FdwXact fields except for status are protected by FdwXactLock.  The
+ *   status is protected by its mutex.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXact
+ *	 with entries marked with fdwxact->inredo and fdwxact->ondisk.	FdwXact file
+ *	 data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts.
+ *	 We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that
+ *	 have fdwxact->inredo set and are behind the redo_horizon.	We save
+ *	 them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->fdwxacts.  If
+ *	 fdwxact->ondisk is true, the corresponding entry from the disk is
+ *	 additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *	 fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
  * Portions Copyright (c) 2021, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
@@ -21,18 +72,59 @@
  */
 #include "postgres.h"
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include "access/fdwxact.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_user_mapping.h"
 #include "foreign/fdwapi.h"
 #include "foreign/foreign.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/procarray.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 #include "utils/syscache.h"
 
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
+/* Initial size of the hash table */
+#define FDWXACT_HASH_SIZE	64
+
 /* Check the FdwXactParticipant is capable of two-phase commit  */
 #define ServerSupportTransactionCallback(fdw_part) \
 	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+#define ServerSupportTwophaseCommit(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL)
+
+/*
+ * The name of a foreign prepared transaction file is the xid and the
+ * user mapping OID, each as 8 hex digits, separated by '_'.
+ *
+ * Since FdwXact is identified by user mapping OID and it's unique
+ * within a distributed transaction, the name is sufficient to
+ * ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8)
+#define FdwXactFilePath(path, xid, umid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X", \
+			 xid, umid)
 
 /* Check the current transaction has at least one fdwxact participant */
 #define HasFdwXactParticipant() \
@@ -55,23 +147,122 @@ typedef struct FdwXactParticipant
 	ForeignServer *server;
 	UserMapping *usermapping;
 
+	/*
+	 * Pointer to a FdwXact entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXact		fdwxact;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
 } FdwXactParticipant;
 
 /*
  * Foreign transactions involved in the transaction.  A member of
  * participants must support both commit and rollback APIs.
+ *
+ * PreparedAllParticipants is true if all foreign transaction participants
+ * in FdwXactParticipants are prepared. This can be true in the PREPARE
+ * TRANSACTION case.
  */
 static HTAB *FdwXactParticipants = NULL;
+static bool PreparedAllParticipants = false;
 
-/* Initial size of the hash table */
-#define FDWXACT_HASH_SIZE	64
+/* Keep track of whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
 
+/* GUC parameter */
+int			max_prepared_foreign_xacts = 0;
+
+static void FdwXactPrepareForeignTransactions(TransactionId xid);
+static char *getFdwXactIdentifier(FdwXactParticipant *fdw_part, TransactionId xid);
+static FdwXact FdwXactInsertEntry(TransactionId xid, FdwXactParticipant *fdw_part,
+								  char *identifier);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void ForgetAllFdwXactParticipants(void);
 static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
 											 bool is_commit, bool is_parallel_worker);
 static void RemoveFdwParticipant(FdwXactPartKey umid);
+static void ResolveOneFdwXact(FdwXact fdwxact);
+static void FdwXactComputeRequiredXmin(void);
+static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(TransactionId xid, Oid umid, bool givewarning);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(TransactionId xid, Oid umid,
+								  XLogRecPtr insert_start_lsn, bool fromdisk);
+static char *ReadFdwXactFile(TransactionId xid, Oid umid);
+static void RemoveFdwXactFile(TransactionId xid, Oid umid, bool giveWarning);
+
+static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid umid, Oid serverid,
+							  Oid owner, char *identifier);
+static void remove_fdwxact(FdwXact fdwxact);
+static int get_fdwxact_idx(TransactionId xid, Oid umid);
+static FdwXact get_fdwxact_with_check(TransactionId xid, Oid umid);
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, fdwxacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXact)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is described in the definition of the
+ * FdwXactCtlData structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXact		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_fdwxacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXact)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) +
+					  sizeof(FdwXact) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
 
 /*
  * Register the given foreign transaction identified by the given user
@@ -101,6 +292,13 @@ FdwXactRegisterXact(UserMapping *usermapping)
 										  &ctl, HASH_ELEM | HASH_BLOBS);
 	}
 
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
 	key = usermapping->umid;
 	fdw_part = hash_search(FdwXactParticipants, (void *) &key, HASH_ENTER, &found);
 
@@ -126,8 +324,10 @@ FdwXactRegisterXact(UserMapping *usermapping)
 		ereport(ERROR,
 				(errmsg("cannot register foreign server not supporting transaction callback")));
 
+	fdw_part->fdwxact = NULL;
 	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
 
 	MemoryContextSwitchTo(old_ctx);
 	pfree(routine);
@@ -161,31 +361,590 @@ RemoveFdwParticipant(FdwXactPartKey key)
 	}
 }
 
+/*
+ * Pre-commit processing for foreign transactions. We commit those foreign
+ * transactions in one phase.
+ */
+void
+PreCommit_FdwXact(bool is_parallel_worker)
+{
+	HASH_SEQ_STATUS scan;
+	FdwXactParticipant *fdw_part;
+
+	/*
+	 * If there is no foreign server involved or all foreign transactions
+	 * are already prepared (see AtPrepare_FdwXact()), we have no business here.
+	 */
+	if (!HasFdwXactParticipant() || PreparedAllParticipants)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/* Commit all foreign transactions in the participant list */
+	hash_seq_init(&scan, FdwXactParticipants);
+	while ((fdw_part = (FdwXactParticipant *) hash_seq_search(&scan)))
+	{
+		Assert(ServerSupportTransactionCallback(fdw_part));
+
+		/*
+		 * Commit the foreign transaction and remove it from the hash table
+		 * so that we don't try to abort an already-closed transaction.
+		 */
+		FdwXactParticipantEndTransaction(fdw_part, true, is_parallel_worker);
+		RemoveFdwParticipant(fdw_part->key);
+	}
+}
+
 /*
  * Commit or rollback all foreign transactions.
  */
 void
 AtEOXact_FdwXact(bool is_commit, bool is_parallel_worker)
+{
+	/* If there are no foreign servers involved, we have no business here */
+	if (!HasFdwXactParticipant())
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	if (!is_commit)
+	{
+		HASH_SEQ_STATUS scan;
+		FdwXactParticipant *fdw_part;
+
+		/* Rollback foreign transactions in the participant list */
+		hash_seq_init(&scan, FdwXactParticipants);
+		while ((fdw_part = (FdwXactParticipant *) hash_seq_search(&scan)))
+		{
+			FdwXact	fdwxact = fdw_part->fdwxact;
+			int	status;
+
+			/*
+			 * If this foreign transaction is not prepared yet, end the foreign
+			 * transaction in one-phase.
+			 */
+			if (!fdwxact)
+			{
+				Assert(ServerSupportTransactionCallback(fdw_part));
+				FdwXactParticipantEndTransaction(fdw_part, false, is_parallel_worker);
+				RemoveFdwParticipant(fdw_part->key);
+				continue;
+			}
+
+			/*
+			 * If the foreign transaction has an FdwXact entry, the foreign transaction
+			 * might have been prepared.  If the preparation is still in progress, we
+			 * roll back the foreign transaction anyway to end the current transaction.
+			 * Since the transaction might already have been prepared on the foreign
+			 * server, we set the status to aborting and leave it.
+			 */
+			SpinLockAcquire(&(fdwxact->mutex));
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&(fdwxact->mutex));
+
+			if (status == FDWXACT_STATUS_PREPARING)
+				FdwXactParticipantEndTransaction(fdw_part, is_commit, is_parallel_worker);
+		}
+	}
+
+	ForgetAllFdwXactParticipants();
+	PreparedAllParticipants = false;
+}
+
+/*
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * If an error happens while preparing a foreign transaction, we switch to
+ * rollback.  See AtEOXact_FdwXact() for details.
+ */
+void
+AtPrepare_FdwXact(void)
 {
 	FdwXactParticipant *fdw_part;
 	HASH_SEQ_STATUS scan;
+	TransactionId xid;
 
 	/* If there are no foreign servers involved, we have no business here */
 	if (!HasFdwXactParticipant())
 		return;
 
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit as we're going to
+	 * prepare all of them.
+	 */
 	hash_seq_init(&scan, FdwXactParticipants);
 	while ((fdw_part = (FdwXactParticipant *) hash_seq_search(&scan)))
 	{
-		Assert(ServerSupportTransactionCallback(fdw_part));
+		if (!ServerSupportTwophaseCommit(fdw_part))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot PREPARE a distributed transaction which has operated on a foreign server not supporting two-phase commit protocol")));
+	}
+
+	/*
+	 * Assign a transaction id if one is not assigned yet, because the local
+	 * transaction id is used to determine the result of the distributed
+	 * transaction.  Then prepare all foreign transactions.
+	 */
+	xid = GetTopTransactionId();
+	FdwXactPrepareForeignTransactions(xid);
+
+	/*
+	 * Remember we already prepared all participants.  We keep FdwXactParticipants
+	 * until the end of the transaction so that we can change the involved foreign
+	 * transactions to aborting in case of failure.
+	 */
+	PreparedAllParticipants = true;
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * which matches given arguments.
+ */
+bool
+FdwXactExists(TransactionId xid, Oid umid)
+{
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	idx = get_fdwxact_idx(xid, umid);
+	LWLockRelease(FdwXactLock);
+
+	return (idx >= 0);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If a FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts == 0)
+		return;					/* nothing to do */
+
+	/*
+	 * We are expecting there to be zero FdwXact that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, if somebody
+	 * spots that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXact with a
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->data.xid, fdwxact->data.umid, buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
 
-		/* Commit or rollback foreign transaction */
-		FdwXactParticipantEndTransaction(fdw_part, is_commit, is_parallel_worker);
+	LWLockRelease(FdwXactLock);
 
-		/* Successfully finished foreign transaction, remove the entry  */
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.	 FdwXact files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Insert FdwXact entries and prepare foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(TransactionId xid)
+{
+	FdwXactParticipant *fdw_part;
+	HASH_SEQ_STATUS scan;
+
+	Assert(TransactionIdIsValid(xid));
+
+	/* Loop over the foreign connections */
+	hash_seq_init(&scan, FdwXactParticipants);
+	while ((fdw_part = (FdwXactParticipant *) hash_seq_search(&scan)))
+	{
+		FdwXactInfo finfo;
+		FdwXact		fdwxact;
+		char		*identifier;
+
+		Assert(ServerSupportTwophaseCommit(fdw_part));
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get prepared transaction identifier */
+		identifier = getFdwXactIdentifier(fdw_part, xid);
+		Assert(identifier);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to the disk and logs (thereby relaying it to the standby).
+		 * Thus in case we lose connectivity to the foreign server or crash
+		 * ourselves, we will remember that we might have a prepared transaction
+		 * on the foreign server and try to resolve it when connectivity is
+		 * restored or after crash recovery.
+		 *
+		 * If we prepare the transaction on the foreign server before
+		 * persisting the information to the disk and crash in-between these
+		 * two steps, we will lose track of the prepared transaction on the
+		 * foreign server and will not be able to resolve it after the crash
+		 * recovery.  Hence persist first then prepare.
+		 */
+		fdwxact = FdwXactInsertEntry(xid, fdw_part, identifier);
+
+		/*
+		 * Prepare the foreign transaction.  Between the FdwXactInsertEntry call
+		 * and the acknowledgement from the foreign server, this backend may
+		 * abort the local transaction (say, because of a signal).
+		 */
+		finfo.server = fdw_part->server;
+		finfo.usermapping = fdw_part->usermapping;
+		finfo.flags = 0;
+		finfo.identifier = identifier;
+		fdw_part->prepare_foreign_xact_fn(&finfo);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier.  We generate a
+ * unique identifier in the form of
+ * "fx_<random number>_<xid>_<umid>", whose length is less than FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like UUID, which gives unique ids with high probability, but that may be
+ * expensive here and UUID extension which provides the function to generate
+ * UUID is not part of the core code.
+ */
+static char *
+getFdwXactIdentifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u", Abs(random()),
+			 xid, fdw_part->key);
+
+	return pstrdup(buf);
+}
+
+/*
+ * This function is used to create a new foreign transaction entry before an FDW
+ * prepares and commits or rolls back. The function adds the entry to WAL, and it
+ * will be persisted to disk under the pg_fdwxact directory at checkpoint time.
+ */
+static FdwXact
+FdwXactInsertEntry(TransactionId xid, FdwXactParticipant *fdw_part,
+				   char *identifier)
+{
+	FdwXactOnDiskData *fdwxact_file_data;
+	FdwXact		fdwxact;
+	Oid			owner;
+	int			data_len;
+
+	/*
+	 * Enter the foreign transaction into the shared memory structure.
+	 */
+	owner = GetUserId();
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->key,
+							 fdw_part->usermapping->serverid, owner, identifier);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdw_part->fdwxact = fdwxact;
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactOnDiskData, identifier);
+	data_len = data_len + strlen(identifier) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len);
+	memcpy(fdwxact_file_data, &(fdwxact->data), data_len);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXact
+insert_fdwxact(Oid dbid, TransactionId xid, Oid umid, Oid serverid, Oid owner,
+			   char *identifier)
+{
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+		if (fdwxact->valid &&
+			fdwxact->data.xid == xid &&
+			fdwxact->data.umid == umid)
+			ereport(ERROR,
+					(errmsg("could not insert a foreign transaction entry"),
+					 errdetail("Duplicate entry with transaction id %u, user mapping id %u exists.",
+							   xid, umid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there
+	 * are none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions (currently %d).",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts);
+	FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->data.xid = xid;
+	fdwxact->data.dbid = dbid;
+	fdwxact->data.umid = umid;
+	fdwxact->data.serverid = serverid;
+	fdwxact->data.owner = owner;
+	strlcpy(fdwxact->data.identifier, identifier, FDWXACT_ID_MAX_LEN);
+
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		elog(ERROR, "failed to find %p in FdwXact array", fdwxact);
+
+	elog(DEBUG2, "remove fdwxact entry id %s", fdwxact->data.identifier);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.xid = fdwxact->data.xid;
+		record.umid = fdwxact->data.umid;
+
+		/*
+		 * Now writing FdwXact state data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+/*
+ * Unlock foreign transaction participants and clear the FdwXactParticipants.
+ * If we leave any foreign transactions unresolved, update the oldest xmin of
+ * unresolved transactions to prevent the local transaction id of such
+ * unresolved foreign transactions from being truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	FdwXactParticipant *fdw_part;
+	HASH_SEQ_STATUS scan;
+	int	nremaining = 0;
+
+	if (!HasFdwXactParticipant())
+		return;
+
+	hash_seq_init(&scan, FdwXactParticipants);
+	while ((fdw_part = (FdwXactParticipant *) hash_seq_search(&scan)))
+	{
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		if (fdwxact)
+		{
+			/* Unlock the foreign transaction entry */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fdwxact->locking_backend = InvalidBackendId;
+			LWLockRelease(FdwXactLock);
+
+			nremaining++;
+		}
+
+		/* Remove from the participants list */
 		RemoveFdwParticipant(fdw_part->key);
 	}
 
+	/*
+	 * If we leave any FdwXact entries, update the oldest local transaction id
+	 * among unresolved distributed transactions.
+	 */
+	if (nremaining > 0)
+	{
+		elog(DEBUG1, "%d foreign transactions remaining", nremaining);
+		FdwXactComputeRequiredXmin();
+	}
+
 	Assert(!HasFdwXactParticipant());
 }
 
@@ -204,6 +963,7 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool is_commit,
 	finfo.usermapping = fdw_part->usermapping;
 	finfo.flags = FDWXACT_FLAG_ONEPHASE |
 		((is_parallel_worker) ? FDWXACT_FLAG_PARALLEL_WORKER : 0);
+	finfo.identifier = NULL;
 
 	if (is_commit)
 	{
@@ -219,16 +979,962 @@ FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool is_commit,
 	}
 }
 
-/*
- * This function is called at PREPARE TRANSACTION.  Since we don't support
- * preparing foreign transactions yet, raise an error if the local transaction
- * has any foreign transaction.
- */
+
+/* Commit or roll back one prepared foreign transaction */
+static void
+ResolveOneFdwXact(FdwXact fdwxact)
+{
+	FdwXactInfo finfo;
+	FdwRoutine *routine;
+
+	/* The FdwXact entry must be held by me */
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->locking_backend == MyBackendId);
+	Assert(fdwxact->status == FDWXACT_STATUS_PREPARED ||
+		   fdwxact->status == FDWXACT_STATUS_COMMITTING ||
+		   fdwxact->status == FDWXACT_STATUS_ABORTING);
+
+	/* Set whether we do commit or abort if not set yet */
+	if (fdwxact->status == FDWXACT_STATUS_PREPARED)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactGetTransactionFate(fdwxact->data.xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	routine = GetFdwRoutineByServerId(fdwxact->data.serverid);
+
+	/* Prepare the foreign transaction information to pass to API */
+	finfo.server = GetForeignServer(fdwxact->data.serverid);
+	finfo.usermapping = GetUserMapping(fdwxact->data.owner, fdwxact->data.serverid);
+	finfo.flags = 0;
+	finfo.identifier = fdwxact->data.identifier;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&finfo);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction %s",
+			 fdwxact->data.identifier);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&finfo);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction %s",
+			 fdwxact->data.identifier);
+	}
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->data.xid, fdwxact->data.umid, true);
+	remove_fdwxact(fdwxact);
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->data.xid));
+
+		/*
+		 * We can exclude entries that are marked as either committing or
+		 * aborting and whose state file is on disk, since such entries no
+		 * longer need to look up their transaction status in the commit log.
+		 */
+		if (fdwxact->ondisk &&
+			(fdwxact->status == FDWXACT_STATUS_COMMITTING ||
+			 fdwxact->status == FDWXACT_STATUS_ABORTING))
+			continue;
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->data.xid, agg_xmin))
+			agg_xmin = fdwxact->data.xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Return whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactGetTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is not in progress, and the foreign transaction
+	 * may not have been prepared on the foreign server. This can happen when
+	 * the transaction failed after registering this entry but before actually
+	 * preparing on the foreign server. So let's assume it aborted.
+	 */
+	if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted.  Raise an error anyway since we cannot
+	 * determine the fate of this foreign transaction from the local
+	 * transaction, whose fate is also not yet determined.
+	 */
+	elog(ERROR,
+		 "cannot resolve the foreign transaction associated with in-progress transaction");
+
+	pg_unreachable();
+}
+
+/*
+ * Return the index of the first FdwXact entry that matches the given arguments,
+ * or -1 if there is none.  Only arguments with valid values for their
+ * respective datatypes are used as search conditions.
+ */
+static int
+get_fdwxact_idx(TransactionId xid, Oid umid)
+{
+	int			i;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->data.xid)
+			continue;
+
+		/* umid */
+		if (OidIsValid(umid) && umid != fdwxact->data.umid)
+			continue;
+
+		/* This entry matches the condition */
+		return i;
+	}
+
+	return -1;
+}
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
 void
-AtPrepare_FdwXact(void)
+RecreateFdwXactFile(TransactionId xid, Oid umid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactFilePath(path, xid, umid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record), record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete the FdwXact entry and its file, if any */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->xid, xlrec->umid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextXid = ShmemVariableCache->nextXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->data.xid, fdwxact->data.umid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->data.xid, result))
+			result = fdwxact->data.xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXact depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXact files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+RestoreFdwXactData(void)
+{
+	DIR		  *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId xid;
+			Oid		   umid;
+			char		  *buf;
+
+			sscanf(clde->d_name, "%08x_%08x", &xid, &umid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(xid, umid, InvalidXLogRecPtr,
+									   true);
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Scan the shared memory entries of FdwXact and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->data.xid, fdwxact->data.umid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %s from shared memory",
+						fdwxact->data.identifier)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf;
+	FdwXact		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. The status of
+	 * the transaction is set as preparing, since we do not know the exact
+	 * status right now. Resolver will set it later based on the status of
+	 * local transaction which prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->xid,
+							 fdwxact_data->umid, fdwxact_data->serverid,
+							 fdwxact_data->owner, fdwxact_data->identifier);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u user mapping %u owner %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->xid,
+		 fdwxact_data->umid, fdwxact_data->owner,
+		 fdwxact_data->identifier);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that
+	 * prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXact file if the foreign transaction was saved by an earlier checkpoint.
+ * We might not find the FdwXact entry if crash recovery starts from a point
+ * after the entry was added but before it was removed.
+ */
+static void
+FdwXactRedoRemove(TransactionId xid, Oid umid, bool givewarning)
+{
+	FdwXact		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (fdwxact->data.xid == xid && fdwxact->data.umid == umid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_fdwxacts)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->data.xid, fdwxact->data.umid, givewarning);
+	remove_fdwxact(fdwxact);
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction %s",
+		 fdwxact->data.identifier);
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
 {
-	if (HasFdwXactParticipant())
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Given a transaction id and user mapping OID, read the foreign transaction
+ * state either from disk or directly from the xlog record that the provided
+ * "insert_start_lsn" points to.
+ */
+static char *
+ProcessFdwXactBuffer(TransactionId xid, Oid umid, XLogRecPtr insert_start_lsn,
+					 bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u and user mapping %u",
+							xid, umid)));
+			RemoveFdwXactFile(xid, umid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u and user mapping %u",
+							xid, umid)));
+			FdwXactRedoRemove(xid, umid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactFile(xid, umid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid CRC and the expected xid and user mapping),
+ * return the palloc'd contents of the file; raise an error on corrupted
+ * data.  This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactFile(TransactionId xid, Oid umid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactFilePath(path, xid, umid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactOnDiskData, identifier) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents are the expected data */
+	fdwxact_file_data = (FdwXactOnDiskData *) buf;
+	if (fdwxact_file_data->xid != xid ||
+		fdwxact_file_data->umid != umid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactFile(TransactionId xid, Oid umid, bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactFilePath(path, xid, umid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXact		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->data.xid);
+		values[1] = ObjectIdGetDatum(fdwxact->data.umid);
+		values[2] = ObjectIdGetDatum(fdwxact->data.owner);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = CStringGetTextDatum(fdwxact->data.identifier);
+
+		if (fdwxact->locking_backend != InvalidBackendId)
+		{
+			PGPROC *locker = BackendIdGetProc(fdwxact->locking_backend);
+			values[5] = Int32GetDatum(locker->pid);
+		}
+		else
+			nulls[5] = true;
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			umid = PG_GETARG_OID(1);
+	FdwXact		fdwxact;
+
+	fdwxact = get_fdwxact_with_check(xid, umid);
+	Assert(fdwxact);
+
+	if (TwoPhaseExists(fdwxact->data.xid))
+	{
+		/*
+		 * The entry's local transaction is prepared. Since we cannot know the
+		 * fate of the local transaction, we cannot resolve this foreign
+		 * transaction.
+		 */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction with identifier \"%s\" whose local transaction is prepared",
+						fdwxact->data.identifier),
+				 errhint("Do COMMIT PREPARED or ROLLBACK PREPARED on the local transaction first.")));
+	}
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	LWLockRelease(FdwXactLock);
+
+	PG_TRY();
+	{
+		ResolveOneFdwXact(fdwxact);
+	}
+	PG_CATCH();
+	{
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. The function gives a way to forget about such a prepared
+ * transaction when, for example, the foreign server where it is prepared is no
+ * longer available or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid		umid = PG_GETARG_OID(1);
+	FdwXact	fdwxact;
+
+	fdwxact = get_fdwxact_with_check(xid, umid);
+	Assert(fdwxact);
+
+	/* Hold the entry */
+	fdwxact->locking_backend = MyBackendId;
+
+	PG_TRY();
+	{
+		/* Clean up entry and any files we may have left */
+		if (fdwxact->ondisk)
+			RemoveFdwXactFile(fdwxact->data.xid, fdwxact->data.umid, true);
+		remove_fdwxact(fdwxact);
+	}
+	PG_CATCH();
+	{
+		if (fdwxact->valid)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+			fdwxact->locking_backend = InvalidBackendId;
+		}
+		LWLockRelease(FdwXactLock);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Return an FdwXact entry with given transaction id and user mapping OID.
+ */
+static FdwXact
+get_fdwxact_with_check(TransactionId xid, Oid umid)
+{
+	FdwXact		fdwxact;
+	Oid			myuserid;
+	int			idx;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	idx = get_fdwxact_idx(xid, umid);
+
+	if (idx < 0)
+	{
+		/* not found */
+		LWLockRelease(FdwXactLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction does not exist")));
+	}
+
+	fdwxact = FdwXactCtl->fdwxacts[idx];
+
+	/*
+	 * XXX: It probably would be possible to allow processing from another
+	 * database. But since there may be some issues, we disallow it for safety.
+	 */
+	if (fdwxact->data.dbid != MyDatabaseId)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction belongs to another database"),
+				 errhint("Connect to the database where the transaction was created to finish it.")));
+
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->data.owner && !superuser_arg(myuserid))
+		ereport(ERROR,
+				 (errmsg("permission denied to resolve prepared foreign transaction"),
+				  errhint("Must be superuser or the user that prepared the transaction.")));
+
+	if (fdwxact->locking_backend != InvalidBackendId)
+	{
+		/* the entry is being processed by someone */
+		LWLockRelease(FdwXactLock);
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+				 errmsg("foreign transaction with transaction identifier \"%s\" is busy",
+						fdwxact->data.identifier)));
+	}
+
+	return fdwxact;
 }
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..0a3f4b383f 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 70d22577ce..81facfb09c 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -845,6 +845,34 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+		if (gxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index afe40429eb..cf4d5f2574 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2127,7 +2127,7 @@ CommitTransaction(void)
 					  : XACT_EVENT_PRE_COMMIT);
 
 	/* Call foreign transaction callbacks at pre-commit phase, if any */
-	AtEOXact_FdwXact(true, is_parallel_worker);
+	PreCommit_FdwXact(is_parallel_worker);
 
 	/* If we might have parallel workers, clean them up now. */
 	if (IsInParallelMode())
@@ -2286,6 +2286,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXact(true, is_parallel_worker);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2559,6 +2560,7 @@ PrepareTransaction(void)
 	PostPrepare_Twophase();
 
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
+	AtEOXact_FdwXact(true, false);
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
 	AtEOXact_Enum();
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e0c37f73f3..35a42eba1b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4626,6 +4627,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6360,6 +6362,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6914,14 +6919,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	RestoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7123,7 +7129,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7634,11 +7643,13 @@ StartupXLOG(void)
 	}
 
 	/*
-	 * Pre-scan prepared transactions to find out the range of XIDs present.
-	 * This information is not quite needed yet, but it is positioned here so
-	 * as potential problems are detected before any on-disk change is done.
+	 * Pre-scan prepared transactions and foreign prepared transactions to find
+	 * out the range of XIDs present.  This information is not quite needed yet,
+	 * but it is positioned here so as potential problems are detected before any
+	 * on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -7963,8 +7974,12 @@ StartupXLOG(void)
 	TrimCLOG();
 	TrimMultiXact();
 
-	/* Reload shared-memory state for prepared transactions */
+	/*
+	 * Reload shared-memory state for prepared transactions and foreign
+	 * prepared transactions.
+	 */
 	RecoverPreparedTransactions();
+	RecoverFdwXacts();
 
 	/*
 	 * Shutdown the recovery environment. This must occur after
@@ -9317,6 +9332,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9855,6 +9871,7 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
@@ -9874,6 +9891,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9892,6 +9910,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -10099,6 +10118,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10302,6 +10322,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 8d8e926c21..fcc01fa8c4 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1549,6 +1549,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_USER_MAPPING:
+			RemoveUserMappingById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1564,7 +1568,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
 		case OCLASS_FOREIGN_SERVER:
-		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
 		case OCLASS_PUBLICATION:
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..588d229fd2 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -333,6 +333,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index eb7103fd3b..a56f01f170 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1060,7 +1061,6 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
-
 /*
  * Common routine to check permission for user-mapping-related DDL
  * commands.  We allow server owners to operate on any mapping, and
@@ -1307,6 +1307,37 @@ AlterUserMapping(AlterUserMappingStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop the given user mapping
+ */
+void
+RemoveUserMappingById(Oid umid)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(UserMappingRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for user mapping %u", umid);
+
+	/*
+	 * We cannot drop the user mapping if there is a foreign prepared
+	 * transaction with this user mapping.
+	 */
+	if (FdwXactExists(InvalidTransactionId, umid))
+		ereport(ERROR,
+				(errmsg("user mapping %u has an unresolved prepared transaction",
+						umid)));
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Drop user mapping
@@ -1374,6 +1405,7 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+
 	/*
 	 * Do the deletion
 	 */
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index f8eb4fa215..6ce76b2aec 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -332,6 +332,12 @@ GetFdwRoutine(Oid fdwhandler)
 	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
 		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
 
+	/* An FDW that supports prepare must also support commit and rollback */
+	Assert((routine->PrepareForeignTransaction &&
+			routine->CommitForeignTransaction &&
+			routine->RollbackForeignTransaction) ||
+		   !routine->PrepareForeignTransaction);
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..701ccb3a03 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4238,6 +4238,15 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_DSM_FILL_ZERO_WRITE:
 			event_name = "DSMFillZeroWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
 		case WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ:
 			event_name = "LockFileAddToDataDirRead";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..d897f2c5fc 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -178,6 +178,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..6f14a950bf 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -267,6 +269,7 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4085891237..34e70e57cd 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -96,6 +96,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allProcs[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -187,11 +189,13 @@ typedef struct ComputeXidHorizonsResult
 	FullTransactionId latest_completed;
 
 	/*
-	 * The same for procArray->replication_slot_xmin and.
-	 * procArray->replication_slot_catalog_xmin.
+	 * The same for procArray->replication_slot_xmin,
+	 * procArray->replication_slot_catalog_xmin, and
+	 * procArray->fdwxact_unresolved_xmin.
 	 */
 	TransactionId slot_xmin;
 	TransactionId slot_catalog_xmin;
+	TransactionId fdwxact_unresolved_xmin;
 
 	/*
 	 * Oldest xid that any backend might still consider running. This needs to
@@ -210,8 +214,9 @@ typedef struct ComputeXidHorizonsResult
 	 * Oldest xid for which deleted tuples need to be retained in shared
 	 * tables.
 	 *
-	 * This includes the effects of replication slots. If that's not desired,
-	 * look at shared_oldest_nonremovable_raw;
+	 * This includes the effects of replication slots and unresolved
+	 * foreign transactions. If that's not desired, look at
+	 * shared_oldest_nonremovable_raw;
 	 */
 	TransactionId shared_oldest_nonremovable;
 
@@ -418,6 +423,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 		ShmemVariableCache->xactCompletionCount = 1;
 	}
 
@@ -1709,6 +1715,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	 */
 	h->slot_xmin = procArray->replication_slot_xmin;
 	h->slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	h->fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	for (int index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1836,6 +1843,12 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	h->data_oldest_nonremovable =
 		TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);
 
+	/*
+	 * Check whether there are unresolved distributed transactions requiring
+	 * an older xmin.
+	 */
+	h->shared_oldest_nonremovable =
+		TransactionIdOlder(h->shared_oldest_nonremovable, h->fdwxact_unresolved_xmin);
 	/*
 	 * The only difference between catalog / data horizons is that the slot's
 	 * catalog xmin is applied to the catalog one (so catalogs can be accessed
@@ -1893,6 +1906,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	Assert(!TransactionIdIsValid(h->slot_catalog_xmin) ||
 		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
 										 h->slot_catalog_xmin));
+	Assert(!TransactionIdIsValid(h->fdwxact_unresolved_xmin) ||
+		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+										 h->fdwxact_unresolved_xmin));
 
 	/* update approximate horizons with the computed horizons */
 	GlobalVisUpdateApply(h);
@@ -3804,6 +3820,21 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits to future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions that are still needed
+ * to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
 /*
  * XidCacheRemoveRunningXids
  *
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 6c7cf6c295..4124321640 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+FdwXactLock							48
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 00018abb7d..4d97316fb1 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -30,6 +30,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -2470,6 +2471,16 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ee06528bb0..169d8e0d87 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -128,6 +128,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index b42a08a095..cfb9b343f2 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -206,6 +206,7 @@ static const char *const subdirs[] = {
 	"pg_wal/archive_status",
 	"pg_commit_ts",
 	"pg_dynshmem",
+	"pg_fdwxact",
 	"pg_notify",
 	"pg_serial",
 	"pg_snapshots",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 3e00ac0f70..53bc3d82d7 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -302,6 +302,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_xacts setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 805dafef07..dd70a0f8a2 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 0699011af5..61566aef5a 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -11,13 +11,83 @@
 #define FDWXACT_H
 
 #include "access/xact.h"
+#include "access/fdwxact_xlog.h"
 #include "foreign/foreign.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/s_lock.h"
 
 /* Flag passed to FDW transaction management APIs */
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 #define FDWXACT_FLAG_PARALLEL_WORKER	0x02	/* is parallel worker? */
 
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being aborted */
+} FdwXactStatus;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactData *FdwXact;
+typedef struct FdwXactData
+{
+	FdwXact		fdwxact_free_next;	/* Next free FdwXact entry */
+
+	/* Information relevant to the foreign transaction */
+	FdwXactOnDiskData data;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXact. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXact. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset where insertion of this
+									 * entry starts */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset where insertion of this entry ends */
+
+	bool		valid;			/* has the entry been completed and written
+								 * to file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		identifier[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+} FdwXactData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing an FdwXact entry requires holding FdwXactLock in exclusive
+ * mode, and iterating over fdwxacts requires holding it in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactData structs */
+	FdwXact		free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int			num_fdwxacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXact		fdwxacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
 /* State data for foreign transaction resolution, passed to FDW callbacks */
 typedef struct FdwXactInfo
 {
@@ -25,10 +95,24 @@ typedef struct FdwXactInfo
 	UserMapping		*usermapping;
 
 	int	flags;			/* OR of FDWXACT_FLAG_xx flags */
+	char   *identifier;
 } FdwXactInfo;
 
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+
 /* Function declarations */
+extern void PreCommit_FdwXact(bool is_parallel_worker);
 extern void AtEOXact_FdwXact(bool is_commit, bool is_parallel_worker);
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
 extern void AtPrepare_FdwXact(void);
+extern bool FdwXactExists(TransactionId xid, Oid umid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern void RecreateFdwXactFile(TransactionId xid, Oid umid, void *content,
+								int len);
+extern void RestoreFdwXactData(void);
+extern void RecoverFdwXacts(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
 
 #endif /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..3eb068c0d1
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,50 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId xid;
+	Oid		dbid;
+	Oid		umid;
+	Oid		serverid;
+	Oid		owner;
+	char	identifier[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid		umid;
+	bool	force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index f582cf535f..5ab1f57212 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 91786da784..3d35f89ae0 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 224cae0246..0823baf1a1 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -236,6 +236,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..5673ec7299 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1487710d59..eef837baa0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6076,6 +6076,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '100', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,text,int4}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,umid,owner,state,identifier,locker_pid}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid',
+  proargnames => '{xid,umid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid',
+  proargnames => '{xid,umid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 1a79540c94..3992198a82 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -128,6 +128,7 @@ extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
+extern void RemoveUserMappingById(Oid umid);
 extern void CreateForeignTable(CreateForeignTableStmt *stmt, Oid relid);
 extern void ImportForeignSchema(ImportForeignSchemaStmt *stmt);
 extern Datum transformGenericOptions(Oid catalogId,
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 55c040e0c3..e6e294e3ec 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -179,6 +179,7 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*PrepareForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*CommitForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*RollbackForeignTransaction_function) (FdwXactInfo *finfo);
 
@@ -264,6 +265,7 @@ typedef struct FdwRoutine
 	/* Support functions for transaction management */
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
+	PrepareForeignTransaction_function PrepareForeignTransaction;
 } FdwRoutine;
 
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..b50daaeb79 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1044,6 +1044,9 @@ typedef enum
 	WAIT_EVENT_DATA_FILE_TRUNCATE,
 	WAIT_EVENT_DATA_FILE_WRITE,
 	WAIT_EVENT_DSM_FILL_ZERO_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_READ,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_SYNC,
 	WAIT_EVENT_LOCK_FILE_ADDTODATADIR_WRITE,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index b01fa52139..300a4cf5b6 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -93,5 +93,6 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
 
 #endif							/* PROCARRAY_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 10a1f34ebc..768ce51782 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1352,6 +1352,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.umid,
+    f.owner,
+    f.state,
+    f.identifier,
+    f.locker_pid
+   FROM pg_foreign_xacts() f(xid, umid, owner, state, identifier, locker_pid);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.27.0

Attachment: v35-0001-Introduce-transaction-manager-for-foreign-transa.patch (application/octet-stream)
From 48ab1acca2b92cd4266bda611e35120c3f70b701 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 28 Aug 2020 22:25:38 +0900
Subject: [PATCH v35 01/10] Introduce transaction manager for foreign
 transactions.

The global transaction manager manages the transactions initiated on
the foreign server. This commit also adds both
CommitForeignTransaction and RollbackForeignTransaction FDW APIs
supporting only one-phase commit. An FDW that implements these APIs can
be managed by the global transaction manager, so the FDW is able to
control its transactions using the foreign transaction manager instead
of XactCallback.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/Makefile  |   1 +
 src/backend/access/transam/fdwxact.c | 234 +++++++++++++++++++++++++++
 src/backend/access/transam/xact.c    |   8 +
 src/backend/foreign/foreign.c        |   4 +
 src/include/access/fdwxact.h         |  34 ++++
 src/include/foreign/fdwapi.h         |  12 ++
 6 files changed, 293 insertions(+)
 create mode 100644 src/backend/access/transam/fdwxact.c
 create mode 100644 src/include/access/fdwxact.h

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 595e02de72..b05a88549d 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -15,6 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = \
 	clog.o \
 	commit_ts.o \
+	fdwxact.o \
 	generic_xlog.o \
 	multixact.o \
 	parallel.o \
diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
new file mode 100644
index 0000000000..7da90eae13
--- /dev/null
+++ b/src/backend/access/transam/fdwxact.c
@@ -0,0 +1,234 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module contains the code for managing transactions started on foreign
+ * servers.
+ *
+ * An FDW that implements both commit and rollback APIs can request to register
+ * the foreign transaction by FdwXactRegisterXact() to have it join a
+ * group of distributed transactions.  The registered foreign transactions are
+ * identified by user mapping OID.  On commit and rollback, the global
+ * transaction manager calls corresponding FDW API to end the foreign
+ * transactions.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/transam/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "catalog/pg_user_mapping.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+
+/* Check the FdwXactParticipant supports the transaction callbacks */
+#define ServerSupportTransactionCallback(fdw_part) \
+	(((FdwXactParticipant *)(fdw_part))->commit_foreign_xact_fn != NULL)
+
+/* Check the current transaction has at least one fdwxact participant */
+#define HasFdwXactParticipant() \
+	(FdwXactParticipants != NULL && \
+	 hash_get_num_entries(FdwXactParticipants) > 0)
+
+/*
+ * Structure to bundle the foreign transaction participant.  This struct
+ * needs to live until the end of transaction, so it's allocated in the
+ * TopMemoryContext.
+ *
+ * Participants are identified by user mapping OID, rather than pair of
+ * user OID and server OID. See README.fdwxact for the discussion.
+ */
+typedef Oid FdwXactPartKey;
+typedef struct FdwXactParticipant
+{
+	FdwXactPartKey key; /* user mapping OID, hash key (must be first) */
+
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Callbacks for foreign transaction */
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+} FdwXactParticipant;
+
+/*
+ * Foreign transactions involved in the transaction.  A member of
+ * participants must support both commit and rollback APIs.
+ */
+static HTAB *FdwXactParticipants = NULL;
+
+/* Initial size of the hash table */
+#define FDWXACT_HASH_SIZE	64
+
+static void FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part,
+											 bool is_commit, bool is_parallel_worker);
+static void RemoveFdwParticipant(FdwXactPartKey umid);
+
+/*
+ * Register the given foreign transaction identified by the given user
+ * mapping OID as a participant of the transaction.
+ */
+void
+FdwXactRegisterXact(UserMapping *usermapping)
+{
+	FdwXactParticipant	*fdw_part;
+	FdwRoutine	*routine;
+	FdwXactPartKey	key;
+	MemoryContext old_ctx;
+	bool	found;
+
+	Assert(IsTransactionState());
+
+	if (FdwXactParticipants == NULL)
+	{
+		HASHCTL	ctl;
+
+		ctl.keysize = sizeof(FdwXactPartKey);
+		ctl.entrysize = sizeof(FdwXactParticipant);
+
+		/* Without HASH_CONTEXT, the hash table is created in TopMemoryContext */
+		FdwXactParticipants = hash_create("fdw xact participants",
+										  FDWXACT_HASH_SIZE,
+										  &ctl, HASH_ELEM | HASH_BLOBS);
+	}
+
+	key = usermapping->umid;
+	fdw_part = hash_search(FdwXactParticipants, (void *) &key, HASH_ENTER, &found);
+
+	/* Already registered */
+	if (found)
+		return;
+
+	/*
+	 * The participant information needs to live until the end of the transaction
+	 * where syscache is not available, so we save them in TopMemoryContext.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopMemoryContext);
+
+	fdw_part->usermapping = GetUserMapping(usermapping->userid, usermapping->serverid);
+	fdw_part->server = GetForeignServer(usermapping->serverid);
+
+	/*
+	 * Foreign server managed by the transaction manager must implement
+	 * transaction callbacks.
+	 */
+	routine = GetFdwRoutineByServerId(usermapping->serverid);
+	if (!routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("cannot register foreign server not supporting transaction callback")));
+
+	fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+
+	MemoryContextSwitchTo(old_ctx);
+	pfree(routine);
+}
+
+/* Remove the foreign transaction from FdwXactParticipants */
+void
+FdwXactUnregisterXact(UserMapping *usermapping)
+{
+	Assert(IsTransactionState());
+	RemoveFdwParticipant(usermapping->umid);
+}
+
+/*
+ * Remove an FdwXactParticipant entry identified by the given user mapping id
+ * from the hash table, and free the resources if found.
+ */
+static void
+RemoveFdwParticipant(FdwXactPartKey key)
+{
+	bool found;
+	FdwXactParticipant *fdw_part;
+
+	fdw_part = hash_search(FdwXactParticipants, (void *) &key, HASH_REMOVE,
+							&found);
+
+	if (found)
+	{
+		pfree(fdw_part->server);
+		pfree(fdw_part->usermapping);
+	}
+}
+
+/*
+ * Commit or rollback all foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool is_commit, bool is_parallel_worker)
+{
+	FdwXactParticipant *fdw_part;
+	HASH_SEQ_STATUS scan;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (!HasFdwXactParticipant())
+		return;
+
+	hash_seq_init(&scan, FdwXactParticipants);
+	while ((fdw_part = (FdwXactParticipant *) hash_seq_search(&scan)))
+	{
+		Assert(ServerSupportTransactionCallback(fdw_part));
+
+		/* Commit or rollback foreign transaction */
+		FdwXactParticipantEndTransaction(fdw_part, is_commit, is_parallel_worker);
+
+		/* Successfully finished foreign transaction, remove the entry  */
+		RemoveFdwParticipant(fdw_part->key);
+	}
+
+	Assert(!HasFdwXactParticipant());
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool is_commit,
+								 bool is_parallel_worker)
+{
+	FdwXactInfo finfo;
+
+	Assert(ServerSupportTransactionCallback(fdw_part));
+
+	finfo.server = fdw_part->server;
+	finfo.usermapping = fdw_part->usermapping;
+	finfo.flags = FDWXACT_FLAG_ONEPHASE |
+		((is_parallel_worker) ? FDWXACT_FLAG_PARALLEL_WORKER : 0);
+
+	if (is_commit)
+	{
+		fdw_part->commit_foreign_xact_fn(&finfo);
+		elog(DEBUG1, "successfully committed the foreign transaction for user mapping %u",
+			 fdw_part->key);
+	}
+	else
+	{
+		fdw_part->rollback_foreign_xact_fn(&finfo);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for user mapping %u",
+			 fdw_part->key);
+	}
+}
+
+/*
+ * This function is called at PREPARE TRANSACTION.  Since we don't support
+ * preparing foreign transactions yet, raise an error if the local transaction
+ * has any foreign transaction.
+ */
+void
+AtPrepare_FdwXact(void)
+{
+	if (HasFdwXactParticipant())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 17fbc41bbb..afe40429eb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -2125,6 +2126,9 @@ CommitTransaction(void)
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
+	/* Call foreign transaction callbacks at pre-commit phase, if any */
+	AtEOXact_FdwXact(true, is_parallel_worker);
+
 	/* If we might have parallel workers, clean them up now. */
 	if (IsInParallelMode())
 		AtEOXact_Parallel(true);
@@ -2369,6 +2373,9 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Process foreign transactions */
+	AtPrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2705,6 +2712,7 @@ AbortTransaction(void)
 	AtAbort_Notify();
 	AtEOXact_RelationMap(false, is_parallel_worker);
 	AtAbort_Twophase();
+	AtEOXact_FdwXact(false, is_parallel_worker);
 
 	/*
 	 * Advertise the fact that we aborted in pg_xact (assuming that we got as
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 5564dc3a1e..f8eb4fa215 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -328,6 +328,10 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* The FDW must support both or nothing */
+	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
+		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
+
 	return routine;
 }
 
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..0699011af5
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,34 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/xact.h"
+#include "foreign/foreign.h"
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+#define FDWXACT_FLAG_PARALLEL_WORKER	0x02	/* is parallel worker? */
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactInfo
+{
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	int	flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactInfo;
+
+/* Function declarations */
+extern void AtEOXact_FdwXact(bool is_commit, bool is_parallel_worker);
+extern void AtPrepare_FdwXact(void);
+
+#endif /* FDWXACT_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 248f78da45..55c040e0c3 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "access/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
 
@@ -178,6 +179,9 @@ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
 
+typedef void (*CommitForeignTransaction_function) (FdwXactInfo *finfo);
+typedef void (*RollbackForeignTransaction_function) (FdwXactInfo *finfo);
+
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
  * function.  It provides pointers to the callback functions needed by the
@@ -256,6 +260,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for path reparameterization. */
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
+
+	/* Support functions for transaction management */
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
 } FdwRoutine;
 
 
@@ -269,4 +277,8 @@ extern bool IsImportableForeignTable(const char *tablename,
 									 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in access/transam/fdwxact.c */
+extern void FdwXactRegisterXact(UserMapping *usermapping);
+extern void FdwXactUnregisterXact(UserMapping *usermapping);
+
 #endif							/* FDWAPI_H */
-- 
2.27.0
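
The flow this patch introduces (register once per user mapping, then call the commit or rollback callback for every participant at end of transaction) can be modelled outside the server roughly as follows. This is a standalone mock, not backend code: MockRoutine, mock_register(), mock_at_eoxact() and the integer user mapping IDs are hypothetical stand-ins for FdwRoutine, FdwXactRegisterXact(), AtEOXact_FdwXact() and real OIDs, and a fixed array replaces the hash table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for FdwXactInfo. */
typedef struct MockXactInfo
{
    int umid;
    int flags;
} MockXactInfo;

typedef void (*EndXactFn) (MockXactInfo *info);

/* Hypothetical stand-in for the two new FdwRoutine members. */
typedef struct MockRoutine
{
    EndXactFn commit;
    EndXactFn rollback;
} MockRoutine;

#define MAX_PARTS 4

typedef struct MockParticipant
{
    int umid;                   /* key: user mapping ID */
    const MockRoutine *routine;
} MockParticipant;

static MockParticipant parts[MAX_PARTS];
static int nparts = 0;
static int ncommitted = 0;
static int nrolledback = 0;

static void mock_commit(MockXactInfo *info)   { (void) info; ncommitted++; }
static void mock_rollback(MockXactInfo *info) { (void) info; nrolledback++; }

/* Register a participant once per user mapping; -1 means "cannot register". */
static int
mock_register(int umid, const MockRoutine *routine)
{
    if (routine->commit == NULL || routine->rollback == NULL)
        return -1;              /* callbacks are mandatory */

    for (int i = 0; i < nparts; i++)
        if (parts[i].umid == umid)
            return 0;           /* already registered */

    if (nparts == MAX_PARTS)
        return -1;

    parts[nparts].umid = umid;
    parts[nparts].routine = routine;
    nparts++;
    return 0;
}

/* End every registered foreign transaction, then forget it. */
static void
mock_at_eoxact(int is_commit)
{
    while (nparts > 0)
    {
        MockParticipant *p = &parts[nparts - 1];
        MockXactInfo info = { p->umid, 0 };

        if (is_commit)
            p->routine->commit(&info);
        else
            p->routine->rollback(&info);
        nparts--;
    }
}

/* Two mappings, one registered twice: expect two commits, none left over. */
static int
demo_end_of_xact(void)
{
    static const MockRoutine routine = { mock_commit, mock_rollback };

    mock_register(500, &routine);
    mock_register(500, &routine);   /* duplicate, ignored */
    mock_register(501, &routine);
    mock_at_eoxact(1);
    return ncommitted * 10 + nparts;
}
```

The duplicate registration is ignored because the participant is keyed by user mapping, so each mapping's callback runs exactly once per transaction.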

#223Zhihong Yu
zyu@yugabyte.com
In reply to: Masahiko Sawada (#222)
Re: Transactions involving multiple postgres foreign servers, take 2

Hi,
For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :

With this commit, the foreign server modified within the transaction marked
as 'modified'.

transaction marked -> transaction is marked

+#define IsForeignTwophaseCommitRequested() \
+    (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)

Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think the macro
should be named: IsForeignTwophaseCommitRequired.

+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+       if (!ServerSupportTwophaseCommit(fdw_part))
+           have_no_twophase = true;
...
+   if (have_no_twophase)
+       ereport(ERROR,

It seems the error case should be reported within the loop. This way, we
don't need to iterate the other participant(s).
Accordingly, nserverswritten should be incremented for local server prior
to the loop. The condition in the loop would become if
(!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
have_no_twophase is no longer needed.

Cheers

On Tue, Mar 16, 2021 at 8:04 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:


On Mon, Mar 15, 2021 at 3:55 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:

On Thu, Feb 11, 2021 at 6:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Feb 5, 2021 at 2:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Feb 2, 2021 at 5:18 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/01/27 14:08, Masahiko Sawada wrote:

On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

You fixed some issues. But maybe you forgot to attach the latest patches?

Yes, I've attached the updated patches.

Thanks for updating the patch! I tried to review 0001 and 0002 as the
self-contained change.

+ * An FDW that implements both commit and rollback APIs can request to register
+ * the foreign transaction by FdwXactRegisterXact() to participate it to a
+ * group of distributed tranasction. The registered foreign transactions are
+ * identified by OIDs of server and user.

I'm afraid that the combination of OIDs of server and user is not
unique. IOW, more than one foreign transactions can have the same
combination of OIDs of server and user. For example, the following two
SELECT queries start the different foreign transactions but their user OID
is the same. OID of user mapping should be used instead of OID of user?

CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;
CREATE USER MAPPING FOR postgres SERVER loopback OPTIONS (user 'postgres');
CREATE USER MAPPING FOR public SERVER loopback OPTIONS (user 'postgres');
CREATE TABLE t(i int);
CREATE FOREIGN TABLE ft(i int) SERVER loopback OPTIONS (table_name 't');

BEGIN;
SELECT * FROM ft;
DROP USER MAPPING FOR postgres SERVER loopback ;
SELECT * FROM ft;
COMMIT;

Good catch. I've considered using user mapping OID or a pair of user
mapping OID and server OID as a key of foreign transactions but I
think it also has a problem if an FDW caches the connection by pair of
server OID and user OID whereas the core identifies them by user
mapping OID. For instance, mysql_fdw manages connections by pair of
server OID and user OID.

For example, let's consider the following execution:

BEGIN;
SET ROLE user_A;
INSERT INTO ft1 VALUES (1);
SET ROLE user_B;
INSERT INTO ft1 VALUES (1);
COMMIT;

Suppose that an FDW identifies the connections by {server OID, user
OID} and the core GTM identifies the transactions by user mapping OID,
and user_A and user_B use the public user mapping to connect to server_X.
In the FDW, there are two connections identified by {user_A, server_X}
and {user_B, server_X} respectively, and therefore it opens one
transaction on each connection, while the GTM has only one FdwXact entry
because the two connections refer to the same user mapping OID. As a
result, at the end of the transaction, the GTM ends only one foreign
transaction, leaving the other one open.
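
The mismatch described above can be counted with a small standalone sketch. Everything here is illustrative: ForeignAccess is a hypothetical record of one foreign write, and the OID values (10 and 11 for user_A and user_B, 20 for server_X, 500 for the public user mapping) are made up.

```c
#include <assert.h>

typedef unsigned int Oid;

/* Hypothetical record: which local user wrote to which server via which mapping. */
typedef struct ForeignAccess
{
    Oid userid;
    Oid serverid;
    Oid umid;
} ForeignAccess;

/* Connections opened by an FDW that caches by {server OID, user OID}. */
static int
count_fdw_connections(const ForeignAccess *acc, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
    {
        int seen = 0;

        for (int j = 0; j < i; j++)
            if (acc[j].serverid == acc[i].serverid &&
                acc[j].userid == acc[i].userid)
                seen = 1;
        if (!seen)
            count++;
    }
    return count;
}

/* Entries tracked by a core that keys participants by user mapping OID. */
static int
count_core_entries(const ForeignAccess *acc, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
    {
        int seen = 0;

        for (int j = 0; j < i; j++)
            if (acc[j].umid == acc[i].umid)
                seen = 1;
        if (!seen)
            count++;
    }
    return count;
}

/* Foreign transactions that nobody would end at commit time. */
static int
unmanaged_connections(void)
{
    /* user_A and user_B both go through the public mapping on server_X. */
    const ForeignAccess acc[] = {
        { 10, 20, 500 },
        { 11, 20, 500 },
    };

    return count_fdw_connections(acc, 2) - count_core_entries(acc, 2);
}
```

With these inputs the FDW side holds two connections while the core side holds one entry, so one foreign transaction is left without anyone to end it.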

Using user mapping OID seems natural to me, but I'm concerned that
changing role in the middle of a transaction is more likely to happen
than dropping the public user mapping, though I'm not sure. We would
need to find a better way.

After more thought, I'm inclined to think it's better to identify
foreign transactions by user mapping OID. The main reason is, I think
FDWs that manage connection caches by pair of user OID and server OID
potentially have a problem with the scenario Fujii-san mentioned. If an
FDW has to use another user mapping (i.e., connection information) due
to the currently used user mapping being removed, it would have to
disconnect the previous connection because it has to use the same
connection cache entry. But at that time it doesn't know whether the
transaction will be committed or aborted.

Also, such an FDW has the same problem that postgres_fdw used to have: a
backend establishes multiple connections with the same connection
information if multiple local users use the public user mapping. Even
from the perspective of foreign transaction management, it makes more
sense that foreign transactions correspond to the connections to
foreign servers, not to the local connection information.

I can see that some FDW implementations such as mysql_fdw and
firebird_fdw identify connections by pair of server OID and user OID
but I think this is because they were modeled on old postgres_fdw code. I
suspect that there is no use case where an FDW needs to identify
connections in that way. If the core GTM identifies them by user
mapping OID, we would force those FDWs to change their way but I
think that change would be the right improvement.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

The regression tests are failing; can you please take a look?

Thank you!

I've attached the updated version patch set.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#224Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#222)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/03/17 12:03, Masahiko Sawada wrote:

I've attached the updated version patch set.

Thanks for updating the patches! I'm resuming my review of 2PC because
I'd like to use this feature in PG15.

I think the following logic of resolving and removing the fdwxact entries
by the transaction resolver needs to be fixed.

1. Check if pending fdwxact entries exist

HoldInDoubtFdwXacts() checks whether there are entries that meet conditions
such as having InvalidBackendId. After that it records the indexes of those
entries in the fdwxacts array. The fdwXactLock is released at the end of
this phase.

2. Resolve and remove the entries held in the 1st phase

ResolveFdwXacts() resolves each fdwxact entry using the indexes. At the end
of resolving, the transaction resolver removes the entry from the fdwxacts
array via remove_fdwact().

The entry is removed as follows. Since removal moves the last entry into the
freed slot, the indexes obtained in the 1st phase are no longer meaningful.

/* Remove the entry from active array */
FdwXactCtl->num_fdwxacts--;
FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];

This seems to lead to resolving unexpected fdwxacts, and it can cause the
following assertion failure, which is how I noticed it. For example, a
backend may insert a new fdwxact entry into the free slot of an entry that
the resolver removed right before, and the resolver then accesses the new
entry, which doesn't need to be resolved yet, because it uses the indexes
checked in the 1st phase.

Assert(fdwxact->locking_backend == MyBackendId);
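
To make the stale-index problem concrete, here is a minimal standalone model of the swap-remove quoted above. It is only a sketch: Entry, EntryArray and the integer ids are hypothetical stand-ins for the shared fdwxacts array, and there is no locking.

```c
#include <assert.h>

#define MAX_ENTRIES 8

/* Hypothetical stand-in for one fdwxact entry; only an id is tracked. */
typedef struct Entry
{
    int id;
} Entry;

typedef struct EntryArray
{
    Entry entries[MAX_ENTRIES];
    int   num;
} EntryArray;

/* Append an entry, as a backend inserting a new fdwxact would. */
static int
array_insert(EntryArray *a, int id)
{
    assert(a->num < MAX_ENTRIES);
    a->entries[a->num].id = id;
    return a->num++;
}

/* Swap-remove: the last entry is moved into the freed slot. */
static void
array_remove(EntryArray *a, int i)
{
    assert(i >= 0 && i < a->num);
    a->num--;
    a->entries[i] = a->entries[a->num];
}

/* Returns the id the resolver finds when it follows its second saved index. */
static int
stale_index_demo(void)
{
    EntryArray a = { .num = 0 };
    int held[2];

    /* 1st phase: the resolver records the indexes of entries 100 and 101. */
    held[0] = array_insert(&a, 100);
    held[1] = array_insert(&a, 101);

    /* 2nd phase: resolve and remove the first held entry. */
    array_remove(&a, held[0]);      /* entry 101 moves into slot 0 */

    /* Meanwhile a backend inserts a new entry into the freed slot 1. */
    array_insert(&a, 200);

    /* The saved index for entry 101 now points at the new entry. */
    return a.entries[held[1]].id;
}
```

The resolver expected entry 101 at its saved index but finds the newly inserted entry 200, which is the situation in which the assertion on the locking backend fails.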

The simple solution is to hold fdwXactLock exclusively all the way from the
beginning of the 1st phase to the end of the 2nd phase. But I'm worried that
the performance impact would be too big...

I came up with two solutions, although there may be better ones.

A. Remove the resolved entries all at once after resolution of all held
entries is finished

With this, we don't need to hold the exclusive lock for a long time. But this
has other problems: pg_remove_foreign_xact() can still remove entries,
and we need to handle resolution failures.

I wonder if we can solve the first problem by introducing a new lock, like a
"removing lock", so that only the processes holding the lock can remove
entries. The performance impact is limited since inserting fdwxact
entries is not blocked by this lock. The second problem can be solved using
a try-catch block.

B. Merge the 1st and 2nd phases

Currently, the resolver resolves the entries together, which is why it's
difficult to remove the entries. So it seems the problem can be solved by
checking, resolving, and removing one entry at a time. I think this is better
since it is simpler than A. If I'm missing something, please let me know.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#225Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiro Ikeda (#224)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Apr 27, 2021 at 10:03 AM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:

On 2021/03/17 12:03, Masahiko Sawada wrote:

I've attached the updated version patch set.

Thanks for updating the patches! I'm resuming my review of 2PC because
I'd like to use this feature in PG15.

Thank you for reviewing the patch! Much appreciated.

I think the following logic of resolving and removing the fdwxact entries
by the transaction resolver needs to be fixed.

1. Check if pending fdwxact entries exist

HoldInDoubtFdwXacts() checks whether there are entries that meet conditions
such as having InvalidBackendId. After that it records the indexes of those
entries in the fdwxacts array. The fdwXactLock is released at the end of
this phase.

2. Resolve and remove the entries held in the 1st phase

ResolveFdwXacts() resolves each fdwxact entry using the indexes. At the end
of resolving, the transaction resolver removes the entry from the fdwxacts
array via remove_fdwact().

The entry is removed as follows. Since removal moves the last entry into the
freed slot, the indexes obtained in the 1st phase are no longer meaningful.

/* Remove the entry from active array */
FdwXactCtl->num_fdwxacts--;
FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];

This seems to lead to resolving unexpected fdwxacts, and it can cause the
following assertion failure, which is how I noticed it. For example, a
backend may insert a new fdwxact entry into the free slot of an entry that
the resolver removed right before, and the resolver then accesses the new
entry, which doesn't need to be resolved yet, because it uses the indexes
checked in the 1st phase.

Assert(fdwxact->locking_backend == MyBackendId);

Good point. I agree with your analysis.

The simple solution is to hold fdwXactLock exclusively all the way from the
beginning of the 1st phase to the end of the 2nd phase. But I'm worried that
the performance impact would be too big...

I came up with two solutions, although there may be better ones.

A. Remove the resolved entries all at once after resolution of all held
entries is finished

With this, we don't need to hold the exclusive lock for a long time. But this
has other problems: pg_remove_foreign_xact() can still remove entries,
and we need to handle resolution failures.

I wonder if we can solve the first problem by introducing a new lock, like a
"removing lock", so that only the processes holding the lock can remove
entries. The performance impact is limited since inserting fdwxact
entries is not blocked by this lock. The second problem can be solved using
a try-catch block.

B. Merge the 1st and 2nd phases

Currently, the resolver resolves the entries together, which is why it's
difficult to remove the entries. So it seems the problem can be solved by
checking, resolving, and removing one entry at a time. I think this is better
since it is simpler than A. If I'm missing something, please let me know.

It seems to me that solution B would be simpler and better. I'll try
to fix this issue by using solution B and rebase the patch.
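
One possible shape of solution B, sketched as a standalone model rather than as the actual fix: the resolver picks one in-doubt entry, remembers its identity instead of its array index, and looks it up again before removing it. The Entry layout, the integer id used as identity, and the comments marking where the lock would be taken and released are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ENTRIES 8

typedef struct Entry
{
    int  id;            /* hypothetical identity of the entry */
    bool in_doubt;      /* hypothetical: needs resolution by the resolver */
} Entry;

typedef struct EntryArray
{
    Entry entries[MAX_ENTRIES];
    int   num;
} EntryArray;

/* Swap-remove, as in the code quoted earlier in this thread. */
static void
array_remove(EntryArray *a, int i)
{
    a->num--;
    a->entries[i] = a->entries[a->num];
}

/* Index of the first in-doubt entry, or -1 if there is none. */
static int
find_in_doubt(const EntryArray *a)
{
    for (int i = 0; i < a->num; i++)
        if (a->entries[i].in_doubt)
            return i;
    return -1;
}

/* Index of the entry with the given id, or -1 if it is gone. */
static int
find_by_id(const EntryArray *a, int id)
{
    for (int i = 0; i < a->num; i++)
        if (a->entries[i].id == id)
            return i;
    return -1;
}

/* Check, resolve and remove one entry at a time; returns entries resolved. */
static int
resolve_all(EntryArray *a)
{
    int resolved = 0;

    for (;;)
    {
        /* (lock taken) pick one entry and remember its identity */
        int i = find_in_doubt(a);
        int id;

        if (i < 0)
            break;
        id = a->entries[i].id;

        /* (lock released) the remote transaction would be resolved here */

        /* (lock retaken) the index may have changed, so look it up again */
        i = find_by_id(a, id);
        if (i >= 0)
        {
            array_remove(a, i);
            resolved++;
        }
    }
    return resolved;
}

/* Three entries, two in doubt: expect two resolved and entry 2 left. */
static int
merged_phase_demo(void)
{
    EntryArray a = {
        .entries = { { 1, true }, { 2, false }, { 3, true } },
        .num = 3,
    };
    int resolved = resolve_all(&a);

    return resolved * 100 + a.num * 10 + a.entries[0].id;
}
```

Because no index survives across the point where the lock would be released, a concurrent insert or removal cannot redirect the resolver to an unrelated entry.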

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#226Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhihong Yu (#223)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, Mar 17, 2021 at 6:03 PM Zhihong Yu <zyu@yugabyte.com> wrote:

Hi,
For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :

Thank you for reviewing the patch!

With this commit, the foreign server modified within the transaction marked as 'modified'.

transaction marked -> transaction is marked

Will fix.

+#define IsForeignTwophaseCommitRequested() \
+    (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)

Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think the macro should be named: IsForeignTwophaseCommitRequired.

But even if foreign_twophase_commit is
FOREIGN_TWOPHASE_COMMIT_REQUIRED, the two-phase commit is not used if
there is only one modified server, right? It seems the name
IsForeignTwophaseCommitRequested is fine.

+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+       if (!ServerSupportTwophaseCommit(fdw_part))
+           have_no_twophase = true;
...
+   if (have_no_twophase)
+       ereport(ERROR,

It seems the error case should be reported within the loop. This way, we don't need to iterate the other participant(s).
Accordingly, nserverswritten should be incremented for local server prior to the loop. The condition in the loop would become if (!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
have_no_twophase is no longer needed.

Hmm, I think that if we process a 2PC-non-capable server first and then
process a 2PC-capable server, we should raise an error but
cannot detect that.
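
A simplified standalone model of the ordering issue, under stated assumptions: Participant and both functions are hypothetical reductions of the logic under discussion, a return value of true means "an error would be raised", and check_in_loop() is one reading of the proposed in-loop condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of a foreign transaction participant. */
typedef struct Participant
{
    bool modified;
    bool twophase_capable;
} Participant;

/* Flag-based check: scan everything, decide after the loop. */
static bool
check_with_flag(const Participant *parts, size_t n, bool local_modified)
{
    size_t nserverswritten = local_modified ? 1 : 0;
    bool   have_no_twophase = false;

    for (size_t i = 0; i < n; i++)
    {
        if (!parts[i].modified)
            continue;
        if (!parts[i].twophase_capable)
            have_no_twophase = true;
        nserverswritten++;
    }
    return nserverswritten > 1 && have_no_twophase;
}

/* In-loop check: report as soon as a non-capable server is seen. */
static bool
check_in_loop(const Participant *parts, size_t n, bool local_modified)
{
    size_t nserverswritten = local_modified ? 1 : 0;

    for (size_t i = 0; i < n; i++)
    {
        if (!parts[i].modified)
            continue;
        nserverswritten++;
        if (!parts[i].twophase_capable && nserverswritten > 1)
            return true;
    }
    return false;
}

/* Non-capable server visited first, capable server second, no local write. */
static int
order_dependence_demo(void)
{
    const Participant parts[] = {
        { true, false },
        { true, true },
    };
    bool flag = check_with_flag(parts, 2, false);
    bool loop = check_in_loop(parts, 2, false);

    return (flag ? 10 : 0) + (loop ? 1 : 0);
}
```

In this model the flag-based version reports the error, while the in-loop version misses it because the count is still 1 when the non-capable server is visited.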

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#227Zhihong Yu
zyu@yugabyte.com
In reply to: Masahiko Sawada (#226)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Apr 30, 2021 at 9:09 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Wed, Mar 17, 2021 at 6:03 PM Zhihong Yu <zyu@yugabyte.com> wrote:

Hi,
For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :

Thank you for reviewing the patch!

With this commit, the foreign server modified within the transaction marked as 'modified'.

transaction marked -> transaction is marked

Will fix.

+#define IsForeignTwophaseCommitRequested() \
+    (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)

Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think the macro should be named: IsForeignTwophaseCommitRequired.

But even if foreign_twophase_commit is
FOREIGN_TWOPHASE_COMMIT_REQUIRED, the two-phase commit is not used if
there is only one modified server, right? It seems the name
IsForeignTwophaseCommitRequested is fine.

+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+       if (!ServerSupportTwophaseCommit(fdw_part))
+           have_no_twophase = true;
...
+   if (have_no_twophase)
+       ereport(ERROR,

It seems the error case should be reported within the loop. This way, we don't need to iterate the other participant(s).
Accordingly, nserverswritten should be incremented for local server prior to the loop. The condition in the loop would become if (!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
have_no_twophase is no longer needed.

Hmm, I think if we process a 2pc-non-capable server first and then
process a 2pc-capable server, we should raise an error but
cannot detect that.

Then the check would stay as what you have in the patch:

if (!ServerSupportTwophaseCommit(fdw_part))

When the non-2pc-capable server is encountered, we would report the error
in place (following the ServerSupportTwophaseCommit check) and come out of
the loop.
have_no_twophase can be dropped.

Thanks


#228Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhihong Yu (#227)
Re: Transactions involving multiple postgres foreign servers, take 2

On Sun, May 2, 2021 at 1:23 AM Zhihong Yu <zyu@yugabyte.com> wrote:

On Fri, Apr 30, 2021 at 9:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Mar 17, 2021 at 6:03 PM Zhihong Yu <zyu@yugabyte.com> wrote:

Hi,
For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :

Thank you for reviewing the patch!

With this commit, the foreign server modified within the transaction marked as 'modified'.

transaction marked -> transaction is marked

Will fix.

+#define IsForeignTwophaseCommitRequested() \
+    (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)

Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think the macro should be named: IsForeignTwophaseCommitRequired.

But even if foreign_twophase_commit is
FOREIGN_TWOPHASE_COMMIT_REQUIRED, the two-phase commit is not used if
there is only one modified server, right? It seems the name
IsForeignTwophaseCommitRequested is fine.

+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+       if (!ServerSupportTwophaseCommit(fdw_part))
+           have_no_twophase = true;
...
+   if (have_no_twophase)
+       ereport(ERROR,

It seems the error case should be reported within the loop. This way, we don't need to iterate the other participant(s).
Accordingly, nserverswritten should be incremented for local server prior to the loop. The condition in the loop would become if (!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
have_no_twophase is no longer needed.

Hmm, I think if we process a 2pc-non-capable server first and then
process a 2pc-capable server, we should raise an error but
cannot detect that.

Then the check would stay as what you have in the patch:

if (!ServerSupportTwophaseCommit(fdw_part))

When the non-2pc-capable server is encountered, we would report the error in place (following the ServerSupportTwophaseCommit check) and come out of the loop.
have_no_twophase can be dropped.

But if the only server we processed is non-2pc-capable, we would raise
an error even though we should not in that case.

On second thought, I think we can track how many servers are modified
or not capable of 2PC during registration and unregistration. Then we
can determine both whether 2PC is required and whether a
non-2pc-capable server is involved without looking through all
participants. Thoughts?
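
A rough standalone sketch of that idea follows. It is not taken from the patch: the FdwXactCounters struct and all function names are hypothetical, and it assumes that a non-2pc-capable server only matters when it was modified. The counters are updated whenever a participant is registered or unregistered, so the commit-time check needs no loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-transaction counters kept alongside the participant list. */
typedef struct
{
	int			nmodified;			/* modified foreign servers */
	int			nmodified_no2pc;	/* modified servers that cannot do 2PC */
} FdwXactCounters;

/* Called when a participant is registered (assumed hook point). */
static void
counters_register(FdwXactCounters *c, bool modified, bool supports_2pc)
{
	if (!modified)
		return;					/* read-only participants are not counted */
	c->nmodified++;
	if (!supports_2pc)
		c->nmodified_no2pc++;
}

/* Called when a participant is unregistered; mirrors counters_register(). */
static void
counters_unregister(FdwXactCounters *c, bool modified, bool supports_2pc)
{
	if (!modified)
		return;
	assert(c->nmodified > 0);
	c->nmodified--;
	if (!supports_2pc)
	{
		assert(c->nmodified_no2pc > 0);
		c->nmodified_no2pc--;
	}
}

/* 2PC is needed when more than one server, local node included, was written. */
static bool
twophase_required(const FdwXactCounters *c, bool local_modified)
{
	return c->nmodified + (local_modified ? 1 : 0) > 1;
}

/* Error case: 2PC is needed but some modified server cannot take part. */
static bool
must_raise_error(const FdwXactCounters *c, bool local_modified)
{
	return twophase_required(c, local_modified) && c->nmodified_no2pc > 0;
}
```

Because the decision depends only on the two totals, the order in which servers were registered no longer matters, and a transaction that modified a single non-2pc-capable server is not rejected.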

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#229Zhihong Yu
zyu@yugabyte.com
In reply to: Masahiko Sawada (#228)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, May 3, 2021 at 5:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

But if we processed only one non-2pc-capable server, we would raise an
error but should not in that case.

On second thought, I think we can track how many servers are modified
or not capable of 2PC during registration and unregistration. Then we
can determine both whether 2PC is required and whether a
non-2pc-capable server is involved without looking through all
participants. Thoughts?

That is something worth trying.

Thanks


#230Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhihong Yu (#229)
9 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, May 3, 2021 at 11:11 PM Zhihong Yu <zyu@yugabyte.com> wrote:

On Mon, May 3, 2021 at 5:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

But if we processed only one non-2pc-capable server, we would raise an
error but should not in that case.

On second thought, I think we can track how many servers are modified
or not capable of 2PC during registration and unregistration. Then we
can determine both whether 2PC is required and whether a
non-2pc-capable server is involved without looking through all
participants. Thoughts?

That is something worth trying.

I've attached the updated patches, which incorporate the comments from
Zhihong and Ikeda-san.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v36-0009-Add-regression-tests-for-foreign-twophase-commit.patch (application/octet-stream)
From 0135c4521058308aad79c1e94ac5011500212268 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v36 9/9] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 +
 .../test_fdwxact/expected/test_fdwxact.out    | 200 +++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 185 ++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 110 ++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 526 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/025_fdwxact.pl            | 175 ++++++
 src/test/regress/pg_regress.c                 |  13 +-
 src/tools/msvc/Mkvcbuild.pm                   |   3 +-
 14 files changed, 1296 insertions(+), 6 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/025_fdwxact.pl

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index dffc79b2d9..6fde3e8a84 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -17,6 +17,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/fdwxact"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..fc5f0af522
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,200 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction that has operated on a foreign server not supporting two-phase commit protocol
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..20e4a671df
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 1
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..40b774e5d0
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,185 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution(expected int) AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = expected INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution(0);
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..c32bea5df2
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,110 @@
+use File::Copy qw/copy move/;
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql, $expected) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+	$expected = 0 unless defined $expected;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	$node->poll_query_until('postgres',
+							"SELECT count(*) = $expected FROM pg_foreign_xacts");
+
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the failure case of PREPARE TRANSACTION. We prepare the distributed
+# transaction with the same identifier.  The second attempt will fail when preparing
+# the local transaction, which is performed after preparing the foreign transaction
+# on srv_2pc_1. Therefore the transaction should rollback the prepared foreign
+# transaction.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into prepare phase on srv_2pc_1. The transaction fails during
+# preparing the foreign transaction on srv_2pc_1. Then, we try to both 'rollback' and
+# 'rollback prepared' the foreign transaction, and rollback another foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback .* on srv_2pc_2/, "rollback on another server");
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc uses PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..89d67c720f
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,526 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only
+ * commit and rollback API and the third supports all transaction API including
+ * prepare.
+ *
+ * Also, this module has an ability to inject an error at prepare callback or
+ * commit callback using test_inject_error() SQL function. The information of
+ * injected error is stored in the shared memory so that backend processes and
+ * resolver processes can see it.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xact.h"
+#include "commands/defrem.h"
+#include "access/reloptions.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static void testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo,
+												   List *fdw_private,
+												   int subplan_index,
+												   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactInfo *finfo);
+static void testCommitForeignTransaction(FdwXactInfo *finfo);
+static void testRollbackForeignTransaction(FdwXactInfo *finfo);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+/* Register the foreign transaction */
+static void
+testRegisterFdwXact(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					bool modified)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	RangeTblEntry	*rte;
+	ForeignTable *table;
+	UserMapping	*usermapping;
+	Oid		userid;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
+						mtstate->ps.state);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+	table = GetForeignTable(RelationGetRelid(rel));
+	usermapping = GetUserMapping(userid, table->serverid);
+	FdwXactRegisterXact(usermapping, modified);
+}
+
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static void
+testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo,
+									   List *fdw_private,
+									   int subplan_index,
+									   int eflags)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo,
+						(eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0);
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo, true);
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactInfo *finfo)
+{
+	int elevel;
+
+	if (check_event(finfo->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 finfo->identifier,
+							 finfo->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactInfo *finfo)
+{
+	int elevel;
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (check_event(finfo->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (finfo->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 xid, finfo->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 finfo->identifier,
+								 finfo->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactInfo *finfo)
+{
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (log_api_calls)
+	{
+		if (finfo->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 xid, finfo->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 finfo->identifier,
+								 finfo->server->servername)));
+	}
+}
+
+/*
+ * Check whether an error is injected for the given phase on the given
+ * server.  If so, set *elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Only "error" and "panic" are supported; anything else means ERROR */
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+	else
+		*elevel = ERROR;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error; over-long arguments are truncated */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index 96442ceb4e..0e5e05e41a 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/025_fdwxact.pl b/src/test/recovery/t/025_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/025_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node. Check that we can commit and roll back the foreign
+# transactions after normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shut down
+# the master node immediately. Check that we can commit and roll back the
+# foreign transactions after crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit a transaction involving foreign servers and shut down the master node
+# immediately, before a checkpoint. Check that WAL replay cleans up its shared
+# memory state and releases locks while replaying the transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index b7d80bd9bb..c0ded6909b 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2348,9 +2348,12 @@ regression_main(int argc, char *argv[],
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers.  (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2365,7 +2368,9 @@ regression_main(int argc, char *argv[],
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 5a1ab33b3d..20865a74fa 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -54,7 +54,8 @@ my @contrib_excludes = (
 	'pgcrypto',         'sepgsql',
 	'brin',             'test_extensions',
 	'test_misc',        'test_pg_dump',
-	'snapshot_too_old', 'unsafe_tests');
+	'snapshot_too_old', 'unsafe_tests',
+	'test_fdwxact');
 
 # Set of variables for frontend modules
 my $frontend_defines = { 'initdb' => 'FRONTEND' };
-- 
2.24.3 (Apple Git-128)
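
For readers following the test module above: the error-injection mechanism in test_fdwxact.c boils down to three short strings (elevel, phase, server) that one function writes and a callback-side check reads. Below is a minimal standalone sketch of that idea in plain C. It is not the module's code. The Demo* and DEMO_* names are hypothetical, the shared-memory segment and LWLock are left out, and treating every elevel other than "panic" as an error is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

#define DEMO_NAME_LEN 32

/* Hypothetical stand-ins for the ERROR and PANIC levels. */
enum { DEMO_ERROR = 1, DEMO_PANIC = 2 };

/* Hypothetical stand-in for the module's shared state (no locking here). */
typedef struct DemoInjectState
{
    char elevel[DEMO_NAME_LEN];
    char phase[DEMO_NAME_LEN];
    char server[DEMO_NAME_LEN];
} DemoInjectState;

/* Disarm any injected error by clearing all three fields. */
static void
demo_reset(DemoInjectState *st)
{
    memset(st, 0, sizeof(*st));
}

/* Arm an error; snprintf truncates if needed and always NUL-terminates. */
static void
demo_inject(DemoInjectState *st, const char *elevel,
            const char *phase, const char *server)
{
    assert(st != NULL && elevel != NULL && phase != NULL && server != NULL);
    snprintf(st->elevel, sizeof(st->elevel), "%s", elevel);
    snprintf(st->phase, sizeof(st->phase), "%s", phase);
    snprintf(st->server, sizeof(st->server), "%s", server);
}

/* Return true and set *elevel if an error is armed for this server/phase. */
static bool
demo_check_event(const DemoInjectState *st, const char *server,
                 const char *phase, int *elevel)
{
    if (strcasecmp(st->server, server) != 0 ||
        strcasecmp(st->phase, phase) != 0)
        return false;

    *elevel = (strcasecmp(st->elevel, "panic") == 0) ? DEMO_PANIC : DEMO_ERROR;
    return true;
}
```

A prepare or commit callback would call demo_check_event() with its own server name and phase, and raise at the returned level when it gets true. The comparisons ignore case, so 'FS1' and 'fs1' name the same server.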

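A side note on flag tests of the kind the commit and rollback callbacks in test_fdwxact.c perform on finfo->flags: testing for a single bit needs bitwise AND, because logical AND is true whenever both operands are nonzero, whichever bits are set. A small standalone sketch follows. The DEMO_FLAG_* values are made up for illustration and are not the patch's FDWXACT_FLAG_* definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_FLAG_ONEPHASE 0x01
#define DEMO_FLAG_OTHER    0x02

/* Bitwise AND isolates the requested bit. */
static bool
demo_has_flag(int flags, int flag)
{
    assert(flag != 0);
    return (flags & flag) != 0;
}

/* Logical AND only asks whether both operands are nonzero. */
static bool
demo_logical_and(int flags, int flag)
{
    return flags && flag;
}
```

With flags set to DEMO_FLAG_OTHER, demo_has_flag(flags, DEMO_FLAG_ONEPHASE) is false while demo_logical_and(flags, DEMO_FLAG_ONEPHASE) is true, so the logical form would take the one-phase branch for any nonzero flags value.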
v36-0007-Add-GetPrepareId-API.patch (application/octet-stream)
From 5f0d996720bd20d5614c168cfd2e124717986965 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 4 Nov 2020 14:41:53 +0900
Subject: [PATCH v36 7/9] Add GetPrepareId API

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/fdwxact.c | 52 ++++++++++++++++++++++++----
 src/include/foreign/fdwapi.h         |  3 ++
 2 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index bc4540d3ae..df9ff87ae3 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -181,6 +181,7 @@ typedef struct FdwXactEntry
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
 	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
 } FdwXactEntry;
 
 /*
@@ -385,6 +386,7 @@ FdwXactRegisterXact(UserMapping *usermapping, bool modified)
 	fdwent->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdwent->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdwent->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdwent->get_prepareid_fn = routine->GetPrepareId;
 
 	MemoryContextSwitchTo(old_ctx);
 
@@ -871,9 +873,10 @@ FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all)
 }
 
 /*
- * Return a null-terminated foreign transaction identifier.  We generate an
- * unique identifier with in the form of
- * "fx_<random number>_<xid>_<umid> whose length is less than FDWXACT_ID_MAX_LEN.
+ * Return a null-terminated foreign transaction identifier.  If the given FDW
+ * provides the GetPrepareId callback, we return the identifier it supplies.
+ * Otherwise we generate a unique identifier of the form
+ * "fx_<random number>_<xid>_<umid>", whose length is less than FDWXACT_ID_MAX_LEN.
  *
  * Returned string value is used to identify foreign transaction. The
  * identifier should not be same as any other concurrent prepared transaction
@@ -887,12 +890,47 @@ FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all)
 static char *
 getFdwXactIdentifier(FdwXactEntry *fdwent, TransactionId xid)
 {
-	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+	char *id;
+	int	id_len;
 
-	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u", Abs(random()),
-			 xid, fdwent->umid);
+	/*
+	 * If the FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdwent->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u", Abs(random()),
+				 xid, fdwent->umid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdwent->get_prepareid_fn(xid, fdwent->server->serverid,
+								  fdwent->usermapping->userid,
+								  &id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("foreign transaction identifier is not provided")));
+
+	/* The identifier must be shorter than FDWXACT_ID_MAX_LEN */
+	if (id_len >= FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
 
-	return pstrdup(buf);
+	id[id_len] = '\0';
+	return pstrdup(id);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 55298f8dae..0f6d0543c7 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -195,6 +195,8 @@ typedef void (*ForeignAsyncNotify_function) (AsyncRequest *areq);
 typedef void (*PrepareForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*CommitForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*RollbackForeignTransaction_function) (FdwXactInfo *finfo);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 
 /*
@@ -289,6 +291,7 @@ typedef struct FdwRoutine
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
 	PrepareForeignTransaction_function PrepareForeignTransaction;
+	GetPrepareId_function GetPrepareId;
 } FdwRoutine;
 
 
-- 
2.24.3 (Apple Git-128)

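To summarize the identifier logic of the 0007 patch above in a form that compiles outside the backend: if the FDW supplies a GetPrepareId-style callback, its identifier is used after a length check; otherwise an "fx_<random>_<xid>_<umid>" name is generated. The sketch below is a rough standalone approximation, not the patch's code. The demo_* names, the limit of 200 standing in for FDWXACT_ID_MAX_LEN, and returning NULL where the patch raises an ERROR are all assumptions made for illustration. It also copies the callback's string rather than writing a terminator into the callback's buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed limit, standing in for FDWXACT_ID_MAX_LEN. */
#define DEMO_ID_MAX_LEN 200

/* Hypothetical callback type, loosely modeled on GetPrepareId_function. */
typedef const char *(*demo_get_id_fn) (unsigned int xid, int *id_len);

/* Copy len bytes of src into a fresh NUL-terminated buffer. */
static char *
demo_copy_id(const char *src, size_t len)
{
    char *dst = malloc(len + 1);

    if (dst != NULL)
    {
        memcpy(dst, src, len);
        dst[len] = '\0';
    }
    return dst;
}

/*
 * Return a malloc'd identifier, or NULL if the callback gives none or one
 * that is not shorter than DEMO_ID_MAX_LEN.  Without a callback, generate
 * a name of the form "fx_<random>_<xid>_<umid>".
 */
static char *
demo_get_identifier(demo_get_id_fn fn, unsigned int xid, unsigned int umid)
{
    const char *id;
    int id_len = 0;

    if (fn == NULL)
    {
        char buf[DEMO_ID_MAX_LEN];
        int n = snprintf(buf, sizeof(buf), "fx_%d_%u_%u", rand(), xid, umid);

        assert(n > 0 && n < DEMO_ID_MAX_LEN);
        return demo_copy_id(buf, (size_t) n);
    }

    id = fn(xid, &id_len);
    if (id == NULL || id_len < 0 || id_len >= DEMO_ID_MAX_LEN)
        return NULL;

    return demo_copy_id(id, (size_t) id_len);
}

/* Example callback in the style of testGetPrepareId(): "tx_<xid>". */
static const char *
demo_tx_id(unsigned int xid, int *id_len)
{
    static char buf[32];

    *id_len = snprintf(buf, sizeof(buf), "tx_%u", xid);
    return buf;
}
```

Calling demo_get_identifier(demo_tx_id, 42, 7) yields "tx_42", while passing NULL as the callback yields a generated "fx_..." name. The caller frees the result in both cases.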
v36-0008-Documentation-update.patch (application/octet-stream)
From 2898a174bba1895428e010001f6e576ff73e2e39 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v36 8/9] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 +++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 158 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 245 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    | 147 +++++++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 src/backend/access/transam/README.fdwxact | 134 ++++++++++++
 10 files changed, 1013 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml
 create mode 100644 src/backend/access/transam/README.fdwxact

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 6d06ad22b9..6db7a7ba8c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9394,6 +9394,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -11268,6 +11273,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-authid"><structname>pg_authid</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction has been
+          prepared and is waiting to be committed or is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction has been
+          prepared and is waiting to be aborted or is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>locker_pid</structfield></entry>
+      <entry><type>int</type></entry>
+      <entry></entry>
+      <entry>
+       Process ID of the process currently locking and processing this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 45bd1f1b7e..431f1fb796 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9552,6 +9552,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Settings</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether a distributed transaction commit ensures that all
+         changes made on the involved foreign servers are committed. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used
+         to commit or roll back distributed transactions. When set to
+         <literal>required</literal>, a distributed transaction strictly requires
+         that all modified servers can use the two-phase commit protocol.  That is,
+         the distributed transaction cannot commit if even one server does not
+         support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, the distributed transaction commit
+         waits for all involved foreign transactions to be committed before the
+         command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When set to <literal>disabled</literal>, database consistency is at
+          risk if one or more foreign servers crash while a distributed
+          transaction is being committed.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies the maximum number of foreign transaction resolution workers. A foreign
+         transaction resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver waits after a failed resolution
+         before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminates foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning a resolver stays connected to
+         its database until it is stopped manually with <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..1106fe00c9
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the others were rolled back. This could leave the
+   data in an inconsistent state in terms of a federated database.
+   Atomic commit of distributed transactions is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers are either committed or rolled back using the
+   transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To commit atomically across all foreign servers,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit of a distributed transaction proceeds
+    through the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers, including the local server itself, and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the preparation succeeds on all foreign
+       servers, the commit proceeds to the next step.  If there is any failure in
+       the prepare phase, the server rolls back all the transactions on both
+       the local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit the local transaction. The server commits the transaction locally.
+       Once the local transaction is committed, none of the involved
+       transactions will be rolled back.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    The above sequence is executed transparently to the users at transaction commit.
+    The transaction returns an acknowledgment of the successful commit of the
+    distributed transaction to the client after step 2.  After that, all
+    prepared transactions are resolved asynchronously by a foreign transaction
+    resolver process.
+   </para>
+
+   <para>
+    When the user executes <command>PREPARE TRANSACTION</command>, the server
+    prepares the local transaction as well as all involved transactions on the
+    foreign servers. Likewise, when <command>COMMIT PREPARED</command> or
+    <command>ROLLBACK PREPARED</command> is executed, all prepared transactions are
+    resolved asynchronously after committing or rolling back the local transaction.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    A distributed transaction can be in the <firstterm>in-doubt</firstterm>
+    state from the time all involved transactions are prepared until all of
+    them are resolved.  If the local node crashes while preparing the
+    transactions, the distributed transaction also becomes in-doubt.
+    The information about the involved foreign transactions is recovered
+    during crash recovery and they are resolved in the background.  Until all
+    in-doubt transactions are resolved, other transactions might see
+    inconsistent results when reading from the foreign servers.
+   </para>
+
+   <para>
+    The foreign transaction resolver processes automatically resolve the
+    transactions associated with an in-doubt distributed transaction.
+    Alternatively, you can use the <function>pg_resolve_foreign_xact</function>
+    function to resolve them manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving in-doubt distributed transactions. They commit or
+    roll back prepared transactions on all foreign servers involved in the
+    distributed transaction according to the result of the corresponding local
+    transaction.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on the database to which it is connected. If a resolution fails, the
+    resolver retries at the interval specified by
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped without an immediate shutdown. You can call the
+     <function>pg_stop_foreign_xact_resolver</function> function to stop that
+     resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be set to
+    <literal>required</literal>.  Additionally, <varname>max_worker_processes</varname>
+    may need to be adjusted to accommodate the foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 8aa7edfe4a..bbf7c0b488 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1657,6 +1657,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, roll back, and
+     prepare foreign transactions. If an FDW wants its foreign
+     transactions to be managed by <productname>PostgreSQL</productname>'s global
+     transaction manager, it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well, and can optionally
+     provide the <function>GetPrepareId</function> callback.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactInfo *finfo);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transaction if foreign two-phase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactInfo *finfo);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one phase, or at the post-commit phase if
+    two-phase commit is required. If <literal>finfo-&gt;flags</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction
+    can be committed in one phase; otherwise this function must commit the
+    prepared transaction identified by <literal>finfo-&gt;identifier</literal>.
+    </para>
+
+    <para>
+     Note that in all cases except when called through the
+     <function>pg_resolve_foreign_xact</function> SQL function, this callback
+     function is executed by foreign transaction resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactInfo *finfo);
+</programlisting>
+    Roll back the foreign transaction. This function is called at
+    the end of the local transaction, after it has been rolled back locally. The
+    foreign transactions are rolled back when a user requests a rollback or when
+    an error occurs during the transaction. This function must tolerate
+    being called recursively if an error occurs during rollback of the foreign
+    transaction, so you need to track recursion and prevent it from being called
+    infinitely. If <literal>finfo-&gt;flags</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction can be rolled
+    back in one phase; otherwise this function must roll back the prepared
+    transaction identified by <literal>finfo-&gt;identifier</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>finfo-&gt;identifier</literal>
+     might not exist on the foreign servers. This can happen when, for instance,
+     there is a failure during preparing the foreign transaction. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except when called through the
+     <function>pg_resolve_foreign_xact</function> SQL function, this callback
+     function is executed by foreign transaction resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction
+    identifier, storing its length in <varname>*prep_id_len</varname>.
+    This optional function is called during executor startup, once per
+    foreign server. Note that the transaction identifier must be a string literal,
+    less than <symbol>NAMEDATALEN</symbol> bytes long, and should not be the same
+    as any other concurrent prepared transaction identifier. If this callback routine
+    is not provided, <productname>PostgreSQL</productname>'s distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;xid&gt;_&lt;user mapping oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      The functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. Therefore,
+      you cannot use functions that use the system catalog cache,
+      such as the Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -2136,4 +2247,138 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If a server used by an FDW supports transactions, it is usually worthwhile
+    for the FDW to manage transactions opened on the foreign server. The FDW
+    callback functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the workings of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactInfo</literal> can be used to get
+    information about the foreign server being processed, such as
+    <structname>ForeignServer</structname> and <structname>UserMapping</structname>.
+    The <literal>flags</literal> field contains flag bits describing the
+    foreign transaction state for transaction management.
+   </para>
+
+   <para>
+    The foreign transaction needs to be registered with the
+    <productname>PostgreSQL</productname> global transaction manager.
+    Registration and unregistration are done by calling
+    <function>FdwXactRegisterXact</function> and
+    <function>FdwXactUnregisterXact</function> respectively.
+    The FDW can pass a boolean <literal>modified</literal> along with
+    <structname>UserMapping</structname> to <function>FdwXactRegisterXact</function>
+    indicating that writes are going to happen on the foreign server.  Such
+    foreign servers are taken into account when deciding whether the two-phase
+    commit protocol is required.
+   </para>
+
+   <para>
+    The FDW callback functions <function>CommitForeignTransaction</function>
+    and <function>RollbackForeignTransaction</function> are used to commit
+    and roll back foreign transactions. During transaction commit, the global
+    transaction manager calls the <function>CommitForeignTransaction</function> function
+    in the pre-commit phase and calls the
+    <function>RollbackForeignTransaction</function> function in the post-rollback
+    phase.
+   </para>
+
+   <para>
+    In addition to simply committing and rolling back foreign transactions, the
+    <productname>PostgreSQL</productname> global transaction manager enables
+    distributed transactions to commit and roll back atomically among all foreign
+    servers, which is known as atomic commit in the literature. To achieve atomic
+    commit, <productname>PostgreSQL</productname> employs the two-phase commit
+    protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+    to support the two-phase commit protocol is required to provide the FDW callback
+    function <function>PrepareForeignTransaction</function> and optionally
+    <function>GetPrepareId</function>, in addition to
+    <function>CommitForeignTransaction</function> and
+    <function>RollbackForeignTransaction</function>
+    (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+   </para>
+
+   <para>
+    An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    ft1 and ft2 are foreign tables on different foreign servers, which may use
+    different Foreign Data Wrappers.
+   </para>
+
+   <para>
+    When the core executor accesses the foreign servers, each foreign server whose FDW
+    supports the transaction management callback routines is registered as a participant.
+    During registration, <function>GetPrepareId</function> is called, if provided, to
+    generate a unique transaction identifier.
+   </para>
+
+   <para>
+    During the pre-commit phase of the local transaction, the foreign transaction manager
+    persists the foreign transaction information to disk and WAL, and then
+    prepares all foreign transactions by calling
+    <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+    is required. Two-phase commit is required when the transaction modified data
+    on more than one server, including the local server itself, and the user requests
+    foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+   </para>
+
+   <para>
+    <productname>PostgreSQL</productname> commits locally and goes to the next
+    step if and only if all foreign transactions are prepared successfully.
+    If any failure happens or the user requests cancellation during preparation,
+    the global transaction manager switches to rollback and calls
+    <function>RollbackForeignTransaction</function>.
+   </para>
+
+   <para>
+    When switching to rollback due to a failure, it calls
+    <function>RollbackForeignTransaction</function> with
+    <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions that are not
+    yet closed and calls <function>RollbackForeignTransaction</function> without
+    that flag for foreign transactions that are already prepared.  For foreign
+    transactions that are being prepared, it does both because it cannot be sure whether
+    the preparation has completed on the foreign server. Therefore,
+    <function>RollbackForeignTransaction</function> needs to tolerate the undefined
+    object error.
+   </para>
+
+   <para>
+    Note that when <literal>(finfo-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+    is true, both <literal>CommitForeignTransaction</literal> function and
+    <literal>RollbackForeignTransaction</literal> function should commit and
+    rollback directly, rather than processing prepared transactions. This can
+    happen when two-phase commit is not required or a foreign server is not
+    modified within the transaction.
+   </para>
+
+   <para>
+    Once all foreign transactions are prepared, the core transaction manager commits
+    locally. After that, the transaction commit waits for all prepared foreign
+    transactions to be committed before completing. Once all prepared foreign
+    transactions are resolved, the transaction commit completes.
+   </para>
+
+   <para>
+    One foreign transaction resolver process is responsible for foreign
+    transaction resolution on a database. The foreign transaction resolver process
+    calls either <function>CommitForeignTransaction</function> or
+    <function>RollbackForeignTransaction</function> to resolve the foreign
+    transaction identified by <literal>finfo-&gt;identifier</literal>. If resolution
+    fails, the resolver process exits with an error message. The foreign
+    transaction launcher launches the resolver process again after the interval given by
+    <xref linkend="guc-foreign-transaction-resolution-rety-interval"/>.
+   </para>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 45b701426b..3751d734c6 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4d1f1794ca..49a8b13f57 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27285,6 +27285,153 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-data-sanity">
+   <title>Data Sanity Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-data-sanity-table"/>
+    provide ways to check the sanity of data files in the cluster.
+   </para>
+
+   <table id="functions-data-sanity-table">
+    <title>Data Sanity Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_relation_check_pages</primary>
+        </indexterm>
+        <function>pg_relation_check_pages</function> ( <parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type> ] )
+        <returnvalue>setof record</returnvalue>
+        ( <parameter>path</parameter> <type>text</type>,
+        <parameter>failed_block_num</parameter> <type>bigint</type> )
+       </para>
+       <para>
+        Checks the pages of the specified relation to see if they are valid
+        enough to safely be loaded into the server's shared buffers.  If
+        given, <parameter>fork</parameter> specifies that only the pages of
+        the given fork are to be verified.  <parameter>fork</parameter> can
+        be <literal>main</literal> for the main data
+        fork, <literal>fsm</literal> for the free space
+        map, <literal>vm</literal> for the visibility map,
+        or <literal>init</literal> for the initialization fork.  The
+        default of <literal>NULL</literal> means that all forks of the
+        relation should be checked.  The function returns a list of block
+        numbers that appear corrupted along with the path names of their
+        files.  Use of this function is restricted to superusers by
+        default, but access may be granted to others
+        using <command>GRANT</command>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolves a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction that is currently
+        being processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+        This function is useful for removing a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful to call before
+    dropping a database to which a foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dcbb10fb6f..94006d0b2a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1094,6 +1094,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1318,6 +1330,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1624,6 +1648,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1942,6 +1971,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index d453be3909..eca35c4a84 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -171,6 +171,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index bfccda77af..22cd494366 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
diff --git a/src/backend/access/transam/README.fdwxact b/src/backend/access/transam/README.fdwxact
new file mode 100644
index 0000000000..8da9030689
--- /dev/null
+++ b/src/backend/access/transam/README.fdwxact
@@ -0,0 +1,134 @@
+src/backend/access/transam/README.fdwxact
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back the transactions
+on all foreign servers, or on none of them. This ensures that the data is
+always left in a consistent state across the federated database.
+
+
+Commit Sequence of Global Transactions
+--------------------------------
+
+We employ the two-phase commit protocol to commit atomically across all
+foreign servers. The commit sequence of a distributed transaction consists
+of the following four steps:
+
+1. Foreign Server Registration
+During executor node initialization, accessed foreign servers are registered
+in the list FdwXactParticipant, which is maintained by PostgreSQL's global
+transaction manager (GTM), as distributed transaction participants. The
+registered foreign transactions are tracked until the end of transaction.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+We write a WAL record indicating that the foreign server is involved
+with the current transaction before preparing any foreign transaction.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.
+
+The two-phase commit is required only if the transaction modified two or more
+servers including the local node.
+
+After that we prepare all foreign transactions by calling the
+PrepareForeignTransaction() API. If any of them fails, we switch to
+rollback; at that point some participants might be prepared whereas
+others are not. The prepared foreign transactions are resolved
+asynchronously by the resolver process, or manually with
+pg_resolve_foreign_xact(), and the unprepared ones are ended in one
+phase by calling the RollbackForeignTransaction() API.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction,
+but this resolution step (commit or rollback) is done by the foreign
+transaction resolver process.
+
+
+Identifying Foreign Transactions In GTM
+---------------------------------------
+
+There are two ways to identify foreign transaction participants (as well as
+FdwXact entries): by {server OID, user OID} and by user mapping OID. The same
+is true for how FDWs identify their connections (and the transactions on them)
+to the foreign server. We need to consider the case where GTM and the FDW
+identify transactions differently, because a problem might occur
+when the user modifies the same foreign server under different roles within
+one transaction. For example, consider the following execution:
+
+BEGIN;
+SET ROLE user_A;
+INSERT INTO ft1 VALUES (1);
+SET ROLE user_B;
+INSERT INTO ft1 VALUES (1);
+COMMIT;
+
+Suppose that an FDW identifies the connection by {server OID, user OID}
+and GTM identifies the transactions by user mapping OID, and user_A and user_B use
+the public user mapping to connect to server_X. The FDW has two
+connections, {user_A, server_X} and {user_B, server_X}, and therefore opens two
+transactions, one on each connection, while GTM has only one FdwXact entry because
+the two connections refer to the same user mapping OID. As a result, at the end of
+the transaction, GTM ends only one foreign transaction, leaving the other one open.
+
+On the other hand, suppose that an FDW identifies the connection by user mapping OID
+and GTM does so by {server OID, user OID}. The FDW uses only one connection and opens
+one transaction since both users refer to the same user mapping OID (we expect FDWs
+not to register the foreign transaction when not starting a new transaction on the
+foreign server). Since GTM also has one entry, it can end the foreign transaction
+properly. The downside is that the user OID of FdwXact (i.e., FdwXact->userid)
+is the user who first registered the foreign transaction, not necessarily
+the user who executed COMMIT.  For example in the above case, FdwXact->userid
+will be user_A, not user_B. But this is not a big problem in practice.
+
+Therefore, in fdwxact.c, we identify the foreign transaction by
+{server OID, user OID}.
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_PREPARING
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared. The status then changes to
+FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING before committing or
+aborting, respectively. The FdwXact entry is removed with WAL logging once resolved.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. The initial
+status for those entries is FDWXACT_STATUS_PREPARED if they are recovered
+from WAL. Because we write WAL only when preparing the foreign transaction, we
+cannot know the exact fate of the foreign transaction from recovery.
+
+The foreign transaction status transition is illustrated by the following
+graph describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                      INVALID                       |
+ +----------------------------------------------------+
+    |                      |                       |
+    |                      v                       |
+    |           +---------------------+            |
+   (*1)         |      PREPARING      |           (*1)
+    |           +---------------------+            |
+    |                      |                       |
+    v                      v                       v
+ +----------------------------------------------------+
+ |                      PREPARED                      |
+ +----------------------------------------------------+
+           |                               |
+           v                               v
+ +--------------------+          +--------------------+
+ |     COMMITTING     |          |      ABORTING      |
+ +--------------------+          +--------------------+
+           |                               |
+           v                               v
+ +----------------------------------------------------+
+ |                        END                         |
+ +----------------------------------------------------+
+
+(*1) Paths for recovered FdwXact entries
-- 
2.24.3 (Apple Git-128)
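
The status transitions drawn in README.fdwxact above can be sketched as a small
validity check. This is only an illustration of the diagram: the enum and the
function names are hypothetical and are not the definitions the patch adds to
fdwxact.h. The "recovered" flag models the (*1) paths, where entries rebuilt
from WAL start directly in the PREPARED state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status values mirroring the README diagram (not the patch's enum) */
typedef enum SketchFdwXactStatus
{
    SKETCH_INVALID,
    SKETCH_PREPARING,
    SKETCH_PREPARED,
    SKETCH_COMMITTING,
    SKETCH_ABORTING,
    SKETCH_END
} SketchFdwXactStatus;

/* Return true if the diagram has an edge from "from" to "to" */
static bool
sketch_transition_allowed(SketchFdwXactStatus from, SketchFdwXactStatus to,
                          bool recovered)
{
    switch (from)
    {
        case SKETCH_INVALID:
            /* (*1): recovered entries skip PREPARING and start as PREPARED */
            if (recovered)
                return to == SKETCH_PREPARED;
            return to == SKETCH_PREPARING;
        case SKETCH_PREPARING:
            return to == SKETCH_PREPARED;
        case SKETCH_PREPARED:
            return to == SKETCH_COMMITTING || to == SKETCH_ABORTING;
        case SKETCH_COMMITTING:
        case SKETCH_ABORTING:
            return to == SKETCH_END;
        case SKETCH_END:
            return false;
    }
    return false;
}

/* Move to the next status, asserting that the diagram permits the step */
static SketchFdwXactStatus
sketch_advance(SketchFdwXactStatus from, SketchFdwXactStatus to, bool recovered)
{
    assert(sketch_transition_allowed(from, to, recovered));
    return to;
}
```

A normal commit would walk INVALID, PREPARING, PREPARED, COMMITTING, END with
recovered = false; a recovered entry would begin with INVALID to PREPARED.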

v36-0006-postgres_fdw-marks-foreign-transaction-as-modifi.patch (application/octet-stream)
From 7d56d8b5378eae80160778314e6d2f6a18d17bf9 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sat, 1 May 2021 09:00:01 +0900
Subject: [PATCH v36 6/9] postgres_fdw marks foreign transaction as modified on
 modification.

This commit enables postgres_fdw to execute two-phase commit protocol
on transaction commit (without explicitly executing PREPARE TRANSACTION).

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c   | 19 ++++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c |  2 ++
 contrib/postgres_fdw/postgres_fdw.h |  1 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 2b0ff22370..7820d31d1e 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -63,6 +63,7 @@ typedef struct ConnCacheEntry
 	bool		keep_connections;	/* setting value of keep_connections
 									 * server option */
 	Oid			serverid;		/* foreign server OID used to get server name */
+	bool		modified;		/* true if data on the foreign server is modified */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 	PgFdwConnState state;		/* extra per-connection state */
@@ -311,6 +312,7 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 	entry->changing_xact_state = false;
 	entry->invalidated = false;
 	entry->serverid = server->serverid;
+	entry->modified = false;
 	entry->server_hashvalue =
 		GetSysCacheHashValue1(FOREIGNSERVEROID,
 							  ObjectIdGetDatum(server->serverid));
@@ -346,6 +348,20 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 		 entry->conn, server->servername, user->umid, user->userid);
 }
 
+void
+MarkConnectionModified(UserMapping *user)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
+	if (entry && !entry->modified)
+	{
+		FdwXactRegisterXact(user, true);
+		entry->modified = true;
+	}
+}
+
 /*
  * Connect to remote server using specified server and user mapping properties.
  */
@@ -617,7 +633,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 			 entry->conn);
 
 		/* Register the foreign server to the transaction */
-		FdwXactRegisterXact(user);
+		FdwXactRegisterXact(user, false);
 
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
@@ -626,6 +642,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 		entry->changing_xact_state = true;
 		do_sql_command(entry->conn, sql);
 		entry->xact_depth = 1;
+		entry->modified = false;
 		entry->changing_xact_state = false;
 	}
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 24aed7ae1d..1eac6c21f8 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1495,6 +1495,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * establish new connection if necessary.
 	 */
 	fsstate->conn = GetConnection(user, false, &fsstate->conn_state);
+	MarkConnectionModified(user);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -3903,6 +3904,7 @@ create_foreign_modify(EState *estate,
 
 	/* Open connection; report that we'll create a prepared statement. */
 	fmstate->conn = GetConnection(user, true, &fmstate->conn_state);
+	MarkConnectionModified(user);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 97e4f244db..4fedbb76c4 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -144,6 +144,7 @@ extern void process_pending_request(AsyncRequest *areq);
 extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt,
 							 PgFdwConnState **state);
 extern void ReleaseConnection(PGconn *conn);
+extern void MarkConnectionModified(UserMapping *user);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern void do_sql_command(PGconn *conn, const char *sql);
-- 
2.24.3 (Apple Git-128)
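
The 'modified' flag set by postgres_fdw in the patch above feeds the commit-time
decision made in patch 5/9 below. README.fdwxact states the rule: two-phase
commit is required only if the transaction modified two or more servers,
including the local node. A minimal standalone sketch of that counting rule
follows. The struct and function names are hypothetical, and the sketch leaves
out the error checks that checkForeignTwophaseCommitRequired() performs (servers
without two-phase support, zero-valued GUCs).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the patch's per-transaction counters */
typedef struct SketchXactState
{
    bool    twophase_requested; /* foreign_twophase_commit is not 'disabled' */
    int     nforeign_modified;  /* foreign servers whose data was modified */
} SketchXactState;

/* Return true if the transaction wrote to two or more servers */
static bool
sketch_twophase_required(const SketchXactState *state, bool local_modified)
{
    int     nserverswritten;

    assert(state->nforeign_modified >= 0);

    if (!state->twophase_requested)
        return false;

    nserverswritten = state->nforeign_modified;

    /* The local node counts as one more written server */
    if (local_modified)
        nserverswritten++;

    return nserverswritten >= 2;
}
```

Under this rule a transaction that writes to one foreign server and reads from
any number of others commits in one phase, which is why the patch tracks
modification per connection rather than mere participation.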

v36-0005-Prepare-foreign-transactions-at-commit-time.patch (application/octet-stream)
From 0637a264b1a915786f23a2eccf68c4594309f882 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 25 Nov 2020 21:02:29 +0900
Subject: [PATCH v36 5/9] Prepare foreign transactions at commit time

With this commit, a foreign server modified within the transaction is
marked as 'modified'. On the 'modified' servers, foreign transactions
are prepared automatically if foreign_twophase_commit is
'required'. Previously, users needed to run PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED to use the two-phase commit protocol. This commit
enables users to use the two-phase commit protocol transparently. Prepared
foreign transactions are resolved asynchronously by the foreign
transaction resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/fdwxact.c          | 168 +++++++++++++++++-
 src/backend/access/transam/xact.c             |   4 +
 src/backend/utils/misc/guc.c                  |  28 +++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/access/fdwxact.h                  |   9 +
 src/include/foreign/fdwapi.h                  |   2 +-
 6 files changed, 207 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index 8100d4dd1d..bc4540d3ae 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -20,6 +20,23 @@
  *
  * FOREIGN TRANSACTION RESOLUTION
  *
+ * The transaction involving multiple foreign transactions uses two-phase commit
+ * protocol to commit the distributed transaction if enabled.  The basic strategy
+ * is that we prepare all of the remote transactions before committing locally and
+ * commit them after committing locally.
+ *
+ * At pre-commit of local transaction, we prepare the transactions on all foreign
+ * servers after logging the information of foreign transaction.  The result of
+ * distributed transaction is determined by the result of the corresponding local
+ * transaction.  Once the local transaction is successfully committed, all
+ * transactions on foreign servers must be committed.  If an error occurs
+ * before the local transaction commits, all transactions must be aborted.  After
+ * committing or rolling back locally, we leave foreign transactions as in-doubt
+ * transactions and then notify the resolver process. The resolver process asynchronously
+ * resolves these foreign transactions according to the result of the corresponding local
+ * transaction.  Also, the user can use pg_resolve_foreign_xact() SQL function to
+ * resolve a foreign transaction manually.
+ *
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API for each foreign transaction regardless of data on
  * the foreign server having been modified.  At COMMIT PREPARED and ROLLBACK PREPARED,
@@ -96,8 +113,10 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
+#include "storage/pmsignal.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
+#include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -118,6 +137,10 @@
 #define ServerSupportTwophaseCommit(fdwent) \
 	(((FdwXactEntry *)(fdwent))->prepare_foreign_xact_fn != NULL)
 
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
 /*
  * Name of foreign prepared transaction file is 8 bytes xid and
  * user mapping OID separated by '_'.
@@ -151,6 +174,9 @@ typedef struct FdwXactEntry
 	 */
 	FdwXactState		fdwxact;
 
+	/* true if modified the data on the server */
+	bool		modified;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
@@ -158,6 +184,7 @@ typedef struct FdwXactEntry
 } FdwXactEntry;
 
 /*
+
  * The current distributed transaction state. Members of participants must
  * support at least both commit and rollback APIs
  * (ServerSupportTransactionCallback() is true)..
@@ -166,14 +193,20 @@ typedef struct DistributedXactStateData
 {
 	bool	all_prepared; /* all participants are prepared? */
 
+	bool	twophase_commit_required;
+
 	/* Statistics of participants */
 	int		nparticipants_no_twophase; /* how many participants doesn't support
 										* two-phase commit protocol? */
+	int		nparticipants_modified;		/* how many participants are modified? */
+
 	HTAB	*participants;
 } DistributedXactStateData;
 static DistributedXactStateData DistributedXactState = {
 	.all_prepared = false,
+	.twophase_commit_required = false,
 	.nparticipants_no_twophase = 0,
+	.nparticipants_modified = 0,
 	.participants = NULL,
 };
 
@@ -185,9 +218,11 @@ static DistributedXactStateData DistributedXactState = {
 /* Keep track of registering process exit call back. */
 static bool fdwXactExitRegistered = false;
 
+
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
 int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
 
 static void RemoveFdwXactEntry(Oid umid);
 static void EndFdwXactEntry(FdwXactEntry *fdwent, bool isCommit,
@@ -195,7 +230,7 @@ static void EndFdwXactEntry(FdwXactEntry *fdwent, bool isCommit,
 static char *getFdwXactIdentifier(FdwXactEntry *fdwent, TransactionId xid);
 static int ForgetAllParticipants(void);
 
-static void FdwXactPrepareForeignTransactions(TransactionId xid);
+static void FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all);
 static FdwXactState FdwXactInsertEntry(TransactionId xid, FdwXactEntry *fdwent,
 									   char *identifier);
 static void AtProcExit_FdwXact(int code, Datum arg);
@@ -208,6 +243,7 @@ static char *ProcessFdwXactBuffer(TransactionId xid, Oid umid,
 								  XLogRecPtr insert_start_lsn, bool fromdisk);
 static char *ReadFdwXactStateFile(TransactionId xid, Oid umid);
 static void RemoveFdwXactStateFile(TransactionId xid, Oid umid, bool giveWarning);
+static bool checkForeignTwophaseCommitRequired(bool local_modified);
 
 static FdwXactState insert_fdwxact(Oid dbid, TransactionId xid, Oid umid, Oid serverid,
 							  Oid owner, char *identifier);
@@ -284,7 +320,7 @@ FdwXactShmemInit(void)
  * mapping OID as a participant of the transaction.
  */
 void
-FdwXactRegisterXact(UserMapping *usermapping)
+FdwXactRegisterXact(UserMapping *usermapping, bool modified)
 {
 	FdwXactEntry	*fdwent;
 	FdwRoutine	*routine;
@@ -310,8 +346,21 @@ FdwXactRegisterXact(UserMapping *usermapping)
 	fdwent = hash_search(DistributedXactState.participants,
 						 (void *) &umid, HASH_ENTER, &found);
 
+	/* Already registered */
 	if (found)
+	{
+		/* Update statistics if necessary  */
+		if (fdwent->modified && !modified)
+			DistributedXactState.nparticipants_modified--;
+		else if (!fdwent->modified && modified)
+			DistributedXactState.nparticipants_modified++;
+
+		fdwent->modified = modified;
+
+		Assert(DistributedXactState.nparticipants_modified <=
+		   hash_get_num_entries(DistributedXactState.participants));
 		return;
+	}
 
 	/*
 	 * The participant information needs to live until the end of the transaction
@@ -332,6 +381,7 @@ FdwXactRegisterXact(UserMapping *usermapping)
 				(errmsg("cannot register foreign server not supporting transaction callback")));
 
 	fdwent->fdwxact = NULL;
+	fdwent->modified = modified;
 	fdwent->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdwent->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdwent->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
@@ -341,9 +391,13 @@ FdwXactRegisterXact(UserMapping *usermapping)
 	/* Update statistics */
 	if (!ServerSupportTwophaseCommit(fdwent))
 		DistributedXactState.nparticipants_no_twophase++;
+	if (fdwent->modified)
+		DistributedXactState.nparticipants_modified++;
 
 	Assert(DistributedXactState.nparticipants_no_twophase <=
 			   hash_get_num_entries(DistributedXactState.participants));
+	Assert(DistributedXactState.nparticipants_modified <=
+		   hash_get_num_entries(DistributedXactState.participants));
 }
 
 /* Remove the foreign transaction from the current participants */
@@ -372,9 +426,13 @@ RemoveFdwXactEntry(Oid umid)
 		/* Update statistics */
 		if (!ServerSupportTwophaseCommit(fdwent))
 			DistributedXactState.nparticipants_no_twophase--;
+		if (fdwent->modified)
+			DistributedXactState.nparticipants_modified--;
 
 		Assert(DistributedXactState.nparticipants_no_twophase <=
 			   hash_get_num_entries(DistributedXactState.participants));
+		Assert(DistributedXactState.nparticipants_modified <=
+			   hash_get_num_entries(DistributedXactState.participants));
 	}
 }
 
@@ -441,7 +499,9 @@ AtEOXact_FdwXact(bool isCommit, bool is_parallel_worker)
 
 	/* Reset all fields */
 	DistributedXactState.all_prepared = false;
+	DistributedXactState.twophase_commit_required = false;
 	DistributedXactState.nparticipants_no_twophase = 0;
+	DistributedXactState.nparticipants_modified = 0;
 }
 
 /*
@@ -505,7 +565,7 @@ AtPrepare_FdwXact(void)
 	 * prepare all foreign transactions.
 	 */
 	xid = GetTopTransactionId();
-	FdwXactPrepareForeignTransactions(xid);
+	FdwXactPrepareForeignTransactions(xid, true);
 
 	/*
 	 * Remember we already prepared all participants.  We keep participants
@@ -524,6 +584,8 @@ PreCommit_FdwXact(bool is_parallel_worker)
 {
 	HASH_SEQ_STATUS scan;
 	FdwXactEntry *fdwent;
+	TransactionId xid;
+	bool		local_modified;
 
 	/*
 	 * If there is no foreign server involved or all foreign transactions
@@ -535,6 +597,40 @@ PreCommit_FdwXact(bool is_parallel_worker)
 
 	Assert(!RecoveryInProgress());
 
+	/*
+	 * Check if the current transaction did writes.  We need to include the
+	 * local node as a distributed transaction participant and regard it
+	 * as modified if the current transaction has performed WAL logging and
+	 * has been assigned an xid.  The transaction can end up not writing any WAL,
+	 * even if it has an xid, if it only wrote to temporary and/or unlogged
+	 * tables.  It can end up having written WAL without an xid if it did HOT
+	 * pruning.
+	 */
+	xid = GetTopTransactionIdIfAny();
+	local_modified = (TransactionIdIsValid(xid) && (XactLastRecEnd != 0));
+
+	/*
+	 * Perform twophase commit if required. Note that we don't support foreign
+	 * twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired(local_modified))
+	{
+		/*
+		 * Two-phase commit is required.  Assign a transaction id to the
+		 * current transaction if not yet because the local transaction is
+		 * necessary to determine the result of the distributed transaction.
+		 * Then we prepare foreign transactions on foreign servers that support
+		 * two-phase commit.  Note that we keep FdwXactParticipants until the
+		 * end of the transaction.
+		 */
+		if (!TransactionIdIsValid(xid))
+			xid = GetTopTransactionId();
+		FdwXactPrepareForeignTransactions(xid, false);
+		DistributedXactState.twophase_commit_required = true;
+
+		return;
+	}
+
 	/* Commit all foreign transactions in the participant list */
 	hash_seq_init(&scan, DistributedXactState.participants);
 	while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
@@ -650,11 +746,70 @@ CheckPointFdwXacts(XLogRecPtr redo_horizon)
 							   serialized_fdwxacts)));
 }
 
+/* Return true if the current transaction needs to use two-phase commit */
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return DistributedXactState.twophase_commit_required;
+}
+
 /*
- * Insert FdwXactState entries and prepare foreign transactions.
+ * Return true if the current transaction modifies data on two or more servers,
+ * counting the participating foreign servers and the local server itself.
+ */
+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+{
+	int		nserverswritten;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	nserverswritten = DistributedXactState.nparticipants_modified;
+
+	/* Did we modify the local non-temporary data? */
+	if (local_modified)
+		nserverswritten++;
+
+	/*
+	 * Two-phase commit is not required if the number of servers performing
+	 * writes is less than 2.
+	 */
+	if (nserverswritten < 2)
+		return false;
+
+	if (DistributedXactState.nparticipants_no_twophase > 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+				 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+	/* Two-phase commit is required. Check parameters */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	return true;
+}
+
+/*
+ * Insert FdwXactState entries and prepare foreign transactions.  If prepare_all is
+ * true, we prepare all foreign transactions regardless of whether writes happened
+ * on the server.
+ *
+ * We can still switch to rollback here on failure.  If any error occurs, we
+ * roll back the non-prepared foreign transactions.
  */
 static void
-FdwXactPrepareForeignTransactions(TransactionId xid)
+FdwXactPrepareForeignTransactions(TransactionId xid, bool prepare_all)
 {
 	FdwXactEntry *fdwent;
 	HASH_SEQ_STATUS scan;
@@ -673,6 +828,9 @@ FdwXactPrepareForeignTransactions(TransactionId xid)
 
 		CHECK_FOR_INTERRUPTS();
 
+		if (!prepare_all && !fdwent->modified)
+			continue;
+
 		/* Get prepared transaction identifier */
 		identifier = getFdwXactIdentifier(fdwent, xid);
 		Assert(identifier);
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index bfe5e11245..cec4813f37 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -22,6 +22,7 @@
 
 #include "access/commit_ts.h"
 #include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -1456,6 +1457,9 @@ RecordTransactionCommit(void)
 	if (wrote_xlog && markXidCommitted)
 		SyncRepWaitForLSN(XactLastRecEnd, true);
 
+	if (FdwXactIsForeignTwophaseCommitRequired())
+		FdwXactLaunchOrWakeupResolver();
+
 	/* remember end of last commit record */
 	XactLastCommitEnd = XactLastRecEnd;
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 853ffb8f97..a79fb503e8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -523,6 +523,24 @@ static struct config_enum_entry default_toast_compression_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -4798,6 +4816,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d5abe7d4a7..89738b6632 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -754,6 +754,8 @@
 							# retrying to resolve
 							# foreign transactions
 							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
 
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 05fc1beb2e..0d71e48a59 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -22,6 +22,14 @@
 											 * without preparation */
 #define FDWXACT_FLAG_PARALLEL_WORKER	0x02	/* is parallel worker? */
 
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
 /* Enum to track the status of foreign transaction */
 typedef enum
 {
@@ -103,6 +111,7 @@ extern int	max_prepared_foreign_xacts;
 extern int	max_foreign_xact_resolvers;
 extern int	foreign_xact_resolution_retry_interval;
 extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
 
 /* Function declarations */
 extern void PreCommit_FdwXact(bool is_parallel_worker);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index aa93ddc7ae..55298f8dae 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -303,7 +303,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
 /* Functions in fdwxact/fdwxact.c */
-extern void FdwXactRegisterXact(UserMapping *usermapping);
+extern void FdwXactRegisterXact(UserMapping *usermapping, bool modified);
 extern void FdwXactUnregisterXact(UserMapping *usermapping);
 
 #endif							/* FDWAPI_H */
-- 
2.24.3 (Apple Git-128)
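
For readers following the thread, the decision made by checkForeignTwophaseCommitRequired() in the patch above can be sketched as a standalone function. This is only an illustration, not code from the patch: the TwophaseDecision enum, the TwophaseInputs struct, and decide_twophase() are hypothetical names, and the real function raises ERROR through ereport() instead of returning an error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes; the patch returns bool or raises ERROR. */
typedef enum
{
	TWOPHASE_NOT_REQUIRED,
	TWOPHASE_REQUIRED,
	TWOPHASE_ERR_NO_2PC_SERVER,		/* a participant cannot do 2PC */
	TWOPHASE_ERR_NO_PREPARED_SLOTS, /* max_prepared_foreign_xacts == 0 */
	TWOPHASE_ERR_NO_RESOLVERS		/* max_foreign_xact_resolvers == 0 */
} TwophaseDecision;

/* Hypothetical bundle of the state the patch reads from globals. */
typedef struct
{
	bool		requested;		/* foreign_twophase_commit = required? */
	int			nparticipants_modified;
	int			nparticipants_no_twophase;
	int			max_prepared_foreign_xacts;
	int			max_foreign_xact_resolvers;
} TwophaseInputs;

static TwophaseDecision
decide_twophase(const TwophaseInputs *in, bool local_modified)
{
	int			nserverswritten;

	assert(in != NULL);

	if (!in->requested)
		return TWOPHASE_NOT_REQUIRED;

	/* Count writers: modified foreign servers plus the local server. */
	nserverswritten = in->nparticipants_modified + (local_modified ? 1 : 0);

	/* A single writer can commit atomically without two-phase commit. */
	if (nserverswritten < 2)
		return TWOPHASE_NOT_REQUIRED;

	if (in->nparticipants_no_twophase > 0)
		return TWOPHASE_ERR_NO_2PC_SERVER;
	if (in->max_prepared_foreign_xacts == 0)
		return TWOPHASE_ERR_NO_PREPARED_SLOTS;
	if (in->max_foreign_xact_resolvers == 0)
		return TWOPHASE_ERR_NO_RESOLVERS;

	return TWOPHASE_REQUIRED;
}
```

Note that the capability and configuration checks run only once two or more writers are involved, so a transaction that writes to a single server never fails because of them.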

v36-0004-postgres_fdw-supports-prepare-API.patch (application/octet-stream)
From 8c79136dc5cca0d79e8c3fb5819bf56022c1f7cb Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:00:21 +0900
Subject: [PATCH v36 4/9] postgres_fdw supports prepare API.

This commit implements the PrepareForeignTransaction API in postgres_fdw,
enabling foreign transactions to be committed and rolled back using the
two-phase commit protocol.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 136 +++++++++++++++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  13 --
 contrib/postgres_fdw/postgres_fdw.c           |   1 +
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   7 -
 5 files changed, 134 insertions(+), 24 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 321cd2f319..2b0ff22370 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -108,6 +108,8 @@ static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 static bool UserMappingPasswordRequired(UserMapping *user);
 static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+									char *fdwxact_id, bool is_commit);
 static bool disconnect_cached_connections(Oid serverid);
 
 /*
@@ -1467,12 +1469,19 @@ void
 postgresCommitForeignTransaction(FdwXactInfo *finfo)
 {
 	ConnCacheEntry *entry;
+	bool		is_onephase = (finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	PGresult   *res;
 
-	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
 
+	if (!is_onephase)
+	{
+		/* COMMIT PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, finfo->usermapping, finfo->identifier,
+								true);
+		return;
+	}
+
 	Assert(entry->conn);
 
 	/*
@@ -1514,11 +1523,19 @@ void
 postgresRollbackForeignTransaction(FdwXactInfo *finfo)
 {
 	ConnCacheEntry *entry = NULL;
+	bool is_onephase = (finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	bool abort_cleanup_failure = false;
 
-	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+
+	if (!is_onephase)
+	{
+		/* ROLLBACK PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, finfo->usermapping, finfo->identifier,
+								false);
+		return;
+	}
+
 	Assert(entry);
 
 	/*
@@ -1588,6 +1605,46 @@ cleanup:
 	pgfdw_cleanup_after_transaction(entry);
 }
 
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactInfo *finfo)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+	Assert(entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", finfo->identifier);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data, NULL);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   finfo->server->servername, finfo->identifier)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 finfo->server->servername, finfo->identifier);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
 /* Cleanup at main-transaction end */
 static void
 pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
@@ -1620,3 +1677,74 @@ pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
 	/* Also reset cursor numbering for next transaction */
 	cursor_number = 0;
 }
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+						char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	/*
+	 * Check the connection status in case the previous attempt
+	 * failed.
+	 */
+	if (entry->conn && PQstatus(entry->conn) != CONNECTION_OK)
+		disconnect_pg_server(entry);
+
+	/*
+	 * In the two-phase commit case, since the transaction is about to be
+	 * resolved by a different process than the one that prepared it,
+	 * we might not have a connection yet.
+	 */
+	if (!entry->conn)
+		make_new_connection(entry, usermapping);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	/*
+	 * Once the transaction is prepared, further transaction callback is not
+	 * called even when an error occurred during resolving it.  Therefore, we
+	 * don't need to set changing_xact_state here.  On failure the new connection
+	 * will be established either when the new transaction is started or when
+	 * checking the connection status above.
+	 */
+	res = pgfdw_exec_query(entry->conn, command->data, NULL);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback", fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2fd4a87f5c..27e2164ef4 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9185,19 +9185,6 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
- count 
--------
-   822
-(1 row)
-
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a distributed transaction that has operated on a foreign server not supporting two-phase commit protocol
-ROLLBACK;
-WARNING:  there is no transaction in progress
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 36d7ad3e0b..24aed7ae1d 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -615,6 +615,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for foreign transactions */
 	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
 	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
 
 	PG_RETURN_POINTER(routine);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 09d2806618..97e4f244db 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -154,6 +154,7 @@ extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
 extern void postgresCommitForeignTransaction(FdwXactInfo *finfo);
 extern void postgresRollbackForeignTransaction(FdwXactInfo *finfo);
+extern void postgresPrepareForeignTransaction(FdwXactInfo *finfo);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 000e2534fc..d85b69c736 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2781,13 +2781,6 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ROLLBACK;
-
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
-- 
2.24.3 (Apple Git-128)
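
The SQLSTATE handling in pgfdw_end_prepared_xact() above can be illustrated outside the backend. The sketch below assumes the six-bit packing that PostgreSQL's utils/elog.h uses for MAKE_SQLSTATE, reproduced here so the example compiles on its own. The SKETCH_* constants and both helper functions are hypothetical names, not part of the patch. The codes used are the standard SQLSTATE values 42704 (undefined_object) and 08006 (connection_failure).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the packing macros, assumed to match utils/elog.h. */
#define PGSIXBIT(ch)	(((ch) - '0') & 0x3F)
#define MAKE_SQLSTATE(ch1,ch2,ch3,ch4,ch5) \
	(PGSIXBIT(ch1) + (PGSIXBIT(ch2) << 6) + (PGSIXBIT(ch3) << 12) + \
	 (PGSIXBIT(ch4) << 18) + (PGSIXBIT(ch5) << 24))

/* Hypothetical stand-ins for ERRCODE_UNDEFINED_OBJECT and friends. */
#define SKETCH_UNDEFINED_OBJECT		MAKE_SQLSTATE('4','2','7','0','4')
#define SKETCH_CONNECTION_FAILURE	MAKE_SQLSTATE('0','8','0','0','6')

/*
 * Pack the five-character SQLSTATE reported by the remote server.  A missing
 * field is treated as a connection failure, as in the patch.
 */
static int
sqlstate_from_diag(const char *diag_sqlstate)
{
	if (diag_sqlstate == NULL)
		return SKETCH_CONNECTION_FAILURE;

	assert(strlen(diag_sqlstate) == 5);

	return MAKE_SQLSTATE(diag_sqlstate[0], diag_sqlstate[1],
						 diag_sqlstate[2], diag_sqlstate[3],
						 diag_sqlstate[4]);
}

/*
 * A failed COMMIT PREPARED / ROLLBACK PREPARED is tolerated only when the
 * prepared transaction no longer exists on the remote side.
 */
static bool
resolution_error_is_ignorable(const char *diag_sqlstate)
{
	return sqlstate_from_diag(diag_sqlstate) == SKETCH_UNDEFINED_OBJECT;
}
```

Treating undefined_object as success makes resolution idempotent: if an earlier attempt already committed or rolled back the prepared transaction before the connection dropped, a retry does not fail.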

v36-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch (application/octet-stream)
From 79bbd4d13e8bce9bf4c02b47c4c3756739b2d7fb Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 10 May 2021 20:31:30 +0900
Subject: [PATCH v36 2/9] postgres_fdw supports commit and rollback APIs.

This commit implements both the CommitForeignTransaction and
RollbackForeignTransaction APIs in postgres_fdw. Note that since
PREPARE TRANSACTION is still not supported, this commit doesn't add
any new user-visible functionality.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 459 +++++++++---------
 .../postgres_fdw/expected/postgres_fdw.out    |   2 +-
 contrib/postgres_fdw/postgres_fdw.c           |   4 +
 contrib/postgres_fdw/postgres_fdw.h           |   3 +
 4 files changed, 225 insertions(+), 243 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 82aa14a65d..321cd2f319 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
 #include "funcapi.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,8 +93,7 @@ static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, UserMapping *user);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -106,6 +106,8 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
 static bool disconnect_cached_connections(Oid serverid);
 
 /*
@@ -124,53 +126,14 @@ static bool disconnect_cached_connections(Oid serverid);
 PGconn *
 GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state)
 {
-	bool		found;
 	bool		retry = false;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
 	MemoryContext ccxt = CurrentMemoryContext;
 
-	/* First time through, initialize connection cache hashtable */
-	if (ConnectionHash == NULL)
-	{
-		HASHCTL		ctl;
-
-		ctl.keysize = sizeof(ConnCacheKey);
-		ctl.entrysize = sizeof(ConnCacheEntry);
-		ConnectionHash = hash_create("postgres_fdw connections", 8,
-									 &ctl,
-									 HASH_ELEM | HASH_BLOBS);
-
-		/*
-		 * Register some callback functions that manage connection cleanup.
-		 * This should be done just once in each backend.
-		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
-		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
-		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
-									  pgfdw_inval_callback, (Datum) 0);
-		CacheRegisterSyscacheCallback(USERMAPPINGOID,
-									  pgfdw_inval_callback, (Datum) 0);
-	}
-
 	/* Set flag that we did GetConnection during the current transaction */
 	xact_got_connection = true;
 
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
-	/*
-	 * Find or create cached entry for requested connection.
-	 */
-	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
-	if (!found)
-	{
-		/*
-		 * We need only clear "conn" here; remaining fields will be filled
-		 * later when "conn" is set.
-		 */
-		entry->conn = NULL;
-	}
+	entry = GetConnectionCacheEntry(user->umid);
 
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
@@ -205,7 +168,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state)
 		if (entry->state.pendingAreq)
 			process_pending_request(entry->state.pendingAreq);
 		/* Start a new transaction or subtransaction if needed. */
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 	PG_CATCH();
 	{
@@ -266,7 +229,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state)
 		if (entry->conn == NULL)
 			make_new_connection(entry, user);
 
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 
 	/* Remember if caller will prepare statements */
@@ -279,6 +242,54 @@ GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state)
 	return entry->conn;
 }
 
+/* Return ConnCacheEntry identified by the given umid */
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	bool		found;
+	ConnCacheEntry *entry;
+	ConnCacheKey key;
+
+	/* First time through, initialize connection cache hashtable */
+	if (ConnectionHash == NULL)
+	{
+		HASHCTL		ctl;
+
+		ctl.keysize = sizeof(ConnCacheKey);
+		ctl.entrysize = sizeof(ConnCacheEntry);
+		ConnectionHash = hash_create("postgres_fdw connections", 8,
+									 &ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+		/*
+		 * Register some callback functions that manage connection cleanup.
+		 * This should be done just once in each backend.
+		 */
+		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
+		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
+									  pgfdw_inval_callback, (Datum) 0);
+		CacheRegisterSyscacheCallback(USERMAPPINGOID,
+									  pgfdw_inval_callback, (Datum) 0);
+	}
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
+
+	/*
+	 * Find or create cached entry for requested connection.
+	 */
+	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+	if (!found)
+	{
+		/*
+		 * We need only clear "conn" here; remaining fields will be filled
+		 * later when "conn" is set.
+		 */
+		entry->conn = NULL;
+	}
+	return entry;
+}
+
 /*
  * Reset all transient state fields in the cached connection entry and
  * establish new connection to the remote server.
@@ -591,7 +602,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -603,6 +614,9 @@ begin_remote_xact(ConnCacheEntry *entry)
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
+		/* Register the foreign server to the transaction */
+		FdwXactRegisterXact(user);
+
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
 		else
@@ -822,203 +836,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- *
- * This runs just late enough that it must not enter user-defined code
- * locally.  (Entering such code on the remote side is fine.  Its remote
- * COMMIT TRANSACTION may run deferred triggers.)
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-						/* Also reset per-connection state */
-						memset(&entry->state, 0, sizeof(entry->state));
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, it is marked as
-		 * invalid or keep_connections option of its server is disabled, then
-		 * discard it to recover. Next GetConnection will open a new
-		 * connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state ||
-			entry->invalidated ||
-			!entry->keep_connections)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -1645,3 +1462,161 @@ disconnect_cached_connections(Oid serverid)
 
 	return result;
 }
+
+void
+postgresCommitForeignTransaction(FdwXactInfo *finfo)
+{
+	ConnCacheEntry *entry;
+	PGresult   *res;
+
+	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+
+	Assert(entry->conn);
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	do_sql_command(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+void
+postgresRollbackForeignTransaction(FdwXactInfo *finfo)
+{
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Only clean up the connection cache entry if the transaction failed
+	 * before a connection was established.
+	 */
+	if (!entry->conn)
+		goto cleanup;
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is before starting transaction or is already unsalvageable,
+	 * do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+		goto cleanup;
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+cleanup:
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state ||
+		entry->invalidated ||
+		!entry->keep_connections)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	/*
+	 * Regardless of the event type, we can now mark ourselves as out of the
+	 * transaction.
+	 */
+	xact_got_connection = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6f533c745d..7986c87b60 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9195,7 +9195,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
+ERROR:  cannot PREPARE a transaction that has operated on foreign tables
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 4ff58d9c27..36d7ad3e0b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -612,6 +612,10 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->ForeignAsyncConfigureWait = postgresForeignAsyncConfigureWait;
 	routine->ForeignAsyncNotify = postgresForeignAsyncNotify;
 
+	/* Support functions for foreign transactions */
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 9591c0f6c2..09d2806618 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -151,6 +152,8 @@ extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query,
 								  PgFdwConnState *state);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresCommitForeignTransaction(FdwXactInfo *finfo);
+extern void postgresRollbackForeignTransaction(FdwXactInfo *finfo);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
-- 
2.24.3 (Apple Git-128)
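
Note on the abort path in the patch above: the cleanup is an escalation chain. Cancel a still-running query, then send ABORT TRANSACTION, then send DEALLOCATE ALL if prepared statements exist. A failure at any step leaves changing_xact_state set, which makes pgfdw_cleanup_after_transaction() discard the connection. The following is only a minimal sketch of that decision. The AbortModel struct and abort_cleanup_failed() are hypothetical names used for illustration and are not part of the patch; the booleans stand in for the results of the libpq and pgfdw_* calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state consulted on the abort path. */
typedef struct AbortModel
{
	bool		query_active;	/* PQtransactionStatus() == PQTRANS_ACTIVE */
	bool		cancel_ok;		/* would pgfdw_cancel_query() succeed? */
	bool		abort_ok;		/* would "ABORT TRANSACTION" succeed? */
	bool		have_prep_stmt; /* prepared statements exist on the remote */
	bool		dealloc_ok;		/* would "DEALLOCATE ALL" succeed? */
} AbortModel;

/*
 * Returns true if cleanup failed, i.e. the connection would be left with
 * changing_xact_state set and discarded afterwards.
 */
static bool
abort_cleanup_failed(const AbortModel *m)
{
	assert(m != NULL);

	/* Step 1: a running query must be cancelled first. */
	if (m->query_active && !m->cancel_ok)
		return true;

	/* Step 2: abort the remote transaction. */
	if (!m->abort_ok)
		return true;

	/* Step 3: clear prepared statements, if there are any. */
	if (m->have_prep_stmt && !m->dealloc_ok)
		return true;

	return false;
}
```

The sketch omits have_error because the patch sets it to true unconditionally just before this chain, so it does not affect the outcome.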

Attachment: v36-0003-Support-two-phase-commit-for-foreign-transaction.patch (application/octet-stream)
From ecfbfa8eae5e8fbb656d4cb418c949ab9f3df4be Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 10 May 2021 20:32:25 +0900
Subject: [PATCH v36 3/9] Support two-phase commit for foreign transactions.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +-
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   59 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    6 +-
 src/backend/access/transam/Makefile           |    2 +
 src/backend/access/transam/fdwxact.c          | 1811 ++++++++++++++++-
 src/backend/access/transam/fdwxact_launcher.c |  557 +++++
 src/backend/access/transam/fdwxact_resolver.c |  314 +++
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   44 +
 src/backend/access/transam/xact.c             |    4 +-
 src/backend/access/transam/xlog.c             |   41 +-
 src/backend/catalog/dependency.c              |    5 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/foreigncmds.c            |   34 +-
 src/backend/foreign/foreign.c                 |    6 +
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/postmaster.c           |   13 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   42 +-
 src/backend/storage/lmgr/lwlocknames.txt      |    2 +
 src/backend/tcop/postgres.c                   |   14 +
 src/backend/utils/activity/wait_event.c       |   15 +
 src/backend/utils/misc/guc.c                  |   48 +
 src/backend/utils/misc/postgresql.conf.sample |   14 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |   91 +-
 src/include/access/fdwxact_launcher.h         |   27 +
 src/include/access/fdwxact_resolver.h         |   22 +
 src/include/access/fdwxact_xlog.h             |   49 +
 src/include/access/resolver_internal.h        |   61 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   23 +
 src/include/commands/defrem.h                 |    1 +
 src/include/foreign/fdwapi.h                  |    2 +
 src/include/storage/procarray.h               |    1 +
 src/include/utils/guc_tables.h                |    2 +
 src/include/utils/wait_event.h                |    7 +-
 src/test/regress/expected/rules.out           |    7 +
 46 files changed, 3297 insertions(+), 59 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/backend/access/transam/fdwxact_launcher.c
 create mode 100644 src/backend/access/transam/fdwxact_resolver.c
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h
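
Note on state file naming: the FdwXactStateFilePath() macro in the fdwxact.c changes of this patch names each foreign transaction state file after the local xid and the user mapping OID, each printed as 8 uppercase hexadecimal digits and joined by '_', under the pg_fdwxact directory. The following is a minimal standalone sketch of that format, assuming plain unsigned int in place of TransactionId and Oid. The SKETCH_* macros and sketch_state_file_path() are hypothetical names for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors FDWXACTS_DIR and FDWXACT_FILE_NAME_LEN from the patch. */
#define SKETCH_FDWXACTS_DIR "pg_fdwxact"
#define SKETCH_FILE_NAME_LEN (8 + 1 + 8)

/*
 * Build the state file path for (xid, umid) into path.  Returns the number
 * of characters snprintf() would have written, so the caller can detect
 * truncation by comparing the result with pathlen.
 */
static int
sketch_state_file_path(char *path, size_t pathlen,
					   unsigned int xid, unsigned int umid)
{
	int			n;

	n = snprintf(path, pathlen, SKETCH_FDWXACTS_DIR "/%08X_%08X", xid, umid);
	assert(n > 0);
	return n;
}
```

For example, xid 1234 and user mapping OID 16385 give the path pg_fdwxact/000004D2_00004001.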

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 7986c87b60..2fd4a87f5c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9195,7 +9195,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on foreign tables
+ERROR:  cannot PREPARE a distributed transaction that has operated on a foreign server not supporting two-phase commit protocol
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..4e97486640
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,59 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		rmgr descriptor routines for access/transam/fdwxact.c
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactStateOnDiskData *fdwxact_insert = (FdwXactStateOnDiskData *) rec;
+
+		appendStringInfo(buf, "xid: %u, dbid: %u, umid: %u, serverid: %u, owner: %u, identifier: %s",
+						 fdwxact_insert->xid,
+						 fdwxact_insert->dbid,
+						 fdwxact_insert->umid,
+						 fdwxact_insert->serverid,
+						 fdwxact_insert->owner,
+						 fdwxact_insert->identifier);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "xid: %u, umid: %u",
+						 fdwxact_remove->xid,
+						 fdwxact_remove->umid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index e6090a9dad..72336fdc8c 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -113,7 +113,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s "
+						 "max_prepared_foreign_transactions=%d",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
@@ -121,7 +122,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
-						 xlrec.track_commit_timestamp ? "on" : "off");
+						 xlrec.track_commit_timestamp ? "on" : "off",
+						 xlrec.max_prepared_foreign_xacts);
 	}
 	else if (info == XLOG_FPW_CHANGE)
 	{
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index b05a88549d..26a5ee589c 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -16,6 +16,8 @@ OBJS = \
 	clog.o \
 	commit_ts.o \
 	fdwxact.o \
+	fdwxact_launcher.o \
+	fdwxact_resolver.o \
 	generic_xlog.o \
 	multixact.o \
 	parallel.o \
diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index 3a4118caec..8100d4dd1d 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -13,6 +13,57 @@
  * transaction manager calls corresponding FDW API to end the foreign
  * tranasctions.
  *
+ * To commit atomically across all foreign servers, the global transaction
+ * manager supports the two-phase commit protocol, a type of atomic commitment
+ * protocol. Foreign transaction state is WAL-logged, which makes it
+ * crash-safe.
+ *
+ * FOREIGN TRANSACTION RESOLUTION
+ *
+ * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
+ * PrepareForeignTransaction() API for each foreign transaction regardless of whether
+ * data on the foreign server has been modified.  At COMMIT PREPARED and ROLLBACK
+ * PREPARED, we commit or roll back only the local transaction and do nothing for the
+ * involved foreign transactions.  The prepared foreign transactions are resolved
+ * asynchronously by a resolver process.  Users can also call the
+ * pg_resolve_foreign_xact() SQL function to resolve a foreign transaction manually.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXactState
+ * entry is updated. To avoid holding the lock during transaction processing,
+ * which may take an unpredictable amount of time, the in-memory data of a foreign
+ * transaction follows a locking model based on the following linked concepts:
+ *
+ * * A process that is going to work on the foreign transaction needs to set
+ *	 locking_backend of the FdwXactState entry, which prevents the entry from being
+ *	 updated or removed by concurrent processes.
+ * * All FdwXactState fields except for status are protected by FdwXactLock.  The
+ *   status is protected by its mutex.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXactCtl
+ *	 with entries marked with fdwxact->inredo and fdwxact->ondisk.  FdwXactState file
+ *	 data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->xacts.
+ *	 We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->xacts entries that
+ *	 have fdwxact->inredo set and are behind the redo_horizon.	We save
+ *	 them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->xacts.  If
+ *	 fdwxact->ondisk is true, the corresponding entry from the disk is
+ *	 additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *	 fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
  * Portions Copyright (c) 2021, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
@@ -21,15 +72,41 @@
  */
 #include "postgres.h"
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "access/twophase.h"
+#include "access/resolver_internal.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_user_mapping.h"
 #include "foreign/fdwapi.h"
 #include "foreign/foreign.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/procarray.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 #include "utils/syscache.h"
 
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
 /* Initial size of the hash table */
 #define FDWXACT_HASH_SIZE	64
 
@@ -37,6 +114,23 @@
 #define ServerSupportTransactionCallback(fdwent) \
 	(((FdwXactEntry *)(fdwent))->commit_foreign_xact_fn != NULL)
 
+/* Check whether the FdwXactEntry is capable of two-phase commit */
+#define ServerSupportTwophaseCommit(fdwent) \
+	(((FdwXactEntry *)(fdwent))->prepare_foreign_xact_fn != NULL)
+
+/*
+ * The name of a foreign prepared transaction file is the xid and the user
+ * mapping OID, each as 8 hexadecimal digits, separated by '_'.
+ *
+ * Since FdwXactState is identified by user mapping OID and it's unique
+ * within a distributed transaction, the name is sufficient to
+ * ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8)
+#define FdwXactStateFilePath(path, xid, umid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X", \
+			 xid, umid)
+
 /*
  * Structure to bundle the foreign transaction participant.
  *
@@ -51,26 +145,139 @@ typedef struct FdwXactEntry
 	ForeignServer *server;
 	UserMapping *usermapping;
 
+	/*
+	 * Pointer to a FdwXactState entry in the global array. NULL if the entry is
+	 * not inserted yet but this is registered as a participant.
+	 */
+	FdwXactState		fdwxact;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
 } FdwXactEntry;
 
 /*
- * Foreign transactions involved in the current transaction.  A member of
- * participants must support both commit and rollback APIs
- * (ServerSupportTransactionCallback() is true).
+ * The current distributed transaction state. Members of participants must
+ * support at least both commit and rollback APIs
+ * (ServerSupportTransactionCallback() is true).
  */
-static HTAB *FdwXactParticipants = NULL;
+typedef struct DistributedXactStateData
+{
+	bool	all_prepared; /* all participants are prepared? */
+
+	/* Statistics of participants */
+	int		nparticipants_no_twophase; /* how many participants don't support
+										* two-phase commit protocol? */
+	HTAB	*participants;
+} DistributedXactStateData;
+static DistributedXactStateData DistributedXactState = {
+	.all_prepared = false,
+	.nparticipants_no_twophase = 0,
+	.participants = NULL,
+};
 
 /* Check the current transaction has at least one fdwxact participant */
 #define HasFdwXactParticipant() \
-	(FdwXactParticipants != NULL && \
-	 hash_get_num_entries(FdwXactParticipants) > 0)
+	(DistributedXactState.participants != NULL && \
+	 hash_get_num_entries(DistributedXactState.participants) > 0)
+
+/* Keep track of whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
 
+/* GUC parameters */
+int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
+
+static void RemoveFdwXactEntry(Oid umid);
 static void EndFdwXactEntry(FdwXactEntry *fdwent, bool isCommit,
 							bool is_parallel_worker);
-static void RemoveFdwXactEntry(Oid umid);
+static char *getFdwXactIdentifier(FdwXactEntry *fdwent, TransactionId xid);
+static int ForgetAllParticipants(void);
+
+static void FdwXactPrepareForeignTransactions(TransactionId xid);
+static FdwXactState FdwXactInsertEntry(TransactionId xid, FdwXactEntry *fdwent,
+									   char *identifier);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void FdwXactComputeRequiredXmin(void);
+static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(TransactionId xid, Oid umid, bool givewarning);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(TransactionId xid, Oid umid,
+								  XLogRecPtr insert_start_lsn, bool fromdisk);
+static char *ReadFdwXactStateFile(TransactionId xid, Oid umid);
+static void RemoveFdwXactStateFile(TransactionId xid, Oid umid, bool giveWarning);
+
+static FdwXactState insert_fdwxact(Oid dbid, TransactionId xid, Oid umid, Oid serverid,
+							  Oid owner, char *identifier);
+static void remove_fdwxact(FdwXactState fdwxact);
+static FdwXactState get_fdwxact(TransactionId xid, Oid umid);
+static FdwXactState get_fdwxact_with_check(TransactionId xid, Oid umid,
+										   bool check_two_phase);
+static void pg_foreign_xact_callback(int code, Datum arg);
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, xacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactState)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactStateData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXactState		fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_xacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXactState)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, xacts) +
+					  sizeof(FdwXactState) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
 
 /*
  * Register the given foreign transaction identified by the given user
@@ -87,20 +294,21 @@ FdwXactRegisterXact(UserMapping *usermapping)
 
 	Assert(IsTransactionState());
 
-	if (FdwXactParticipants == NULL)
+	if (DistributedXactState.participants == NULL)
 	{
 		HASHCTL	ctl;
 
 		ctl.keysize = sizeof(Oid);
 		ctl.entrysize = sizeof(FdwXactEntry);
 
-		FdwXactParticipants = hash_create("fdw xact participants",
-										  FDWXACT_HASH_SIZE,
-										  &ctl, HASH_ELEM | HASH_BLOBS);
+		DistributedXactState.participants = hash_create("fdw xact participants",
+															FDWXACT_HASH_SIZE,
+															&ctl, HASH_ELEM | HASH_BLOBS);
 	}
 
 	umid = usermapping->umid;
-	fdwent = hash_search(FdwXactParticipants, (void *) &umid, HASH_ENTER, &found);
+	fdwent = hash_search(DistributedXactState.participants,
+						 (void *) &umid, HASH_ENTER, &found);
 
 	if (found)
 		return;
@@ -123,13 +331,22 @@ FdwXactRegisterXact(UserMapping *usermapping)
 		ereport(ERROR,
 				(errmsg("cannot register foreign server not supporting transaction callback")));
 
+	fdwent->fdwxact = NULL;
 	fdwent->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdwent->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdwent->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
 
 	MemoryContextSwitchTo(old_ctx);
+
+	/* Update statistics */
+	if (!ServerSupportTwophaseCommit(fdwent))
+		DistributedXactState.nparticipants_no_twophase++;
+
+	Assert(DistributedXactState.nparticipants_no_twophase <=
+			   hash_get_num_entries(DistributedXactState.participants));
 }
 
-/* Remove the foreign transaction from FdwXactParticipants */
+/* Remove the foreign transaction from the current participants */
 void
 FdwXactUnregisterXact(UserMapping *usermapping)
 {
@@ -144,7 +361,21 @@ FdwXactUnregisterXact(UserMapping *usermapping)
 static void
 RemoveFdwXactEntry(Oid umid)
 {
-	(void) hash_search(FdwXactParticipants, (void *) &umid, HASH_REMOVE, NULL);
+	FdwXactEntry	*fdwent;
+
+	Assert(DistributedXactState.participants != NULL);
+	fdwent = hash_search(DistributedXactState.participants, (void *) &umid,
+						 HASH_REMOVE, NULL);
+
+	if (fdwent)
+	{
+		/* Update statistics */
+		if (!ServerSupportTwophaseCommit(fdwent))
+			DistributedXactState.nparticipants_no_twophase--;
+
+		Assert(DistributedXactState.nparticipants_no_twophase <=
+			   hash_get_num_entries(DistributedXactState.participants));
+	}
 }
 
 /*
@@ -153,29 +384,64 @@ RemoveFdwXactEntry(Oid umid)
 void
 AtEOXact_FdwXact(bool isCommit, bool is_parallel_worker)
 {
-	FdwXactEntry *fdwent;
-	HASH_SEQ_STATUS scan;
-
 	/* If there are no foreign servers involved, we have no business here */
 	if (!HasFdwXactParticipant())
 		return;
 
-	hash_seq_init(&scan, FdwXactParticipants);
-	while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
+	Assert(!RecoveryInProgress());
+
+	if (!isCommit)
 	{
-		Assert(ServerSupportTransactionCallback(fdwent));
+		HASH_SEQ_STATUS scan;
+		FdwXactEntry *fdwent;
 
-		/* Commit or rollback foreign transaction */
-		EndFdwXactEntry(fdwent, isCommit, is_parallel_worker);
+		/* Rollback foreign transactions in the participant list */
+		hash_seq_init(&scan, DistributedXactState.participants);
+		while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
+		{
+			FdwXactState	fdwxact = fdwent->fdwxact;
+			int	status;
 
-		/*
-		 * Remove the entry so that we don't recursively process this foreign
-		 * transaction.
-		 */
-		RemoveFdwXactEntry(fdwent->umid);
+			/*
+			 * If this foreign transaction is not prepared yet, end the foreign
+			 * transaction in one-phase.
+			 */
+			if (!fdwxact)
+			{
+				Assert(ServerSupportTransactionCallback(fdwent));
+				EndFdwXactEntry(fdwent, false, is_parallel_worker);
+
+				/*
+				 * Remove the participant entry to prevent it from being processed
+				 * again in a recursive error case.
+				 */
+				RemoveFdwXactEntry(fdwent->umid);
+				continue;
+			}
+
+			/*
+			 * If the foreign transaction has an FdwXactState entry, the foreign transaction
+			 * might have been prepared.  We roll back the foreign transaction anyway
+			 * to end the current transaction if the status is still preparing.  Since the
+			 * transaction might already have been prepared on the foreign server, we set
+			 * the status to aborting and leave it.
+			 */
+			SpinLockAcquire(&(fdwxact->mutex));
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&(fdwxact->mutex));
+
+			if (status == FDWXACT_STATUS_PREPARING)
+				EndFdwXactEntry(fdwent, isCommit, is_parallel_worker);
+		}
 	}
 
-	Assert(!HasFdwXactParticipant());
+	if (ForgetAllParticipants() > 0)
+		FdwXactLaunchOrWakeupResolver();
+
+	/* Reset all fields */
+	DistributedXactState.all_prepared = false;
+	DistributedXactState.nparticipants_no_twophase = 0;
 }
 
 /*
@@ -192,6 +458,7 @@ EndFdwXactEntry(FdwXactEntry *fdwent, bool isCommit, bool is_parallel_worker)
 	finfo.usermapping = fdwent->usermapping;
 	finfo.flags = FDWXACT_FLAG_ONEPHASE |
 		((is_parallel_worker) ? FDWXACT_FLAG_PARALLEL_WORKER : 0);
+	finfo.identifier = NULL;
 
 	if (isCommit)
 	{
@@ -208,15 +475,1493 @@ EndFdwXactEntry(FdwXactEntry *fdwent, bool isCommit, bool is_parallel_worker)
 }
 
 /*
- * This function is called at PREPARE TRANSACTION.  Since we don't support
- * preparing foreign transactions yet, raise an error if the local transaction
- * has any foreign transaction.
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * If an error happens while preparing a foreign transaction, we
+ * change to rollback.  See AtEOXact_FdwXact() for details.
  */
 void
 AtPrepare_FdwXact(void)
 {
-	if (HasFdwXactParticipant())
+	TransactionId xid;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (!HasFdwXactParticipant())
+		return;
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit as we're going to
+	 * prepare all of them.
+	 */
+	if (DistributedXactState.nparticipants_no_twophase > 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+				 errmsg("cannot PREPARE a distributed transaction that has operated on a foreign server not supporting two-phase commit protocol")));
+
+	/*
+	 * Assign a transaction ID if one has not been assigned yet, because the
+	 * local transaction ID is used to determine the result of the distributed
+	 * transaction.  Then prepare all foreign transactions.
+	 */
+	xid = GetTopTransactionId();
+	FdwXactPrepareForeignTransactions(xid);
+
+	/*
+	 * Remember we already prepared all participants.  We keep participants
+	 * until the transaction end so that we unlock the involved foreign transactions
+	 * to abort in case of failure.
+	 */
+	DistributedXactState.all_prepared = true;
+}
+
+/*
+ * Pre-commit processing for foreign transactions. We commit those foreign
+ * transactions with one-phase.
+ */
+void
+PreCommit_FdwXact(bool is_parallel_worker)
+{
+	HASH_SEQ_STATUS scan;
+	FdwXactEntry *fdwent;
+
+	/*
+	 * If there is no foreign server involved or all foreign transactions
+	 * are already prepared (see AtPrepare_FdwXact()), we have no business here.
+	 */
+	if (!HasFdwXactParticipant() ||
+		DistributedXactState.all_prepared)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/* Commit all foreign transactions in the participant list */
+	hash_seq_init(&scan, DistributedXactState.participants);
+	while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
+	{
+		Assert(ServerSupportTransactionCallback(fdwent));
+
+		/*
+		 * Commit the foreign transaction and remove it from the hash table
+		 * so that we don't try to abort an already-closed transaction.
+		 */
+		EndFdwXactEntry(fdwent, true, is_parallel_worker);
+		RemoveFdwXactEntry(fdwent->umid);
+	}
+}
+
+/*
+ * Return true if there is a prepared foreign transaction which matches
+ * given arguments.
+ */
+bool
+FdwXactExists(TransactionId xid, Oid umid)
+{
+	FdwXactState fdwxact;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	fdwxact = get_fdwxact(xid, umid);
+	LWLockRelease(FdwXactLock);
+
+	return (fdwxact != NULL);
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an insert LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXactStates that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXactStates that need to be copied to disk.
+ *
+ * If a FdwXactState remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts == 0)
+		return;					/* nothing to do */
+
+	/*
+	 * We are expecting there to be zero FdwXactState that need to be copied to
+	 * disk, so we perform all I/O while holding FdwXactLock for simplicity.
+	 * This prevents any new foreign xacts from preparing while this occurs,
+	 * which shouldn't be a problem since the presence of long-lived prepared
+	 * foreign xacts indicates the transaction manager isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimization for the future, in case somebody
+	 * finds that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXactState with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_xacts; cnt++)
+	{
+		FdwXactState		fdwxact = FdwXactCtl->xacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->data.xid, fdwxact->data.umid, buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  FdwXactState files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Insert FdwXactState entries and prepare foreign transactions.
+ */
+static void
+FdwXactPrepareForeignTransactions(TransactionId xid)
+{
+	FdwXactEntry *fdwent;
+	HASH_SEQ_STATUS scan;
+
+	Assert(TransactionIdIsValid(xid));
+
+	/* Loop over the foreign connections */
+	hash_seq_init(&scan, DistributedXactState.participants);
+	while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
+	{
+		FdwXactInfo finfo;
+		FdwXactState		fdwxact;
+		char		*identifier;
+
+		Assert(ServerSupportTwophaseCommit(fdwent));
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get prepared transaction identifier */
+		identifier = getFdwXactIdentifier(fdwent, xid);
+		Assert(identifier);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to disk and WAL-logs it (thereby relaying it to the
+		 * standby). Thus, if we lose connectivity to the foreign server or
+		 * crash ourselves, we will remember that we might have a prepared
+		 * transaction on the foreign server and will try to resolve it when
+		 * connectivity is restored or after crash recovery.
+		 *
+		 * If we prepared the transaction on the foreign server before
+		 * persisting the information to disk and crashed in between these
+		 * two steps, we would lose track of the prepared transaction on the
+		 * foreign server and would not be able to resolve it after crash
+		 * recovery. Hence persist first, then prepare.
+		 */
+		fdwxact = FdwXactInsertEntry(xid, fdwent, identifier);
+
+		/*
+		 * Prepare the foreign transaction.  Between the FdwXactInsertEntry
+		 * call and the acknowledgement from the foreign server, this backend
+		 * may abort the local transaction (say, because of a signal).
+		 */
+		finfo.server = fdwent->server;
+		finfo.usermapping = fdwent->usermapping;
+		finfo.flags = 0;
+		finfo.identifier = identifier;
+		fdwent->prepare_foreign_xact_fn(&finfo);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier.  We generate a
+ * unique identifier in the form of "fx_<random number>_<xid>_<umid>", whose
+ * length is less than FDWXACT_ID_MAX_LEN.
+ *
+ * The returned string is used to identify the foreign transaction. The
+ * identifier must not be the same as that of any other concurrent prepared
+ * transaction.
+ *
+ * To make the foreign transaction identifier unique, we should ideally use
+ * something like a UUID, which gives unique ids with high probability, but
+ * that may be expensive here, and the UUID extension that provides the
+ * function to generate UUIDs is not part of the core code.
+ */
+static char *
+getFdwXactIdentifier(FdwXactEntry *fdwent, TransactionId xid)
+{
+	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u", Abs(random()),
+			 xid, fdwent->umid);
+
+	return pstrdup(buf);
+}
+
+/*
+ * This function inserts a new FdwXactState entry into the global array with
+ * WAL-logging. The new entry is held by the backend that inserted it.
+ */
+static FdwXactState
+FdwXactInsertEntry(TransactionId xid, FdwXactEntry *fdwent,
+				   char *identifier)
+{
+	FdwXactStateOnDiskData *fdwxact_file_data;
+	FdwXactState		fdwxact;
+	Oid			owner;
+	int			data_len;
+
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	/*
+	 * Enter the foreign transaction into the shared memory structure.
+	 */
+	owner = GetUserId();
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdwent->umid,
+							 fdwent->usermapping->serverid, owner, identifier);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	fdwent->fdwxact = fdwxact;
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactStateOnDiskData, identifier);
+	data_len = data_len + strlen(identifier) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactStateOnDiskData *) palloc0(data_len);
+	memcpy(fdwxact_file_data, &(fdwxact->data), data_len);
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	START_CRIT_SECTION();
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+	XLogFlush(fdwxact->insert_end_lsn);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* WAL record is flushed; the checkpointer may now serialize this entry */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXactState
+insert_fdwxact(Oid dbid, TransactionId xid, Oid umid, Oid serverid, Oid owner,
+			   char *identifier)
+{
+	FdwXactState		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		fdwxact = FdwXactCtl->xacts[i];
+		if (fdwxact->valid &&
+			fdwxact->data.xid == xid &&
+			fdwxact->data.umid == umid)
+			ereport(ERROR,
+					(errmsg("could not insert a foreign transaction entry"),
+					 errdetail("Duplicate entry with transaction id %u, user mapping id %u exists.",
+							   xid, umid)));
+	}
+
+	/*
+	 * Get the next free foreign transaction entry. Raise an error if there
+	 * are none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions (currently %d).",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_xacts < max_prepared_foreign_xacts);
+	FdwXactCtl->xacts[FdwXactCtl->num_xacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->data.xid = xid;
+	fdwxact->data.dbid = dbid;
+	fdwxact->data.umid = umid;
+	fdwxact->data.serverid = serverid;
+	fdwxact->data.owner = owner;
+	strlcpy(fdwxact->data.identifier, identifier, FDWXACT_ID_MAX_LEN);
+
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXactState fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		if (FdwXactCtl->xacts[i] == fdwxact)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_xacts)
+		elog(ERROR, "failed to find %p in FdwXactState array", fdwxact);
+
+	elog(DEBUG2, "remove fdwxact entry id %s", fdwxact->data.identifier);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_xacts--;
+	FdwXactCtl->xacts[i] = FdwXactCtl->xacts[FdwXactCtl->num_xacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset the entry's state */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.xid = fdwxact->data.xid;
+		record.umid = fdwxact->data.umid;
+
+		/*
+		 * Now writing FdwXactState data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	if (ForgetAllParticipants() > 0)
+		FdwXactLaunchOrWakeupResolver();
+}
+
+/*
+ * Unlock all foreign transaction participants.  If any foreign transactions
+ * are left unresolved, update the oldest xmin of unresolved transactions to
+ * prevent the local transaction ids of such foreign transactions from being
+ * truncated.  Returns the number of remaining foreign transactions.
+ */
+static int
+ForgetAllParticipants(void)
+{
+	FdwXactEntry *fdwent;
+	HASH_SEQ_STATUS scan;
+	int	nremaining = 0;
+
+	if (!HasFdwXactParticipant())
+		return nremaining;
+
+	hash_seq_init(&scan, DistributedXactState.participants);
+	while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
+	{
+		FdwXactState		fdwxact = fdwent->fdwxact;
+
+		if (fdwxact)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+
+			/* Unlock the foreign transaction entry */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fdwxact->locking_backend = InvalidBackendId;
+			LWLockRelease(FdwXactLock);
+
+			nremaining++;
+		}
+
+		/* Remove from the participants list */
+		RemoveFdwXactEntry(fdwent->umid);
+	}
+
+	/*
+	 * If we leave any FdwXactState entries behind, update the oldest local
+	 * transaction id among the unresolved distributed transactions.
+	 */
+	if (nremaining > 0)
+	{
+		elog(DEBUG1, "%d foreign transactions remaining", nremaining);
+		FdwXactComputeRequiredXmin();
+	}
+
+	Assert(!HasFdwXactParticipant());
+	return nremaining;
+}
+
+/*
+ * Commit or rollback one prepared foreign transaction, and remove FdwXactState
+ * entry.
+ */
+void
+ResolveOneFdwXact(FdwXactState fdwxact)
+{
+	FdwXactInfo finfo;
+	FdwRoutine *routine;
+
+	/* The FdwXactState entry must be held by me */
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->locking_backend == MyBackendId);
+	Assert(fdwxact->status == FDWXACT_STATUS_PREPARED ||
+		   fdwxact->status == FDWXACT_STATUS_COMMITTING ||
+		   fdwxact->status == FDWXACT_STATUS_ABORTING);
+
+	/* Set whether we do commit or abort if not set yet */
+	if (fdwxact->status == FDWXACT_STATUS_PREPARED)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactGetTransactionFate(fdwxact->data.xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	routine = GetFdwRoutineByServerId(fdwxact->data.serverid);
+
+	/* Prepare the foreign transaction information to pass to API */
+	finfo.server = GetForeignServer(fdwxact->data.serverid);
+	finfo.usermapping = GetUserMapping(fdwxact->data.owner, fdwxact->data.serverid);
+	finfo.flags = 0;
+	finfo.identifier = fdwxact->data.identifier;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&finfo);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction %s",
+			 fdwxact->data.identifier);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&finfo);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction %s",
+			 fdwxact->data.identifier);
+	}
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	if (fdwxact->ondisk)
+		RemoveFdwXactStateFile(fdwxact->data.xid, fdwxact->data.umid, true);
+	remove_fdwxact(fdwxact);
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState		fdwxact = FdwXactCtl->xacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->data.xid));
+
+		/*
+		 * We can exclude entries that are marked as either committing or
+		 * aborting and whose state file is on disk, since such entries no
+		 * longer need to look up their transaction status in the commit
+		 * log.
+		 */
+		if (!(fdwxact->ondisk &&
+			  (fdwxact->status == FDWXACT_STATUS_COMMITTING ||
+			   fdwxact->status == FDWXACT_STATUS_ABORTING)) &&
+			(!TransactionIdIsValid(agg_xmin) ||
+			 TransactionIdPrecedes(fdwxact->data.xid, agg_xmin)))
+			agg_xmin = fdwxact->data.xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Return whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactGetTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is not in progress, but the foreign transaction
+	 * may not have been prepared on the foreign server. This can happen when
+	 * the transaction failed after registering this entry but before actually
+	 * preparing on the foreign server. So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted.  Raise an error anyway since we cannot
+	 * determine the fate of this foreign transaction from a local transaction
+	 * whose fate is also not yet determined.
+	 */
+	elog(ERROR,
+		 "cannot resolve the foreign transaction associated with in-process transaction");
+
+	pg_unreachable();
+}
+
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+void
+RecreateFdwXactFile(TransactionId xid, Oid umid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactStateFilePath(path, xid, umid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	errno = 0;					/* so the ENOSPC fallback below is reliable */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction file: %m")));
+}
+
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXactState entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record), record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete the FdwXactState entry, and its file if one exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->xid, xlrec->umid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Scan the shared memory entries of FdwXactState and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.  ShmemVariableCache->nextXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextXid = ShmemVariableCache->nextXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState		fdwxact = FdwXactCtl->xacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->data.xid, fdwxact->data.umid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->data.xid, result))
+			result = fdwxact->data.xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXactState depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXactState files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+RestoreFdwXactData(void)
+{
+	DIR		  *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId xid;
+			Oid		   umid;
+			char		  *buf;
+
+			sscanf(clde->d_name, "%08x_%08x", &xid, &umid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(xid, umid, InvalidXLogRecPtr,
+									   true);
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Scan the shared memory entries of FdwXactState and mark them valid.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState		fdwxact = FdwXactCtl->xacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->data.xid, fdwxact->data.umid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %s from shared memory",
+						fdwxact->data.identifier)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactStateOnDiskData *fdwxact_data = (FdwXactStateOnDiskData *) buf;
+	FdwXactState		fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions.
+	 * insert_fdwxact() initializes the status to PREPARING; it is
+	 * overwritten below once the entry has been inserted.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->xid,
+							 fdwxact_data->umid, fdwxact_data->serverid,
+							 fdwxact_data->owner, fdwxact_data->identifier);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u user mapping %u owner %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->xid,
+		 fdwxact_data->umid, fdwxact_data->owner,
+		 fdwxact_data->identifier);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that
+	 * prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXactState file if the foreign transaction was saved by an earlier
+ * checkpoint. We may not find the FdwXactState entry if crash recovery starts
+ * from a point after the entry was added but before it was removed.
+ */
+static void
+FdwXactRedoRemove(TransactionId xid, Oid umid, bool givewarning)
+{
+	FdwXactState		fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		fdwxact = FdwXactCtl->xacts[i];
+
+		if (fdwxact->data.xid == xid && fdwxact->data.umid == umid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_xacts)
+		return;
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactStateFile(fdwxact->data.xid, fdwxact->data.umid, givewarning);
+	remove_fdwxact(fdwxact);
+
+	elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction %s",
+		 fdwxact->data.identifier);
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactStateFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+	TimeLineID	save_currtli = ThisTimeLineID;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	/*
+	 * Restore immediately the timeline where it was previously, as
+	 * read_local_xlog_page() could have changed it if the record was read
+	 * while recovery was finishing or if the timeline has jumped in-between.
+	 */
+	ThisTimeLineID = save_currtli;
+
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Given a transaction id and a user mapping id, read the foreign transaction
+ * state either from disk or directly from WAL, using the record pointer
+ * "insert_start_lsn" kept in shared memory.
+ */
+static char *
+ProcessFdwXactBuffer(TransactionId xid, Oid umid, XLogRecPtr insert_start_lsn,
+					 bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u and user mapping %u",
+							xid, umid)));
+			RemoveFdwXactStateFile(xid, umid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u and user mapping %u",
+							xid, umid)));
+			FdwXactRedoRemove(xid, umid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactStateFile(xid, umid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid CRC and matches the given xid and umid),
+ * return the palloc'd contents of the file; raise an error on corrupted
+ * data.  This state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactStateFile(TransactionId xid, Oid umid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactStateOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactStateFilePath(path, xid, umid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactStateOnDiskData, identifier) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactStateOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents belong to the expected foreign transaction */
+	fdwxact_file_data = (FdwXactStateOnDiskData *) buf;
+	if (fdwxact_file_data->xid != xid ||
+		fdwxact_file_data->umid != umid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactStateFile(TransactionId xid, Oid umid, bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactStateFilePath(path, xid, umid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Return the FdwXactState entry that matches the given arguments, or NULL if
+ * there is none. Only arguments holding a valid value for their datatype are
+ * used as search conditions. The caller must hold FdwXactLock.
+ */
+static FdwXactState
+get_fdwxact(TransactionId xid, Oid umid)
+{
+	FdwXactState fdwxact;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		fdwxact = FdwXactCtl->xacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->data.xid)
+			continue;
+
+		/* umid */
+		if (OidIsValid(umid) && umid != fdwxact->data.umid)
+			continue;
+
+		/* This entry matches the condition */
+		return fdwxact;
+	}
+
+	return NULL;
+}
+
+/*
+ * Get FdwXact entry and do some sanity checks. If check_two_phase is true, we also
+ * check if the given xid is prepared.  The caller must hold FdwXactLock.
+ */
+static FdwXactState
+get_fdwxact_with_check(TransactionId xid, Oid umid, bool check_two_phase)
+{
+	FdwXactState		fdwxact;
+	Oid			myuserid;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	if ((fdwxact = get_fdwxact(xid, umid)) == NULL)
+		ereport(ERROR, (errmsg("foreign transaction does not exist")));
+	if (fdwxact->data.dbid != MyDatabaseId)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction belongs to another database"),
+				 errhint("Connect to the database where the transaction was created to finish it.")));
+
+	/* permission check */
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->data.owner && !superuser_arg(myuserid))
+		ereport(ERROR,
+				(errmsg("permission denied to resolve prepared foreign transaction"),
+				 errhint("Must be superuser or the user that prepared the transaction.")));
+
+	/* check if the entry is being processed by someone */
+	if (fdwxact->locking_backend != InvalidBackendId)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction identifier \"%s\" is busy",
+						fdwxact->data.identifier)));
+
+	if (check_two_phase && TwoPhaseExists(fdwxact->data.xid))
+	{
+		/*
+		 * The entry's local transaction is prepared. Since we cannot know the
+		 * fate of the local transaction, we cannot resolve this foreign
+		 * transaction.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction with identifier \"%s\" whose local transaction is in-progress",
+						fdwxact->data.identifier),
+				 errhint("Do COMMIT PREPARED or ROLLBACK PREPARED.")));
+	}
+
+	return fdwxact;
+}
+
+/* Error cleanup callback for pg_resolve_foreign_xact */
+static void
+pg_foreign_xact_callback(int code, Datum arg)
+{
+	FdwXactState fdwxact = (FdwXactState) DatumGetPointer(arg);
+
+	if (fdwxact->valid)
+	{
+		Assert(fdwxact->locking_backend == MyBackendId);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXactState		fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	6
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState		fdwxact = FdwXactCtl->xacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->data.xid);
+		values[1] = ObjectIdGetDatum(fdwxact->data.umid);
+		values[2] = ObjectIdGetDatum(fdwxact->data.owner);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+
+		values[3] = CStringGetTextDatum(xact_status);
+		values[4] = CStringGetTextDatum(fdwxact->data.identifier);
+
+		if (fdwxact->locking_backend != InvalidBackendId)
+		{
+			PGPROC *locker = BackendIdGetProc(fdwxact->locking_backend);
+			values[5] = Int32GetDatum(locker->pid);
+		}
+		else
+			nulls[5] = true;
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			umid = PG_GETARG_OID(1);
+	FdwXactState	fdwxact;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_fdwxact_with_check(xid, umid, true);
+
+	/* lock it */
+	fdwxact->locking_backend = MyBackendId;
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Resolve the foreign transaction.  Make sure the FdwXact entry is
+	 * unlocked on error or interruption.
+	 *
+	 * XXX we assume that an interruption doesn't happen between locking
+	 * FdwXact entry and registering the callback, especially in
+	 * LWLockRelease().
+	 */
+	PG_ENSURE_ERROR_CLEANUP(pg_foreign_xact_callback,
+							(Datum) PointerGetDatum(fdwxact));
+	{
+		ResolveOneFdwXact(fdwxact);
+	}
+	PG_END_ENSURE_ERROR_CLEANUP(pg_foreign_xact_callback,
+								(Datum) PointerGetDatum(fdwxact));
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolving it. This gives a way to forget about such a prepared transaction
+ * when, for example, the foreign server where it was prepared is no longer
+ * available or the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid		umid = PG_GETARG_OID(1);
+	FdwXactState	fdwxact;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	fdwxact = get_fdwxact_with_check(xid, umid, false);
+
+	/* Clean up entry and any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactStateFile(fdwxact->data.xid, fdwxact->data.umid, true);
+	remove_fdwxact(fdwxact);
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
 }
diff --git a/src/backend/access/transam/fdwxact_launcher.c b/src/backend/access/transam/fdwxact_launcher.c
new file mode 100644
index 0000000000..037c2aa93e
--- /dev/null
+++ b/src/backend/access/transam/fdwxact_launcher.c
@@ -0,0 +1,555 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when a request arrives from a backend process.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/transam/fdwxact_launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "pgstat.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/interrupt.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+static volatile sig_atomic_t got_SIGUSR2 = false;
+
+static void FdwXactLauncherOnExit(int code, Datum arg);
+static void FdwXactLaunchResolver(Oid dbid);
+static bool FdwXactRelaunchResolvers(void);
+
+/* Signal handler */
+static void FdwXactLaunchHandler(SIGNAL_ARGS);
+
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+FdwXactRequestToLaunchResolver(void)
+{
+	if (FdwXactResolverCtl->launcher_pid != InvalidPid)
+		kill(FdwXactResolverCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactLauncherShmemInit */
+Size
+FdwXactLauncherShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactResolverCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactLauncherShmemInit(void)
+{
+	bool		found;
+
+	FdwXactResolverCtl = ShmemInitStruct("Foreign Transaction Launcher Data",
+										 FdwXactLauncherShmemSize(),
+										 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactResolverCtl, 0, FdwXactLauncherShmemSize());
+		SHMQueueInit(&(FdwXactResolverCtl->fdwxact_queue));
+		FdwXactResolverCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactResolverCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+FdwXactLauncherOnExit(int code, Datum arg)
+{
+	FdwXactResolverCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+FdwXactLaunchHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(FdwXactLauncherOnExit, (Datum) 0);
+
+	Assert(FdwXactResolverCtl->launcher_pid == InvalidPid);
+	FdwXactResolverCtl->launcher_pid = MyProcPid;
+	FdwXactResolverCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, SignalHandlerForConfigReload);
+	pqsignal(SIGUSR2, FdwXactLaunchHandler);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit start retries to once per
+		 * foreign_xact_resolution_retry_interval, but always attempt to
+		 * start when requested.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			/*
+			 * Launch foreign transaction resolvers that are requested but not
+			 * running.
+			 */
+			launched = FdwXactRelaunchResolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * started. This usually means a resolver crashed, so we should
+			 * retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (ConfigReloadPending)
+		{
+			ConfigReloadPending = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request launcher to launch a new foreign transaction resolver process
+ * or wake up the resolver if it's already running.
+ */
+void
+FdwXactLaunchOrWakeupResolver(void)
+{
+	volatile FdwXactResolver *resolver;
+	bool		found = false;
+
+	/*
+	 * Look for a resolver process that is running and working on the same
+	 * database.
+	 */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactResolverCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId)
+		{
+			found = true;
+			break;
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (found)
+	{
+		/* Found the running resolver */
+		elog(DEBUG1,
+			 "found a running foreign transaction resolver process for database %u",
+			 MyDatabaseId);
+
+		/*
+		 * Wake up the resolver. It's possible that the resolver is starting up
+		 * and hasn't attached to its slot yet. Since the resolver will soon
+		 * find the FdwXact entry we inserted, we don't need to do anything.
+		 */
+		if (resolver->latch)
+			SetLatch(resolver->latch);
+
+		return;
+	}
+
+	/* Otherwise wake up the launcher to launch new resolver */
+	FdwXactRequestToLaunchResolver();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid'.
+ */
+static void
+FdwXactLaunchResolver(Oid dbid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactResolverCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactResolverCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for database %u", resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, clean up the worker slot */
+		LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+		resolver->in_use = false;
+		LWLockRelease(FdwXactResolverLock);
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Launch or relaunch foreign transaction resolvers on databases that have
+ * at least one FdwXact entry but no running resolver.
+ */
+static bool
+FdwXactRelaunchResolvers(void)
+{
+	HTAB	   *fdwxact_dbs;
+	HTAB	   *resolver_dbs;
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	Oid		   *entry;
+	bool		launched = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+
+	/*
+	 * Create a hash map for the databases that have at least one foreign
+	 * transaction to resolve.
+	 */
+	fdwxact_dbs = hash_create("fdwxact dblist",
+							  32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids that have at least one FdwXact entry to resolve */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState		fdwxact = FdwXactCtl->xacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * We need to launch a resolver process if the foreign transaction
+		 * is not held by anyone and is not part of a local prepared
+		 * transaction.
+		 */
+		if (fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->data.xid))
+			hash_search(fdwxact_dbs, &(fdwxact->data.dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no foreign transaction to resolve, no need to launch new one */
+	if (hash_get_num_entries(fdwxact_dbs) == 0)
+	{
+		hash_destroy(fdwxact_dbs);
+		return false;
+	}
+
+	/* Create a hash map for databases on which a resolver is running */
+	resolver_dbs = hash_create("resolver dblist",
+							   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	/* Collect database oids on which resolvers are running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactResolverCtl->resolvers[i];
+
+		if (!resolver->in_use)
+			continue;
+
+		hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * Find databases on which no resolver is running and launch new
+	 * resolver process on them.
+	 */
+	hash_seq_init(&status, fdwxact_dbs);
+	while ((entry = (Oid *) hash_seq_search(&status)) != NULL)
+	{
+		bool		found;
+
+		hash_search(resolver_dbs, entry, HASH_FIND, &found);
+
+		if (!found)
+		{
+			/* No resolver is running on this database, launch new one */
+			FdwXactLaunchResolver(*entry);
+			launched = true;
+		}
+	}
+
+	hash_destroy(fdwxact_dbs);
+	hash_destroy(resolver_dbs);
+
+	return launched;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactResolverCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Stop the fdwxact resolver running on the given database.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	Oid			dbid = PG_GETARG_OID(0);
+	FdwXactResolver *resolver = NULL;
+	int			i;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	if (!OidIsValid(dbid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid database id")));
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	/* Find the running resolver process on the given database */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactResolverCtl->resolvers[i];
+
+		/* found! */
+		if (resolver->in_use && resolver->dbid == dbid)
+			break;
+	}
+
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on database %u",
+						dbid)));
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/transam/fdwxact_resolver.c b/src/backend/access/transam/fdwxact_resolver.c
new file mode 100644
index 0000000000..3e0510cf3a
--- /dev/null
+++ b/src/backend/access/transam/fdwxact_resolver.c
@@ -0,0 +1,314 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.c
+ *
+ * The foreign transaction resolver background worker resolves in-doubt
+ * foreign transactions: foreign transactions that participate in a distributed
+ * transaction but aren't being processed by anyone.  A resolver process is
+ * launched per database by the foreign transaction launcher.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment. Emergency termination is by
+ * SIGQUIT, like any backend. The resolver process also terminates on timeout,
+ * but only if there are no pending foreign transactions on the database
+ * waiting to be resolved.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/transam/fdwxact_resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/interrupt.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int	foreign_xact_resolution_retry_interval;
+int	foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactResolverCtlData *FdwXactResolverCtl;
+
+static void FdwXactResolverLoop(void);
+static long FdwXactResolverComputeSleepTime(TimestampTz now);
+static void FdwXactResolverCheckTimeout(TimestampTz now);
+
+static void FdwXactResolverOnExit(int code, Datum arg);
+static void FdwXactResolverDetach(void);
+static void FdwXactResolverAttach(int slot);
+static void FdwXactResolverProcessInDoubtXacts(void);
+
+static TimestampTz last_resolution_time = -1;
+
+/* The list of FdwXact entries currently held by this resolver. */
+static List *heldFdwXactEntries = NIL;
+
+/*
+ * Detach the resolver and cleanup the resolver info.
+ */
+static void
+FdwXactResolverDetach(void)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Clean up foreign transaction resolver info and release the held
+ * FdwXactState entries.
+ */
+static void
+FdwXactResolverOnExit(int code, Datum arg)
+{
+	ListCell *lc;
+
+	FdwXactResolverDetach();
+
+	/* Release the held foreign transaction entries */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	foreach (lc, heldFdwXactEntries)
+	{
+		FdwXactState fdwxact = (FdwXactState) lfirst(lc);
+
+		if (fdwxact->valid && fdwxact->locking_backend == MyBackendId)
+			fdwxact->locking_backend = InvalidBackendId;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+FdwXactResolverAttach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactResolverCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(FdwXactResolverOnExit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+
+	/* Attach to a slot */
+	FdwXactResolverAttach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, SignalHandlerForConfigReload);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" has started",
+					get_database_name(MyFdwXactResolver->dbid))));
+	CommitTransactionCommand();
+
+	/* Initialize stats to a sane value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FdwXactResolverLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FdwXactResolverLoop(void)
+{
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		if (ConfigReloadPending)
+		{
+			ConfigReloadPending = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Resolve in-doubt transactions if any  */
+		FdwXactResolverProcessInDoubtXacts();
+
+		now = GetCurrentTimestamp();
+		FdwXactResolverCheckTimeout(now);
+		sleep_time = FdwXactResolverComputeSleepTime(now);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FdwXactResolverCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/* Reached timeout, exit */
+	StartTransactionCommand();
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout",
+					get_database_name(MyDatabaseId))));
+	CommitTransactionCommand();
+	FdwXactResolverDetach();
+	proc_exit(0);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle. We can sleep until
+ * the timeout is reached.
+ */
+static long
+FdwXactResolverComputeSleepTime(TimestampTz now)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		sleeptime = TimestampDifferenceMilliseconds(now, timeout);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Process in-doubt foreign transactions.
+ */
+static void
+FdwXactResolverProcessInDoubtXacts(void)
+{
+	ListCell *lc;
+
+	Assert(heldFdwXactEntries == NIL);
+
+	/* Hold all in-doubt foreign transactions */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState fdwxact = FdwXactCtl->xacts[i];
+
+		if (fdwxact->valid &&
+			fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->data.xid))
+		{
+			fdwxact->locking_backend = MyBackendId;
+			heldFdwXactEntries = lappend(heldFdwXactEntries, fdwxact);
+		}
+	}
+	LWLockRelease(FdwXactLock);
+
+	foreach (lc, heldFdwXactEntries)
+	{
+		FdwXactState fdwxact = (FdwXactState) lfirst(lc);
+
+		/*
+		 * Resolve one foreign transaction. ResolveOneFdwXact() releases and
+		 * removes FdwXactState entry after resolution.
+		 */
+		StartTransactionCommand();
+		ResolveOneFdwXact(fdwxact);
+		CommitTransactionCommand();
+	}
+
+	if (list_length(heldFdwXactEntries) > 0)
+		last_resolution_time = GetCurrentTimestamp();
+
+	list_free(heldFdwXactEntries);
+	heldFdwXactEntries = NIL;
+}
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..0a3f4b383f 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 46f3d08249..5b7090377d 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,8 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -846,6 +848,34 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+		if (gxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
@@ -2270,6 +2300,13 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, true);
+
+	/*
+	 * If the prepared transaction was part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExists(xid, InvalidOid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
@@ -2342,6 +2379,13 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	 * in the procarray and continue to hold locks.
 	 */
 	SyncRepWaitForLSN(recptr, false);
+
+	/*
+	 * If the prepared transaction was part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	if (FdwXactExists(xid, InvalidOid))
+		FdwXactLaunchOrWakeupResolver();
 }
 
 /*
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 791b4243f0..bfe5e11245 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2127,7 +2127,7 @@ CommitTransaction(void)
 					  : XACT_EVENT_PRE_COMMIT);
 
 	/* Call foreign transaction callbacks at pre-commit phase, if any */
-	AtEOXact_FdwXact(true, is_parallel_worker);
+	PreCommit_FdwXact(is_parallel_worker);
 
 	/* If we might have parallel workers, clean them up now. */
 	if (IsInParallelMode())
@@ -2286,6 +2286,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXact(true, is_parallel_worker);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2559,6 +2560,7 @@ PrepareTransaction(void)
 	PostPrepare_Twophase();
 
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
+	AtEOXact_FdwXact(true, false);
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
 	AtEOXact_Enum();
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c1d4415a43..a923d0c585 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4645,6 +4646,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6428,6 +6430,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -6978,14 +6983,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxact, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	RestoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7187,7 +7193,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7700,11 +7709,13 @@ StartupXLOG(void)
 	}
 
 	/*
-	 * Pre-scan prepared transactions to find out the range of XIDs present.
-	 * This information is not quite needed yet, but it is positioned here so
-	 * as potential problems are detected before any on-disk change is done.
+	 * Pre-scan prepared transactions and foreign prepared transactions to find
+	 * out the range of XIDs present.  This information is not quite needed yet,
+	 * but it is positioned here so as potential problems are detected before any
+	 * on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Consider whether we need to assign a new timeline ID.
@@ -8028,8 +8039,12 @@ StartupXLOG(void)
 	TrimCLOG();
 	TrimMultiXact();
 
-	/* Reload shared-memory state for prepared transactions */
+	/*
+	 * Reload shared-memory state for prepared transactions and foreign
+	 * prepared transactions.
+	 */
 	RecoverPreparedTransactions();
+	RecoverFdwXacts();
 
 	/*
 	 * Shutdown the recovery environment. This must occur after
@@ -9381,6 +9396,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9917,6 +9933,7 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
@@ -9936,6 +9953,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9954,6 +9972,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -10160,6 +10179,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10363,6 +10383,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 259cde3397..ec643bbdc6 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1470,6 +1470,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_USER_MAPPING:
+			RemoveUserMappingById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1485,7 +1489,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
 		case OCLASS_FOREIGN_SERVER:
-		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
 		case OCLASS_PUBLICATION:
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5c84d758bb..8a5677d6f0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -402,6 +402,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index eb7103fd3b..a56f01f170 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,7 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1060,7 +1061,6 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
-
 /*
  * Common routine to check permission for user-mapping-related DDL
  * commands.  We allow server owners to operate on any mapping, and
@@ -1307,6 +1307,37 @@ AlterUserMapping(AlterUserMappingStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop the given user mapping
+ */
+void
+RemoveUserMappingById(Oid umid)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(UserMappingRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for user mapping %u", umid);
+
+	/*
+	 * We cannot drop the user mapping if there is a foreign prepared
+	 * transaction with this user mapping.
+	 */
+	if (FdwXactExists(InvalidTransactionId, umid))
+		ereport(ERROR,
+				(errmsg("user mapping %u has unresolved prepared transaction",
+						umid)));
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Drop user mapping
@@ -1374,6 +1405,7 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+
 	/*
 	 * Do the deletion
 	 */
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index f8eb4fa215..6ce76b2aec 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -332,6 +332,12 @@ GetFdwRoutine(Oid fdwhandler)
 	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
 		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
 
+	/* An FDW supporting the prepare API must also support the commit and rollback APIs */
+	Assert((routine->PrepareForeignTransaction &&
+			routine->CommitForeignTransaction &&
+			routine->RollbackForeignTransaction) ||
+		   !routine->PrepareForeignTransaction);
+
 	return routine;
 }
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 11fc1b7863..b7844c7ad2 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,8 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b05db5a473..2535a8bec3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,7 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -922,6 +923,9 @@ PostmasterMain(int argc, char *argv[])
 	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers <= 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
 
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
@@ -987,12 +991,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and the foreign transaction launcher.  Since
+	 * each registers a background worker, they need to be called before
+	 * InitializeMaxBackends(), and it's probably a good idea to call them
+	 * before any modules have had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 70670169ac..c23a024d98 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -179,6 +179,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..47179c37a4 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -150,6 +152,8 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactLauncherShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -269,6 +273,8 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactLauncherShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 5ff8cab394..403c2e3126 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -96,6 +96,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allProcs[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -187,11 +189,13 @@ typedef struct ComputeXidHorizonsResult
 	FullTransactionId latest_completed;
 
 	/*
-	 * The same for procArray->replication_slot_xmin and.
-	 * procArray->replication_slot_catalog_xmin.
+	 * The same for procArray->replication_slot_xmin,
+	 * procArray->replication_slot_catalog_xmin, and
+	 * procArray->fdwxact_unresolved_xmin.
 	 */
 	TransactionId slot_xmin;
 	TransactionId slot_catalog_xmin;
+	TransactionId fdwxact_unresolved_xmin;
 
 	/*
 	 * Oldest xid that any backend might still consider running. This needs to
@@ -210,8 +214,9 @@ typedef struct ComputeXidHorizonsResult
 	 * Oldest xid for which deleted tuples need to be retained in shared
 	 * tables.
 	 *
-	 * This includes the effects of replication slots. If that's not desired,
-	 * look at shared_oldest_nonremovable_raw;
+	 * This includes the effects of replication slots as well as unresolved
+	 * foreign transactions. If that's not desired, look at
+	 * shared_oldest_nonremovable_raw;
 	 */
 	TransactionId shared_oldest_nonremovable;
 
@@ -418,6 +423,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 		ShmemVariableCache->xactCompletionCount = 1;
 	}
 
@@ -1718,6 +1724,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	 */
 	h->slot_xmin = procArray->replication_slot_xmin;
 	h->slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	h->fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	for (int index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1865,6 +1872,15 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	h->data_oldest_nonremovable =
 		TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);
 
+	/*
+	 * Check whether there are unresolved distributed transactions requiring
+	 * an older xmin.
+	 */
+	h->shared_oldest_nonremovable =
+		TransactionIdOlder(h->shared_oldest_nonremovable, h->fdwxact_unresolved_xmin);
+	h->data_oldest_nonremovable =
+		TransactionIdOlder(h->data_oldest_nonremovable, h->fdwxact_unresolved_xmin);
+
 	/*
 	 * The only difference between catalog / data horizons is that the slot's
 	 * catalog xmin is applied to the catalog one (so catalogs can be accessed
@@ -1924,6 +1940,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	Assert(!TransactionIdIsValid(h->slot_catalog_xmin) ||
 		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
 										 h->slot_catalog_xmin));
+	Assert(!TransactionIdIsValid(h->fdwxact_unresolved_xmin) ||
+		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+										 h->fdwxact_unresolved_xmin));
 
 	/* update approximate horizons with the computed horizons */
 	GlobalVisUpdateApply(h);
@@ -3835,6 +3854,21 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install a limit on future computations of the xmin horizon to prevent
+ * vacuum from truncating clog entries of transactions that are still needed
+ * to resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
 /*
  * XidCacheRemoveRunningXids
  *
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 6c7cf6c295..a297c746cd 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,5 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+FdwXactLock							48
+FdwXactResolverLock					49
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2d6d145ecc..bb114af784 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3158,6 +3160,18 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 89b5b8b7b9..7c5d9817f5 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -726,6 +726,21 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_LOGICAL_SUBXACT_WRITE:
 			event_name = "LogicalSubxactWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
 
 			/* no default case, so that compiler will warn */
 	}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0a180341c2..853ffb8f97 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -33,6 +33,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -769,6 +770,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS_PREVIOUS */
 	gettext_noop("Version and Platform Compatibility / Previous PostgreSQL Versions"),
 	/* COMPAT_OPTIONS_CLIENT */
@@ -2523,6 +2528,49 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index efde01ee56..d5abe7d4a7 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -132,6 +132,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -741,6 +743,18 @@
 #max_pred_locks_per_page = 2            # min 0
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
 #------------------------------------------------------------------------------
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 152d21e88b..735e4084b3 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -207,6 +207,7 @@ static const char *const subdirs[] = {
 	"pg_wal/archive_status",
 	"pg_commit_ts",
 	"pg_dynshmem",
+	"pg_fdwxact",
 	"pg_notify",
 	"pg_serial",
 	"pg_snapshots",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index f911f98d94..49d47c2ee7 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -296,6 +296,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 805dafef07..dd70a0f8a2 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 1d4a285c75..05fc1beb2e 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -1,7 +1,7 @@
 /*
  * fdwxact.h
  *
- * PostgreSQL global transaction manager
+ * PostgreSQL foreign transaction manager definitions
  *
  * Portions Copyright (c) 2021, PostgreSQL Global Development Group
  *
@@ -11,13 +11,83 @@
 #define FDWXACT_H
 
 #include "access/xact.h"
+#include "access/fdwxact_xlog.h"
 #include "foreign/foreign.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/s_lock.h"
 
 /* Flag passed to FDW transaction management APIs */
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 #define FDWXACT_FLAG_PARALLEL_WORKER	0x02	/* is parallel worker? */
 
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being aborted */
+} FdwXactStatus;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactStateData *FdwXactState;
+typedef struct FdwXactStateData
+{
+	FdwXactState		fdwxact_free_next;	/* Next free FdwXactState entry */
+
+	/* Information relevant to the foreign transaction */
+	FdwXactStateOnDiskData data;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXactState. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXactState. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset where insertion of this
+									 * entry starts */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset where insertion of this entry ends */
+
+	bool		valid;			/* has the entry been completed and written to
+								 * file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+
+	char		identifier[FDWXACT_ID_MAX_LEN]; /* prepared transaction
+												 * identifier */
+} FdwXactStateData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing an FdwXactState entry requires holding FdwXactLock in
+ * exclusive mode, and iterating over the xacts array requires holding it in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactStateData structs */
+	FdwXactState	free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int	num_xacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXactState	xacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
 /* State data for foreign transaction resolution, passed to FDW callbacks */
 typedef struct FdwXactInfo
 {
@@ -25,10 +95,29 @@ typedef struct FdwXactInfo
 	UserMapping		*usermapping;
 
 	int	flags;			/* OR of FDWXACT_FLAG_xx flags */
+	char   *identifier;
 } FdwXactInfo;
 
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+
 /* Function declarations */
+extern void PreCommit_FdwXact(bool is_parallel_worker);
 extern void AtEOXact_FdwXact(bool isCommit, bool is_parallel_worker);
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
 extern void AtPrepare_FdwXact(void);
+extern bool FdwXactExists(TransactionId xid, Oid umid);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern void ResolveOneFdwXact(FdwXactState fdwxact);
+extern void RecreateFdwXactFile(TransactionId xid, Oid umid, void *content,
+								int len);
+extern void RestoreFdwXactData(void);
+extern void RecoverFdwXacts(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
 
 #endif /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..8eab24a406
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,27 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void FdwXactRequestToLaunchResolver(void);
+extern void FdwXactLaunchOrWakeupResolver(void);
+extern Size FdwXactLauncherShmemSize(void);
+extern void FdwXactLauncherShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..9301ada5bb
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,22 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..a1a10b71b2
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,49 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL
+ */
+typedef struct
+{
+	TransactionId xid;
+	Oid		dbid;
+	Oid		umid;
+	Oid		serverid;
+	Oid		owner;
+	char	identifier[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactStateOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid		umid;
+	bool	force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..42f17120b0
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,61 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;			/* database oid */
+
+	/* Indicates if this slot is used or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactResolverCtlData struct for the whole database cluster */
+typedef struct FdwXactResolverCtlData
+{
+	/* Foreign transaction resolution queue. Protected by FdwXactLock */
+	SHM_QUEUE	fdwxact_queue;
+
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactResolverCtlData;
+#define SizeOfFdwXactResolverCtlData \
+	(offsetof(FdwXactResolverCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactResolverCtlData *FdwXactResolverCtl;
+extern FdwXactResolver *MyFdwXactResolver;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index f582cf535f..5ab1f57212 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 91786da784..3d35f89ae0 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 26a743b6b6..d6dfb98927 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -236,6 +236,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..5673ec7299 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 26c3fc0f6b..bbd00832f5 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6123,6 +6123,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '100', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,text,text,int4}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{xid,umid,owner,state,identifier,locker_pid}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid',
+  proargnames => '{xid,umid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid',
+  proargnames => '{xid,umid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6247,6 +6265,11 @@
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
 
+{ oid => '9709',
+  descr => 'stop a foreign transaction resolver process running on the given database',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'},
+
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
   proargtypes => 'pg_lsn pg_lsn', prosrc => 'pg_wal_lsn_diff' },
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 6bce4d76fe..9d6f68e1b6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -128,6 +128,7 @@ extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
+extern void RemoveUserMappingById(Oid umid);
 extern void CreateForeignTable(CreateForeignTableStmt *stmt, Oid relid);
 extern void ImportForeignSchema(ImportForeignSchemaStmt *stmt);
 extern Datum transformGenericOptions(Oid catalogId,
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c4c5cc6384..aa93ddc7ae 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -192,6 +192,7 @@ typedef void (*ForeignAsyncConfigureWait_function) (AsyncRequest *areq);
 
 typedef void (*ForeignAsyncNotify_function) (AsyncRequest *areq);
 
+typedef void (*PrepareForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*CommitForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*RollbackForeignTransaction_function) (FdwXactInfo *finfo);
 
@@ -287,6 +288,7 @@ typedef struct FdwRoutine
 	/* Support functions for transaction management */
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
+	PrepareForeignTransaction_function PrepareForeignTransaction;
 } FdwRoutine;
 
 
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index b01fa52139..300a4cf5b6 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -93,5 +93,6 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
 
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 6b40f1eeb8..35802eac86 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -89,6 +89,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
 	ERROR_HANDLING_OPTIONS,
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 47accc5ffe..fcaaf00a80 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -224,7 +224,12 @@ typedef enum
 	WAIT_EVENT_LOGICAL_CHANGES_READ,
 	WAIT_EVENT_LOGICAL_CHANGES_WRITE,
 	WAIT_EVENT_LOGICAL_SUBXACT_READ,
-	WAIT_EVENT_LOGICAL_SUBXACT_WRITE
+	WAIT_EVENT_LOGICAL_SUBXACT_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN
 } WaitEventIO;
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e5ab11275d..8562e33992 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1352,6 +1352,13 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.umid,
+    f.owner,
+    f.state,
+    f.identifier,
+    f.locker_pid
+   FROM pg_foreign_xacts() f(xid, umid, owner, state, identifier, locker_pid);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.24.3 (Apple Git-128)
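
The SizeOfFdwXactResolverCtlData macro in resolver_internal.h above sizes a struct that ends in a flexible array member; as written it covers the fixed header plus one slot. Below is a minimal standalone sketch of the same sizing pattern generalized to n slots. Resolver, ResolverCtl, resolver_ctl_size() and resolver_ctl_alloc() are simplified stand-ins invented for illustration, not the patch's definitions, and the patch may compute its shared-memory size differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the patch's FdwXactResolver (hypothetical fields). */
typedef struct Resolver
{
	int			pid;			/* 0 if the slot is not active */
	unsigned int dbid;
	int			in_use;
} Resolver;

/* Simplified stand-in for FdwXactResolverCtlData. */
typedef struct ResolverCtl
{
	int			launcher_pid;
	Resolver	resolvers[];	/* flexible array member, sized at allocation */
} ResolverCtl;

/*
 * Bytes needed for a control struct with nresolvers slots: the fixed
 * header up to the array, plus one Resolver per slot.
 */
static size_t
resolver_ctl_size(size_t nresolvers)
{
	return offsetof(ResolverCtl, resolvers) + nresolvers * sizeof(Resolver);
}

/*
 * Allocate a zeroed control struct with every slot marked free.  Plain
 * calloc() is used here only because this sketch runs outside the server;
 * it stands in for a shared-memory allocation.
 */
static ResolverCtl *
resolver_ctl_alloc(size_t nresolvers)
{
	ResolverCtl *ctl = calloc(1, resolver_ctl_size(nresolvers));

	assert(ctl != NULL);
	for (size_t i = 0; i < nresolvers; i++)
		ctl->resolvers[i].in_use = 0;
	return ctl;
}
```

The point of using offsetof() rather than sizeof(ResolverCtl) is that trailing padding in the header is not counted twice; each additional slot then costs exactly sizeof(Resolver).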

Attachment: v36-0001-Introduce-transaction-manager-for-foreign-transa.patch (application/octet-stream)
From 9c3d61b7b9d9d6569e2504c95ebfd69e72d5b80d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 28 Aug 2020 22:25:38 +0900
Subject: [PATCH v36 1/9] Introduce transaction manager for foreign
 transactions.

The global transaction manager manages the transactions initiated on
the foreign server. This commit also adds both
CommitForeignTransaction and RollbackForeignTransaction FDW APIs
supporting only one-phase commit. An FDW that implements these APIs can
be managed by the global transaction manager, so the FDW is able to
control its transaction using the foreign transaction manager, not
using XactCallback.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/Makefile  |   1 +
 src/backend/access/transam/fdwxact.c | 222 +++++++++++++++++++++++++++
 src/backend/access/transam/xact.c    |   8 +
 src/backend/foreign/foreign.c        |   4 +
 src/include/access/fdwxact.h         |  34 ++++
 src/include/foreign/fdwapi.h         |  13 ++
 6 files changed, 282 insertions(+)
 create mode 100644 src/backend/access/transam/fdwxact.c
 create mode 100644 src/include/access/fdwxact.h

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 595e02de72..b05a88549d 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -15,6 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = \
 	clog.o \
 	commit_ts.o \
+	fdwxact.o \
 	generic_xlog.o \
 	multixact.o \
 	parallel.o \
diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
new file mode 100644
index 0000000000..3a4118caec
--- /dev/null
+++ b/src/backend/access/transam/fdwxact.c
@@ -0,0 +1,222 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module contains the code for managing transactions started on foreign
+ * servers.
+ *
+ * An FDW that implements both commit and rollback APIs can request to register
+ * the foreign transaction by FdwXactRegisterXact() to participate it to a
+ * group of distributed tranasction.  The registered foreign transactions are
+ * identified by user mapping OID.  On commit and rollback, the global
+ * transaction manager calls corresponding FDW API to end the foreign
+ * tranasctions.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/transam/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "catalog/pg_user_mapping.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+
+/* Initial size of the hash table */
+#define FDWXACT_HASH_SIZE	64
+
+/* Check the FdwXactEntry supports commit (and rollback) callbacks */
+#define ServerSupportTransactionCallback(fdwent) \
+	(((FdwXactEntry *)(fdwent))->commit_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.
+ *
+ * Participants are identified by user mapping OID, rather than pair of
+ * user OID and server OID. See README.fdwxact for the discussion.
+ */
+typedef struct FdwXactEntry
+{
+	/* user mapping OID, hash key (must be first) */
+	Oid	umid;
+
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Callbacks for foreign transaction */
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+} FdwXactEntry;
+
+/*
+ * Foreign transactions involved in the current transaction.  A member of
+ * participants must support both commit and rollback APIs
+ * (ServerSupportTransactionCallback() is true).
+ */
+static HTAB *FdwXactParticipants = NULL;
+
+/* Check the current transaction has at least one fdwxact participant */
+#define HasFdwXactParticipant() \
+	(FdwXactParticipants != NULL && \
+	 hash_get_num_entries(FdwXactParticipants) > 0)
+
+static void EndFdwXactEntry(FdwXactEntry *fdwent, bool isCommit,
+							bool is_parallel_worker);
+static void RemoveFdwXactEntry(Oid umid);
+
+/*
+ * Register the given foreign transaction identified by the given user
+ * mapping OID as a participant of the transaction.
+ */
+void
+FdwXactRegisterXact(UserMapping *usermapping)
+{
+	FdwXactEntry	*fdwent;
+	FdwRoutine	*routine;
+	Oid	umid;
+	MemoryContext old_ctx;
+	bool	found;
+
+	Assert(IsTransactionState());
+
+	if (FdwXactParticipants == NULL)
+	{
+		HASHCTL	ctl;
+
+		ctl.keysize = sizeof(Oid);
+		ctl.entrysize = sizeof(FdwXactEntry);
+
+		FdwXactParticipants = hash_create("fdw xact participants",
+										  FDWXACT_HASH_SIZE,
+										  &ctl, HASH_ELEM | HASH_BLOBS);
+	}
+
+	umid = usermapping->umid;
+	fdwent = hash_search(FdwXactParticipants, (void *) &umid, HASH_ENTER, &found);
+
+	if (found)
+		return;
+
+	/*
+	 * The participant information needs to live until the end of the transaction
+	 * where syscache is not available, so we save them in TopTransactionContext.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdwent->usermapping = GetUserMapping(usermapping->userid, usermapping->serverid);
+	fdwent->server = GetForeignServer(usermapping->serverid);
+
+	/*
+	 * Foreign server managed by the transaction manager must implement
+	 * transaction callbacks.
+	 */
+	routine = GetFdwRoutineByServerId(usermapping->serverid);
+	if (!routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("cannot register foreign server not supporting transaction callback")));
+
+	fdwent->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdwent->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Remove the foreign transaction from FdwXactParticipants */
+void
+FdwXactUnregisterXact(UserMapping *usermapping)
+{
+	Assert(IsTransactionState());
+	RemoveFdwXactEntry(usermapping->umid);
+}
+
+/*
+ * Remove an FdwXactEntry identified by the given user mapping id from the
+ * hash table.
+ */
+static void
+RemoveFdwXactEntry(Oid umid)
+{
+	(void) hash_search(FdwXactParticipants, (void *) &umid, HASH_REMOVE, NULL);
+}
+
+/*
+ * Commit or rollback all foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool isCommit, bool is_parallel_worker)
+{
+	FdwXactEntry *fdwent;
+	HASH_SEQ_STATUS scan;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (!HasFdwXactParticipant())
+		return;
+
+	hash_seq_init(&scan, FdwXactParticipants);
+	while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
+	{
+		Assert(ServerSupportTransactionCallback(fdwent));
+
+		/* Commit or rollback foreign transaction */
+		EndFdwXactEntry(fdwent, isCommit, is_parallel_worker);
+
+		/*
+		 * Remove the entry so that we don't recursively process this foreign
+		 * transaction.
+		 */
+		RemoveFdwXactEntry(fdwent->umid);
+	}
+
+	Assert(!HasFdwXactParticipant());
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+EndFdwXactEntry(FdwXactEntry *fdwent, bool isCommit, bool is_parallel_worker)
+{
+	FdwXactInfo finfo;
+
+	Assert(ServerSupportTransactionCallback(fdwent));
+
+	finfo.server = fdwent->server;
+	finfo.usermapping = fdwent->usermapping;
+	finfo.flags = FDWXACT_FLAG_ONEPHASE |
+		((is_parallel_worker) ? FDWXACT_FLAG_PARALLEL_WORKER : 0);
+
+	if (isCommit)
+	{
+		fdwent->commit_foreign_xact_fn(&finfo);
+		elog(DEBUG1, "successfully committed the foreign transaction for user mapping %u",
+			 fdwent->umid);
+	}
+	else
+	{
+		fdwent->rollback_foreign_xact_fn(&finfo);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for user mapping %u",
+			 fdwent->umid);
+	}
+}
+
+/*
+ * This function is called at PREPARE TRANSACTION.  Since we don't support
+ * preparing foreign transactions yet, raise an error if the local transaction
+ * has any foreign transaction.
+ */
+void
+AtPrepare_FdwXact(void)
+{
+	if (HasFdwXactParticipant())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 441445927e..791b4243f0 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -2125,6 +2126,9 @@ CommitTransaction(void)
 	CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT
 					  : XACT_EVENT_PRE_COMMIT);
 
+	/* Call foreign transaction callbacks at pre-commit phase, if any */
+	AtEOXact_FdwXact(true, is_parallel_worker);
+
 	/* If we might have parallel workers, clean them up now. */
 	if (IsInParallelMode())
 		AtEOXact_Parallel(true);
@@ -2369,6 +2373,9 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Process foreign transactions */
+	AtPrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2705,6 +2712,7 @@ AbortTransaction(void)
 	AtAbort_Notify();
 	AtEOXact_RelationMap(false, is_parallel_worker);
 	AtAbort_Twophase();
+	AtEOXact_FdwXact(false, is_parallel_worker);
 
 	/*
 	 * Advertise the fact that we aborted in pg_xact (assuming that we got as
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 5564dc3a1e..f8eb4fa215 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -328,6 +328,10 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* The FDW must support both or nothing */
+	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
+		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
+
 	return routine;
 }
 
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..1d4a285c75
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,34 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/xact.h"
+#include "foreign/foreign.h"
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+#define FDWXACT_FLAG_PARALLEL_WORKER	0x02	/* is parallel worker? */
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactInfo
+{
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	int	flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactInfo;
+
+/* Function declarations */
+extern void AtEOXact_FdwXact(bool isCommit, bool is_parallel_worker);
+extern void AtPrepare_FdwXact(void);
+
+#endif /* FDWXACT_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 4f17becbb8..c4c5cc6384 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "access/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
 
@@ -191,6 +192,10 @@ typedef void (*ForeignAsyncConfigureWait_function) (AsyncRequest *areq);
 
 typedef void (*ForeignAsyncNotify_function) (AsyncRequest *areq);
 
+typedef void (*CommitForeignTransaction_function) (FdwXactInfo *finfo);
+typedef void (*RollbackForeignTransaction_function) (FdwXactInfo *finfo);
+
+
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
  * function.  It provides pointers to the callback functions needed by the
@@ -278,6 +283,10 @@ typedef struct FdwRoutine
 	ForeignAsyncRequest_function ForeignAsyncRequest;
 	ForeignAsyncConfigureWait_function ForeignAsyncConfigureWait;
 	ForeignAsyncNotify_function ForeignAsyncNotify;
+
+	/* Support functions for transaction management */
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
 } FdwRoutine;
 
 
@@ -291,4 +300,8 @@ extern bool IsImportableForeignTable(const char *tablename,
 									 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in fdwxact/fdwxact.c */
+extern void FdwXactRegisterXact(UserMapping *usermapping);
+extern void FdwXactUnregisterXact(UserMapping *usermapping);
+
 #endif							/* FDWAPI_H */
-- 
2.24.3 (Apple Git-128)
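
To make the callback contract in the patch above easier to follow, here is a minimal standalone model of it. It only mirrors what the patch shows: the two flag values, the "both or nothing" check added to GetFdwRoutine(), and the way EndFdwXactEntry() builds the flags before dispatching. All Mock* types, the mock_* functions, and the command strings are hypothetical stand-ins; a real FDW would receive an FdwXactInfo with ForeignServer and UserMapping pointers and talk to its remote server instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag values as defined in the fdwxact.h hunk of this patch. */
#define FDWXACT_FLAG_ONEPHASE			0x01
#define FDWXACT_FLAG_PARALLEL_WORKER	0x02

/* Hypothetical stand-in for FdwXactInfo. */
typedef struct MockFdwXactInfo
{
	unsigned int umid;			/* user mapping OID identifying the participant */
	int			flags;			/* OR of FDWXACT_FLAG_xx flags */
	const char *last_command;	/* what the mock "sent" to the remote side */
} MockFdwXactInfo;

typedef void (*MockEndXact_function) (MockFdwXactInfo *finfo);

/* Hypothetical stand-in for the two new FdwRoutine members. */
typedef struct MockFdwRoutine
{
	MockEndXact_function CommitForeignTransaction;
	MockEndXact_function RollbackForeignTransaction;
} MockFdwRoutine;

static void
mock_commit(MockFdwXactInfo *finfo)
{
	/* Patch 1/9 always sets the one-phase flag. */
	assert(finfo->flags & FDWXACT_FLAG_ONEPHASE);
	finfo->last_command = "COMMIT TRANSACTION";
}

static void
mock_rollback(MockFdwXactInfo *finfo)
{
	assert(finfo->flags & FDWXACT_FLAG_ONEPHASE);
	finfo->last_command = "ABORT TRANSACTION";
}

/* Mirrors the "both or nothing" assertion added to GetFdwRoutine(). */
static int
routine_is_consistent(const MockFdwRoutine *r)
{
	return (r->CommitForeignTransaction != NULL) ==
		(r->RollbackForeignTransaction != NULL);
}

/* Mirrors how EndFdwXactEntry() builds the flags and picks a callback. */
static void
mock_end_xact(const MockFdwRoutine *r, MockFdwXactInfo *finfo,
			  int is_commit, int is_parallel_worker)
{
	finfo->flags = FDWXACT_FLAG_ONEPHASE |
		(is_parallel_worker ? FDWXACT_FLAG_PARALLEL_WORKER : 0);

	if (is_commit)
		r->CommitForeignTransaction(finfo);
	else
		r->RollbackForeignTransaction(finfo);
}
```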

#231Zhihong Yu
zyu@yugabyte.com
In reply to: Masahiko Sawada (#230)
Re: Transactions involving multiple postgres foreign servers, take 2

On Mon, May 10, 2021 at 9:38 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, May 3, 2021 at 11:11 PM Zhihong Yu <zyu@yugabyte.com> wrote:

On Mon, May 3, 2021 at 5:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, May 2, 2021 at 1:23 AM Zhihong Yu <zyu@yugabyte.com> wrote:

On Fri, Apr 30, 2021 at 9:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Mar 17, 2021 at 6:03 PM Zhihong Yu <zyu@yugabyte.com> wrote:

Hi,
For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :

Thank you for reviewing the patch!

With this commit, the foreign server modified within the
transaction marked as 'modified'.

transaction marked -> transaction is marked

Will fix.

+#define IsForeignTwophaseCommitRequested() \
+    (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)

Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think
the macro should be named: IsForeignTwophaseCommitRequired.

But even if foreign_twophase_commit is
FOREIGN_TWOPHASE_COMMIT_REQUIRED, the two-phase commit is not used if
there is only one modified server, right? It seems the name
IsForeignTwophaseCommitRequested is fine.

+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+       if (!ServerSupportTwophaseCommit(fdw_part))
+           have_no_twophase = true;
...
+   if (have_no_twophase)
+       ereport(ERROR,

It seems the error case should be reported within the loop. This
way, we don't need to iterate over the other participant(s).

Accordingly, nserverswritten should be incremented for the local
server prior to the loop. The condition in the loop would become if
(!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).

have_no_twophase is no longer needed.

Hmm, I think if we process a 2pc-non-capable server first and then
process a 2pc-capable server, we should raise an error but
cannot detect that.

Then the check would stay as what you have in the patch:

if (!ServerSupportTwophaseCommit(fdw_part))

When the non-2pc-capable server is encountered, we would report the
error in place (following the ServerSupportTwophaseCommit check) and come
out of the loop.

have_no_twophase can be dropped.

But if we processed only one non-2pc-capable server, we would raise an
error but should not in that case.

On second thought, I think we can track how many servers are modified
or not capable of 2PC during registration and unregistration. Then we
can determine both whether 2PC is required and whether a non-2pc-capable
server is involved without looking through all participants. Thoughts?

That is something worth trying.
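
A minimal standalone sketch of the counting idea discussed above, not the patch's implementation. It assumes foreign_twophase_commit = required, counts only participants that modified data (with the local server counted as one of them when it writes), and treats "more than one modified server" as the condition for 2PC, as stated earlier in this exchange. WriteCounts, CommitPlan, and the count_*/plan_commit functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum CommitPlan
{
	PLAN_ONE_PHASE,				/* at most one modified server: plain commit */
	PLAN_TWO_PHASE,				/* several modified servers, all 2PC-capable */
	PLAN_ERROR					/* 2PC needed but some server cannot do it */
} CommitPlan;

typedef struct WriteCounts
{
	int			nwritten;			/* modified servers, including local */
	int			nwritten_no_2pc;	/* modified servers that cannot do 2PC */
} WriteCounts;

/* Called when a participant that modified data is registered. */
static void
count_register(WriteCounts *c, bool supports_2pc)
{
	c->nwritten++;
	if (!supports_2pc)
		c->nwritten_no_2pc++;
}

/* Called when such a participant is unregistered. */
static void
count_unregister(WriteCounts *c, bool supports_2pc)
{
	assert(c->nwritten > 0);
	c->nwritten--;
	if (!supports_2pc)
	{
		assert(c->nwritten_no_2pc > 0);
		c->nwritten_no_2pc--;
	}
}

/* Decide at commit time without walking the participant list. */
static CommitPlan
plan_commit(const WriteCounts *c)
{
	if (c->nwritten <= 1)
		return PLAN_ONE_PHASE;
	if (c->nwritten_no_2pc > 0)
		return PLAN_ERROR;
	return PLAN_TWO_PHASE;
}
```

With counters the result does not depend on the order in which participants are seen, which covers both cases raised above: a single non-2pc-capable server alone is fine, while a non-2pc-capable server combined with any other modified server is an error.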

I've attached the updated patches that incorporated comments from
Zhihong and Ikeda-san.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Hi,
For v36-0005-Prepare-foreign-transactions-at-commit-time.patch :

With this commit, the foreign server modified within the transaction
marked as 'modified'.

The verb is missing from the above sentence: 'within the transaction
marked' -> 'within the transaction is marked'

+ /* true if modified the data on the server */

modified the data -> data is modified

+   xid = GetTopTransactionIdIfAny();
...
+       if (!TransactionIdIsValid(xid))
+           xid = GetTopTransactionId();

When the above if condition is true, would GetTopTransactionId()
return a valid xid? It seems the two function calls are
the same.

I like the way checkForeignTwophaseCommitRequired() is structured.

Cheers
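
On the GetTopTransactionIdIfAny() / GetTopTransactionId() question above: in PostgreSQL's xact.c the first only reports the top-level xid and returns InvalidTransactionId when none has been assigned yet, while the second assigns one if needed. The standalone model below illustrates that difference; the Mock* types and mock_* functions are hypothetical stand-ins, and the counter-based allocation is only a placeholder for the real xid assignment.

```c
#include <assert.h>

typedef unsigned int MockTransactionId;
#define MockInvalidTransactionId ((MockTransactionId) 0)

typedef struct MockXactState
{
	MockTransactionId top_xid;	/* invalid until an xid is assigned */
	MockTransactionId next_xid;	/* next xid the mock allocator hands out */
} MockXactState;

/* Models GetTopTransactionIdIfAny(): never assigns an xid. */
static MockTransactionId
mock_get_top_xid_if_any(const MockXactState *s)
{
	return s->top_xid;
}

/* Models GetTopTransactionId(): assigns an xid on first use. */
static MockTransactionId
mock_get_top_xid(MockXactState *s)
{
	if (s->top_xid == MockInvalidTransactionId)
		s->top_xid = s->next_xid++;
	return s->top_xid;
}
```

Under this model, the pattern quoted from the patch (call the IfAny variant first, then fall back to the assigning variant only when the result is invalid) does obtain a valid xid in the fallback branch.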

#232Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#230)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/05/11 13:37, Masahiko Sawada wrote:

I've attached the updated patches that incorporated comments from
Zhihong and Ikeda-san.

Thanks for updating the patches!

I have other comments including trivial things.

a. about "foreign_transaction_resolver_timeout" parameter

The default value of "foreign_transaction_resolver_timeout" is currently
60 seconds. Is there a reason for that? Although the following is a minor
case, it may confuse some users.

An example case:

1. A client executes a transaction with 2PC while the resolver is processing
FdwXactResolverProcessInDoubtXacts().

2. The resolution of the first transaction has to wait until other
2PC transactions are executed or the timeout expires.

3. If the client checks the result of the first transaction, it should wait
until resolution is finished for atomic visibility (although this depends on
how atomic visibility is realized). The client may have to wait for up to
"foreign_transaction_resolver_timeout". Users may think it's stalled.

A situation like this can be observed after testing with pgbench: some
unresolved transactions remain after benchmarking.

I assume this default value follows wal_sender, the archiver, and so on.
But I think this parameter is more like "commit_delay". If so, 60 seconds
seems too large.

b. about the performance bottleneck (just sharing my simple benchmark results)

The resolver process can easily become a performance bottleneck, although I
think some users want this feature even if the performance is not so good.

I tested with a very simple workload on my laptop.

The test conditions are:
* two remote foreign partitions, and one transaction inserts an entry into
each partition.
* local connections only. If network latency were higher, the performance
would be worse.
* pgbench with 8 clients.

The test results are as follows. The performance with 2PC is only 10%
of that without 2PC.

* with foreign_twophase_commit = required
-> Under a load of more than 10 TPS, the number of unresolved foreign
transactions keeps increasing and the run stops with the warning "Increase
max_prepared_foreign_transactions".

* with foreign_twophase_commit = disabled
-> 122 TPS in my environment.

c. v36-0001-Introduce-transaction-manager-for-foreign-transa.patch

* typo: s/tranasction/transaction/

* Is it better to move AtEOXact_FdwXact() in AbortTransaction() to before "if
(IsInParallelMode())" so that the calls are in the same order as in CommitTransaction()?

* function names in fdwxact.c

Although this may just be my impression, xact means transaction. If you feel
the same, function names such as FdwXactRegisterXact look odd to
me. Would FdwXactRegisterEntry or FdwXactRegisterParticipant be better?

* Are the following better?

- s/to register the foreign transaction by/to register the foreign transaction
participant by/

- s/The registered foreign transactions/The registered participants/

- s/given foreign transaction/given foreign transaction participant/

- s/Foreign transactions involved in the current transaction/Foreign
transaction participants involved in the current transaction/

Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION

#233Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiro Ikeda (#232)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2021/05/11 13:37, Masahiko Sawada wrote:

I've attached the updated patches that incorporated comments from
Zhihong and Ikeda-san.

Thanks for updating the patches!

I have other comments including trivial things.

a. about "foreign_transaction_resolver_timeout" parameter

Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs.
Is there any reason? Although the following is minor case, it may confuse some
users.

Example case is that

1. a client executes transaction with 2PC when the resolver is processing
FdwXactResolverProcessInDoubtXacts().

2. the resolution of 1st transaction must be waited until the other
transactions for 2pc are executed or timeout.

3. if the client check the 1st result value, it should wait until resolution
is finished for atomic visibility (although it depends on the way how to
realize atomic visibility.) The clients may be waited
foreign_transaction_resolver_timeout". Users may think it's stale.

Like this situation can be observed after testing with pgbench. Some
unresolved transaction remains after benchmarking.

I assume that this default value refers to wal_sender, archiver, and so on.
But, I think this parameter is more like "commit_delay". If so, 60 seconds
seems to be big value.

IIUC this situation seems like the foreign transaction resolution is
a bottleneck and doesn’t catch up with incoming resolution requests. But
how does foreign_transaction_resolver_timeout relate to this situation?
foreign_transaction_resolver_timeout controls when to terminate the
resolver process that doesn't have any foreign transactions to
resolve. So if we set it several milliseconds, resolver processes are
terminated immediately after each resolution, imposing the cost of
launching resolver processes on the next resolution.

b. about performance bottleneck (just share my simple benchmark results)

The resolver process can be performance bottleneck easily although I think
some users want this feature even if the performance is not so good.

I tested with very simple workload in my laptop.

The test condition is
* two remote foreign partitions and one transaction inserts an entry in each
partitions.
* local connection only. If NW latency became higher, the performance became
worse.
* pgbench with 8 clients.

The test results is the following. The performance of 2PC is only 10%
performance of the one of without 2PC.

* with foreign_twophase_commit = requried
-> If load with more than 10TPS, the number of unresolved foreign transactions
is increasing and stop with the warning "Increase
max_prepared_foreign_transactions".

What was the value of max_prepared_foreign_transactions?

To speed up the foreign transaction resolution, some ideas have been
discussed. As another idea, how about launching resolvers for each
foreign server? That way, we resolve foreign transactions on each
foreign server in parallel. If foreign transactions are concentrated
on the particular server, we can have multiple resolvers for the one
foreign server. It doesn’t change the fact that all foreign
transaction resolutions are processed by resolver processes.

Apart from that, we also might want to improve foreign transaction
management so that transaction doesn’t end up with an error if the
foreign transaction resolution doesn’t catch up with incoming
transactions that require 2PC. Maybe we can evict and serialize a
state file when FdwXactCtl->xacts[] is full. I’d like to leave it as a
future improvement.

* with foreign_twophase_commit = disabled
-> 122TPS in my environments.

How much is the performance without those 2PC patches and with the
same workload? i.e., how fast is the current postgres_fdw that uses
XactCallback?

c. v36-0001-Introduce-transaction-manager-for-foreign-transa.patch

* typo: s/tranasction/transaction/

* Is it better to move AtEOXact_FdwXact() in AbortTransaction() to before "if
(IsInParallelMode())" because make them in the same order as CommitTransaction()?

I'd prefer to move AtEOXact_FdwXact() in CommitTransaction() to after "if
(IsInParallelMode())" since other pre-commit work is done after
cleaning up parallel contexts. What do you think?

* functions name of fdwxact.c

Although this depends on my feeling, xact means transaction. If this feeling
same as you, the function names of FdwXactRegisterXact and so on are odd to
me. FdwXactRegisterEntry or FdwXactRegisterParticipant is better?

FdwXactRegisterEntry sounds good to me. Thanks.

* Are the following better?

- s/to register the foreign transaction by/to register the foreign transaction
participant by/

- s/The registered foreign transactions/The registered participants/

- s/given foreign transaction/given foreign transaction participant/

- s/Foreign transactions involved in the current transaction/Foreign
transaction participants involved in the current transaction/

Agreed with the above suggestions.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#234Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#233)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/05/21 10:39, Masahiko Sawada wrote:

On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2021/05/11 13:37, Masahiko Sawada wrote:

I've attached the updated patches that incorporated comments from
Zhihong and Ikeda-san.

Thanks for updating the patches!

I have other comments including trivial things.

a. about "foreign_transaction_resolver_timeout" parameter

Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs.
Is there any reason? Although the following is minor case, it may confuse some
users.

Example case is that

1. a client executes transaction with 2PC when the resolver is processing
FdwXactResolverProcessInDoubtXacts().

2. the resolution of 1st transaction must be waited until the other
transactions for 2pc are executed or timeout.

3. if the client check the 1st result value, it should wait until resolution
is finished for atomic visibility (although it depends on the way how to
realize atomic visibility.) The clients may be waited
foreign_transaction_resolver_timeout". Users may think it's stale.

Like this situation can be observed after testing with pgbench. Some
unresolved transaction remains after benchmarking.

I assume that this default value refers to wal_sender, archiver, and so on.
But, I think this parameter is more like "commit_delay". If so, 60 seconds
seems to be big value.

IIUC this situation seems like the foreign transaction resolution is
bottle-neck and doesn’t catch up to incoming resolution requests. But
how foreignt_transaction_resolver_timeout relates to this situation?
foreign_transaction_resolver_timeout controls when to terminate the
resolver process that doesn't have any foreign transactions to
resolve. So if we set it several milliseconds, resolver processes are
terminated immediately after each resolution, imposing the cost of
launching resolver processes on the next resolution.

Thanks for your comments!

No, this situation is not related to whether the foreign transaction resolution is
a bottleneck or not. This issue may happen when the workload has very few
foreign transactions.

If a new foreign transaction comes while the transaction resolver is processing
resolutions via FdwXactResolverProcessInDoubtXacts(), the foreign transaction
waits until the next transaction resolution starts. If no further foreign transaction
comes, the foreign transaction must wait for its resolution to start until the
timeout. This is the situation I mentioned.

Thanks for letting me know the side effect of setting the resolver timeout to
several milliseconds. I agree. But why is termination needed? Is there a
possibility that the resolver stalls like walsender?

b. about performance bottleneck (just share my simple benchmark results)

The resolver process can be performance bottleneck easily although I think
some users want this feature even if the performance is not so good.

I tested with very simple workload in my laptop.

The test condition is
* two remote foreign partitions and one transaction inserts an entry in each
partitions.
* local connection only. If NW latency became higher, the performance became
worse.
* pgbench with 8 clients.

The test results is the following. The performance of 2PC is only 10%
performance of the one of without 2PC.

* with foreign_twophase_commit = requried
-> If load with more than 10TPS, the number of unresolved foreign transactions
is increasing and stop with the warning "Increase
max_prepared_foreign_transactions".

What was the value of max_prepared_foreign_transactions?

Now, I tested with 200.

If each resolution finishes very quickly, I thought that would be enough because
8 clients x 2 partitions = 16, though... But it's difficult to know which
values are stable.

To speed up the foreign transaction resolution, some ideas have been
discussed. As another idea, how about launching resolvers for each
foreign server? That way, we resolve foreign transactions on each
foreign server in parallel. If foreign transactions are concentrated
on the particular server, we can have multiple resolvers for the one
foreign server. It doesn’t change the fact that all foreign
transaction resolutions are processed by resolver processes.

Awesome! There seems to be another advantage: even if a foreign server is
temporarily busy or stopped due to failover, other foreign servers'
transactions can be resolved.

Apart from that, we also might want to improve foreign transaction
management so that transaction doesn’t end up with an error if the
foreign transaction resolution doesn’t catch up with incoming
transactions that require 2PC. Maybe we can evict and serialize a
state file when FdwXactCtl->xacts[] is full. I’d like to leave it as a
future improvement.

Oh, great! I didn't come up with the idea.

Although I thought this feature would make it difficult to know whether foreign
transactions are being resolved steadily, DBAs can check the "pg_foreign_xacts" view now, and it's enough
to write to the log that foreign transactions have been spilled.

* with foreign_twophase_commit = disabled
-> 122TPS in my environments.

How much is the performance without those 2PC patches and with the
same workload? i.e., how fast is the current postgres_fdw that uses
XactCallback?

OK, I'll test.

c. v36-0001-Introduce-transaction-manager-for-foreign-transa.patch

* typo: s/tranasction/transaction/

* Is it better to move AtEOXact_FdwXact() in AbortTransaction() to before "if
(IsInParallelMode())" because make them in the same order as CommitTransaction()?

I'd prefer to move AtEOXact_FdwXact() in CommitTransaction after "if
(IsInParallelMode())" since other pre-commit works are done after
cleaning parallel contexts. What do you think?

OK, I agree.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#235Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiro Ikeda (#234)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, May 21, 2021 at 12:45 PM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:

On 2021/05/21 10:39, Masahiko Sawada wrote:

On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2021/05/11 13:37, Masahiko Sawada wrote:

I've attached the updated patches that incorporated comments from
Zhihong and Ikeda-san.

Thanks for updating the patches!

I have other comments including trivial things.

a. about "foreign_transaction_resolver_timeout" parameter

Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs.
Is there any reason? Although the following is minor case, it may confuse some
users.

Example case is that

1. a client executes transaction with 2PC when the resolver is processing
FdwXactResolverProcessInDoubtXacts().

2. the resolution of 1st transaction must be waited until the other
transactions for 2pc are executed or timeout.

3. if the client check the 1st result value, it should wait until resolution
is finished for atomic visibility (although it depends on the way how to
realize atomic visibility.) The clients may be waited
foreign_transaction_resolver_timeout". Users may think it's stale.

Like this situation can be observed after testing with pgbench. Some
unresolved transaction remains after benchmarking.

I assume that this default value refers to wal_sender, archiver, and so on.
But, I think this parameter is more like "commit_delay". If so, 60 seconds
seems to be big value.

IIUC this situation seems like the foreign transaction resolution is
bottle-neck and doesn’t catch up to incoming resolution requests. But
how foreignt_transaction_resolver_timeout relates to this situation?
foreign_transaction_resolver_timeout controls when to terminate the
resolver process that doesn't have any foreign transactions to
resolve. So if we set it several milliseconds, resolver processes are
terminated immediately after each resolution, imposing the cost of
launching resolver processes on the next resolution.

Thanks for your comments!

No, this situation is not related to the foreign transaction resolution is
bottle-neck or not. This issue may happen when the workload has very few
foreign transactions.

If new foreign transaction comes while the transaction resolver is processing
resolutions via FdwXactResolverProcessInDoubtXacts(), the foreign transaction
waits until starting next transaction resolution. If next foreign transaction
doesn't come, the foreign transaction must wait starting resolution until
timeout. I mentioned this situation.

Thanks for your explanation. I think that in this case we should set
the latch of the resolver after preparing all foreign transactions so
that the resolver processes those transactions without sleeping.
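
The effect of setting the latch can be sketched with a small model. This is only an illustration in Python, not PostgreSQL code: threading.Event stands in for the resolver's latch, and the names resolver_wait and run as well as the timings are hypothetical.

```python
import threading
import time


def resolver_wait(latch, timeout_sec):
    """Sleep until the latch is set or the timeout expires; return seconds waited."""
    start = time.monotonic()
    latch.wait(timeout_sec)
    latch.clear()
    return time.monotonic() - start


def run(set_latch_after_prepare, timeout_sec=2.0):
    """Simulate one backend and one sleeping resolver; return the resolver's wait."""
    latch = threading.Event()
    result = {}
    resolver = threading.Thread(
        target=lambda: result.update(waited=resolver_wait(latch, timeout_sec))
    )
    resolver.start()
    time.sleep(0.05)  # backend prepares all foreign transactions (simulated)
    if set_latch_after_prepare:
        latch.set()  # wake the resolver right away
    resolver.join()
    return result["waited"]


if __name__ == "__main__":
    print("with latch:    %.2f s" % run(True))
    print("without latch: %.2f s" % run(False, timeout_sec=0.3))
```

With the latch set, the wait is bounded by the time the backend needs to prepare; without it, the resolver sleeps for the whole timeout.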

Thanks for letting me know the side effect if setting resolution timeout to
several milliseconds. I agree. But, why termination is needed? Is there a
possibility to stale like walsender?

The purpose of this timeout is to terminate resolvers that are idle
for a long time. The resolver processes don't necessarily need to keep
running all the time for every database. On the other hand, launching
a resolver process per commit would be a high cost. So we have
resolver processes keep running at least for
foreign_transaction_resolver_timeout.

b. about performance bottleneck (just share my simple benchmark results)

The resolver process can be performance bottleneck easily although I think
some users want this feature even if the performance is not so good.

I tested with very simple workload in my laptop.

The test condition is
* two remote foreign partitions and one transaction inserts an entry in each
partitions.
* local connection only. If NW latency became higher, the performance became
worse.
* pgbench with 8 clients.

The test results is the following. The performance of 2PC is only 10%
performance of the one of without 2PC.

* with foreign_twophase_commit = requried
-> If load with more than 10TPS, the number of unresolved foreign transactions
is increasing and stop with the warning "Increase
max_prepared_foreign_transactions".

What was the value of max_prepared_foreign_transactions?

Now, I tested with 200.

If each resolution is finished very soon, I thought it's enough because
8clients x 2partitions = 16, though... But, it's difficult how to know the
stable values.

While resolving one distributed transaction, the resolver needs both
one round trip and an fsync of the WAL record for each foreign transaction.
Since the client doesn’t wait for the distributed transaction to be
resolved, the resolver process can easily become a bottleneck given there
are 8 clients.

If foreign transactions were resolved synchronously, 16 would suffice.
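
As a rough illustration of why 200 entries can still run out (a sketch under assumed numbers, not a measurement): when backends prepare foreign transactions faster than a single resolver resolves them, the backlog grows until max_prepared_foreign_transactions is reached. Only the 200 entries and the 2 foreign transactions per distributed transaction come from this thread; the rates and the function name are hypothetical.

```python
def seconds_until_full(slots, arrival_per_sec, resolved_per_sec):
    """Return seconds until every slot holds an unresolved entry,
    or None if the resolver keeps up and the backlog never grows."""
    growth = arrival_per_sec - resolved_per_sec
    if growth <= 0:
        return None
    return slots / growth


if __name__ == "__main__":
    # Hypothetical: 30 distributed transactions/s, each preparing 2 foreign
    # transactions, against a resolver finishing 40 or 80 entries/s.
    arrival = 30 * 2
    print(seconds_until_full(200, arrival, 40))
    print(seconds_until_full(200, arrival, 80))
```

In this model, raising max_prepared_foreign_transactions only delays the error; the backlog stops growing only when the resolution rate reaches the arrival rate.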

To speed up the foreign transaction resolution, some ideas have been
discussed. As another idea, how about launching resolvers for each
foreign server? That way, we resolve foreign transactions on each
foreign server in parallel. If foreign transactions are concentrated
on the particular server, we can have multiple resolvers for the one
foreign server. It doesn’t change the fact that all foreign
transaction resolutions are processed by resolver processes.

Awesome! There seems to be another pros that even if a foreign server is
temporarily busy or stopped due to fail over, other foreign server's
transactions can be resolved.

Yes. We also might need to be careful about the order of foreign
transaction resolution. I think we need to resolve foreign
transactions in arrival order at least within a foreign server.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#236Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#235)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/05/21 13:45, Masahiko Sawada wrote:

On Fri, May 21, 2021 at 12:45 PM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:

On 2021/05/21 10:39, Masahiko Sawada wrote:

On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2021/05/11 13:37, Masahiko Sawada wrote:

I've attached the updated patches that incorporated comments from
Zhihong and Ikeda-san.

Thanks for updating the patches!

I have other comments including trivial things.

a. about "foreign_transaction_resolver_timeout" parameter

Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs.
Is there any reason? Although the following is minor case, it may confuse some
users.

Example case is that

1. a client executes transaction with 2PC when the resolver is processing
FdwXactResolverProcessInDoubtXacts().

2. the resolution of 1st transaction must be waited until the other
transactions for 2pc are executed or timeout.

3. if the client check the 1st result value, it should wait until resolution
is finished for atomic visibility (although it depends on the way how to
realize atomic visibility.) The clients may be waited
foreign_transaction_resolver_timeout". Users may think it's stale.

Like this situation can be observed after testing with pgbench. Some
unresolved transaction remains after benchmarking.

I assume that this default value refers to wal_sender, archiver, and so on.
But, I think this parameter is more like "commit_delay". If so, 60 seconds
seems to be big value.

IIUC this situation seems like the foreign transaction resolution is
bottle-neck and doesn’t catch up to incoming resolution requests. But
how foreignt_transaction_resolver_timeout relates to this situation?
foreign_transaction_resolver_timeout controls when to terminate the
resolver process that doesn't have any foreign transactions to
resolve. So if we set it several milliseconds, resolver processes are
terminated immediately after each resolution, imposing the cost of
launching resolver processes on the next resolution.

Thanks for your comments!

No, this situation is not related to the foreign transaction resolution is
bottle-neck or not. This issue may happen when the workload has very few
foreign transactions.

If new foreign transaction comes while the transaction resolver is processing
resolutions via FdwXactResolverProcessInDoubtXacts(), the foreign transaction
waits until starting next transaction resolution. If next foreign transaction
doesn't come, the foreign transaction must wait starting resolution until
timeout. I mentioned this situation.

Thanks for your explanation. I think that in this case we should set
the latch of the resolver after preparing all foreign transactions so
that the resolver process those transactions without sleep.

Yes, your idea is much better. Thanks!

Thanks for letting me know the side effect if setting resolution timeout to
several milliseconds. I agree. But, why termination is needed? Is there a
possibility to stale like walsender?

The purpose of this timeout is to terminate resolvers that are idle
for a long time. The resolver processes don't necessarily need to keep
running all the time for every database. On the other hand, launching
a resolver process per commit would be a high cost. So we have
resolver processes keep running at least for
foreign_transaction_resolver_timeout.

Understood. I think it's reasonable.

b. about performance bottleneck (just share my simple benchmark results)

The resolver process can be performance bottleneck easily although I think
some users want this feature even if the performance is not so good.

I tested with very simple workload in my laptop.

The test condition is
* two remote foreign partitions and one transaction inserts an entry in each
partitions.
* local connection only. If NW latency became higher, the performance became
worse.
* pgbench with 8 clients.

The test results is the following. The performance of 2PC is only 10%
performance of the one of without 2PC.

* with foreign_twophase_commit = requried
-> If load with more than 10TPS, the number of unresolved foreign transactions
is increasing and stop with the warning "Increase
max_prepared_foreign_transactions".

What was the value of max_prepared_foreign_transactions?

Now, I tested with 200.

If each resolution is finished very soon, I thought it's enough because
8clients x 2partitions = 16, though... But, it's difficult how to know the
stable values.

During resolving one distributed transaction, the resolver needs both
one round trip and fsync-ing WAL record for each foreign transaction.
Since the client doesn’t wait for the distributed transaction to be
resolved, the resolver process can be easily bottle-neck given there
are 8 clients.

If foreign transaction resolution was resolved synchronously, 16 would suffice.

OK, thanks.

To speed up the foreign transaction resolution, some ideas have been
discussed. As another idea, how about launching resolvers for each
foreign server? That way, we resolve foreign transactions on each
foreign server in parallel. If foreign transactions are concentrated
on the particular server, we can have multiple resolvers for the one
foreign server. It doesn’t change the fact that all foreign
transaction resolutions are processed by resolver processes.

Awesome! There seems to be another pros that even if a foreign server is
temporarily busy or stopped due to fail over, other foreign server's
transactions can be resolved.

Yes. We also might need to be careful about the order of foreign
transaction resolution. I think we need to resolve foreign
transactions in arrival order at least within a foreign server.

I agree it's better.

(This is just out of my own curiosity...)
Is it necessary? This idea seems to be aimed at atomic visibility, but
2PC alone can't achieve that, as you know. So I wondered about it.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#237Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiro Ikeda (#236)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2021/05/21 13:45, Masahiko Sawada wrote:

Yes. We also might need to be careful about the order of foreign
transaction resolution. I think we need to resolve foreign
transactions in arrival order at least within a foreign server.

I agree it's better.

(Although this is my interest...)
Is it necessary? Although this idea seems to be for atomic visibility,
2PC can't realize that as you know. So, I wondered that.

I think it's for fairness. If a foreign transaction that arrived earlier
keeps getting put off in favor of foreign transactions that arrived later
because of its index in FdwXactCtl->xacts, that is neither understandable for
users nor fair. I think it’s better to handle foreign transactions in
a FIFO manner (although this problem exists even in the current code).
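
To illustrate the difference, here is a minimal sketch in Python rather than the patch's C code. The Entry tuple, its fields, and the two ordering functions are hypothetical; "slot" stands in for an entry's index in FdwXactCtl->xacts.

```python
from collections import namedtuple

# slot: position in the shared array; arrival_seq: order in which the
# foreign transaction was registered (lower means earlier).
Entry = namedtuple("Entry", ["slot", "arrival_seq", "server"])


def index_order(entries):
    """Resolve by scanning the array from slot 0 upward."""
    return sorted(entries, key=lambda e: e.slot)


def fifo_order(entries):
    """Resolve in arrival order, which is also FIFO within each server."""
    return sorted(entries, key=lambda e: e.arrival_seq)


if __name__ == "__main__":
    entries = [
        Entry(slot=5, arrival_seq=1, server="s1"),  # oldest, but in a high slot
        Entry(slot=0, arrival_seq=2, server="s1"),
        Entry(slot=2, arrival_seq=3, server="s2"),
    ]
    print([e.arrival_seq for e in index_order(entries)])
    print([e.arrival_seq for e in fifo_order(entries)])
```

With an index-order scan, the oldest entry is handled last whenever it sits in a high slot and lower slots keep being reused, which is the unfairness described above.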

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#238Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#237)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/05/25 21:59, Masahiko Sawada wrote:

On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2021/05/21 13:45, Masahiko Sawada wrote:

Yes. We also might need to be careful about the order of foreign
transaction resolution. I think we need to resolve foreign
transactions in arrival order at least within a foreign server.

I agree it's better.

(Although this is my interest...)
Is it necessary? Although this idea seems to be for atomic visibility,
2PC can't realize that as you know. So, I wondered that.

I think it's for fairness. If a foreign transaction arrived earlier
gets put off so often for other foreign transactions arrived later due
to its index in FdwXactCtl->xacts, it’s not understandable for users
and not fair. I think it’s better to handle foreign transactions in
FIFO manner (although this problem exists even in the current code).

OK, thanks.

On 2021/05/21 12:45, Masahiro Ikeda wrote:

On 2021/05/21 10:39, Masahiko Sawada wrote:

On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com>

wrote:

How much is the performance without those 2PC patches and with the
same workload? i.e., how fast is the current postgres_fdw that uses
XactCallback?

OK, I'll test.

The test results are as follows. I couldn't confirm any performance
improvement from the 2PC patches, though I may need to change the test conditions.

[condition]
* 1 coordinator and 3 foreign servers
* There are two custom scripts which access different two foreign servers per
transaction

``` fxact_select.pgbench
BEGIN;
SELECT * FROM part:p1 WHERE id = :id;
SELECT * FROM part:p2 WHERE id = :id;
COMMIT;
```

``` fxact_update.pgbench
BEGIN;
UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
COMMIT;
```

[results]

I have tested three times.
Performance difference seems to be within the range of errors.

# 6d0eb38557 with 2pc patches (v36) and foreign_twophase_commit = disabled
- fxact_update.pgbench
72.3, 74.9, 77.5 TPS => avg 74.9 TPS
110.5, 106.8, 103.2 ms => avg 106.8 ms

- fxact_select.pgbench
1767.6, 1737.1, 1717.4 TPS => avg 1740.7 TPS
4.5, 4.6, 4.7 ms => avg 4.6 ms

# 6d0eb38557 without 2pc patches
- fxact_update.pgbench
76.5, 70.6, 69.5 TPS => avg 72.2 TPS
104.534, 113.244, 115.097 ms => avg 111.0 ms

- fxact_select.pgbench
1810.2, 1748.3, 1737.2 TPS => avg 1765.2 TPS
4.2, 4.6, 4.6 ms => avg 4.5 ms
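
For reference, the averages and the relative difference between the two builds can be recomputed from the TPS figures above with a short script. This is only a convenience sketch; the dictionary keys and the helper name relative_diff_pct are made up for illustration.

```python
from statistics import mean

# TPS of the three runs reported above.
update_tps = {
    "with_patches": [72.3, 74.9, 77.5],
    "without_patches": [76.5, 70.6, 69.5],
}
select_tps = {
    "with_patches": [1767.6, 1737.1, 1717.4],
    "without_patches": [1810.2, 1748.3, 1737.2],
}


def relative_diff_pct(patched, baseline):
    """Difference of the means, as a percentage of the baseline mean."""
    return (mean(patched) - mean(baseline)) / mean(baseline) * 100


if __name__ == "__main__":
    for name, runs in (("update", update_tps), ("select", select_tps)):
        diff = relative_diff_pct(runs["with_patches"], runs["without_patches"])
        print(name, round(mean(runs["with_patches"]), 1),
              round(mean(runs["without_patches"]), 1), round(diff, 1))
```

Both differences are a few percent at most and have opposite signs, while individual runs of the same build vary by a comparable amount.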

# About the bottleneck of the resolver process

I investigated the performance bottleneck of the resolver process using perf.
The main bottlenecks are the following functions.

1st. 42.8% routine->CommitForeignTransaction()
2nd. 31.5% remove_fdwxact()
3rd. 10.16% CommitTransaction()

The 1st and 3rd problems can be mitigated by parallelizing resolver processes per
remote server. But I wonder whether it would be better for backends to also issue
"COMMIT/ABORT PREPARED" themselves, with the resolver process only taking charge of
resolving in-doubt foreign transactions. In many cases, I think
that the number of connections is much greater than the number of remote
servers. If so, the parallelization is not enough.
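
A rough Amdahl-style estimate shows why per-server parallelization alone may not be enough. It assumes, purely for illustration, that the perf percentages above are shares of total resolver time and that only the 1st and 3rd items scale perfectly with the number of resolvers; neither assumption was measured. The function name is hypothetical.

```python
def resolver_speedup(parallel_fraction, n_resolvers):
    """Amdahl's law: speedup when only parallel_fraction of the work
    is spread over n_resolvers and the rest stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_resolvers)


if __name__ == "__main__":
    # CommitForeignTransaction() 42.8% + CommitTransaction() 10.16%
    parallel = 0.428 + 0.1016
    for n in (1, 3, 8):
        print(n, round(resolver_speedup(parallel, n), 2))
```

Under these assumptions the speedup stays below about 2.13x however many resolvers run, because the remaining work, including remove_fdwxact(), is treated as serial.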

So, I think having backends execute "COMMIT PREPARED" synchronously is
better. Citus has a 2PC feature, and its backends send "COMMIT PREPARED" from
the extension. So, this idea is not bad.

Although resolving asynchronously has a performance benefit, we can't take
advantage of it because the resolver process easily becomes a bottleneck now.

2nd: remove_fdwxact() syncs the WAL record that indicates the foreign transaction
entry is removed. Is it necessary to sync it immediately?

Removing the sync may make the recovery phase longer because some
fdwxact entries would need "COMMIT/ABORT PREPARED" again. But I think the effect
is limited.

# About other trivial comments.

* Is it better to call pgstat_send_wal() in the resolver process?

* Is it better to specify that only one resolver process can be launched in one
database in the description of "max_foreign_transaction_resolvers"?

* Is it intentional that lines are removed and inserted in foreigncmds.c?

* Is it better that "max_prepared_foreign_transactions=%d" is after
"max_prepared_xacts=%d" in xlogdesc.c?

* Is "fdwxact_queue" unnecessary now?

* Is the following " + sizeof(FdwXactResolver)" unnecessary?

#define SizeOfFdwXactResolverCtlData \
(offsetof(FdwXactResolverCtlData, resolvers) + sizeof(FdwXactResolver))

Although MultiXactStateData takes into account that backendIds are 1-indexed,
the resolvers are 0-indexed. Sorry if my understanding is wrong.

* s/transaciton/transaction/

* s/foreign_xact_resolution_retry_interval since last
resolver/foreign_xact_resolution_retry_interval since last resolver was/

* Don't we need the debug log in the following in postgres.c like logical
launcher shutdown?

else if (IsFdwXactLauncher())
{
/*
* The foreign transaction launcher can be stopped at any time.
* Use exit status 1 so the background worker is restarted.
*/
proc_exit(1);
}

* Is pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS) not documented?

* Is it better to change "when arrived a requested by backend process." to
"when a request by backend process is arrived."?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#239Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiro Ikeda (#238)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, Jun 3, 2021 at 1:56 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2021/05/25 21:59, Masahiko Sawada wrote:

On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2021/05/21 13:45, Masahiko Sawada wrote:

Yes. We also might need to be careful about the order of foreign
transaction resolution. I think we need to resolve foreign
transactions in arrival order at least within a foreign server.

I agree it's better.

(Although this is my interest...)
Is it necessary? Although this idea seems to be for atomic visibility,
2PC can't realize that as you know. So, I wondered that.

I think it's for fairness. If a foreign transaction arrived earlier
gets put off so often for other foreign transactions arrived later due
to its index in FdwXactCtl->xacts, it’s not understandable for users
and not fair. I think it’s better to handle foreign transactions in
FIFO manner (although this problem exists even in the current code).

OK, thanks.

On 2021/05/21 12:45, Masahiro Ikeda wrote:

On 2021/05/21 10:39, Masahiko Sawada wrote:

On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com>

wrote:

How much is the performance without those 2PC patches and with the
same workload? i.e., how fast is the current postgres_fdw that uses
XactCallback?

OK, I'll test.

The test results are followings. But, I couldn't confirm the performance
improvements of 2PC patches though I may need to be changed the test condition.

[condition]
* 1 coordinator and 3 foreign servers
* Two custom scripts, each of which accesses two different foreign servers per
transaction

``` fxact_select.pgbench
BEGIN;
SELECT * FROM part:p1 WHERE id = :id;
SELECT * FROM part:p2 WHERE id = :id;
COMMIT;
```

``` fxact_update.pgbench
BEGIN;
UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
COMMIT;
```

[results]

I have tested three times.
Performance difference seems to be within the range of errors.

# 6d0eb38557 with 2pc patches(v36) and foreign_twophase_commit = disable
- fxact_update.pgbench
72.3, 74.9, 77.5 TPS => avg 74.9 TPS
110.5, 106.8, 103.2 ms => avg 106.8 ms

- fxact_select.pgbench
1767.6, 1737.1, 1717.4 TPS => avg 1740.7 TPS
4.5, 4.6, 4.7 ms => avg 4.6 ms

# 6d0eb38557 without 2pc patches
- fxact_update.pgbench
76.5, 70.6, 69.5 TPS => avg 72.2 TPS
104.534, 113.244, 115.097 ms => avg 111.0 ms

- fxact_select.pgbench
1810.2, 1748.3, 1737.2 TPS => avg 1765.2 TPS
4.2, 4.6, 4.6 ms => avg 4.5 ms

Thank you for testing!

I think the result shows that managing foreign transactions on the
core side would not be a problem in terms of performance.
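
To make the "within the range of errors" reading concrete, the following sketch recomputes the averages from the TPS figures quoted above and compares the difference between the two builds with the run-to-run spread. Only the numbers come from the thread; the helper names are illustrative assumptions. With these inputs the difference in mean TPS is roughly +3.7% for the update script and -1.4% for the select script, both smaller than the spread between individual runs.

```python
from statistics import mean

# TPS figures copied from the results above (three runs each).
tps = {
    "update": {"with_patches": [72.3, 74.9, 77.5], "without": [76.5, 70.6, 69.5]},
    "select": {"with_patches": [1767.6, 1737.1, 1717.4], "without": [1810.2, 1748.3, 1737.2]},
}


def relative_diff_percent(patched, baseline):
    """Difference of the mean TPS, as a percentage of the baseline mean."""
    return (mean(patched) - mean(baseline)) / mean(baseline) * 100


def spread_percent(samples):
    """Run-to-run spread (max - min) as a percentage of the mean."""
    return (max(samples) - min(samples)) / mean(samples) * 100


for name, runs in tps.items():
    diff = relative_diff_percent(runs["with_patches"], runs["without"])
    noise = max(spread_percent(runs["with_patches"]), spread_percent(runs["without"]))
    print(f"{name}: diff {diff:+.1f}%, largest run-to-run spread {noise:.1f}%")
```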

# About the bottleneck of the resolver process

I investigated the performance bottleneck of the resolver process using perf.
The main bottlenecks are the following functions.

1st. 42.8% routine->CommitForeignTransaction()
2nd. 31.5% remove_fdwxact()
3rd. 10.16% CommitTransaction()

The 1st and 3rd bottlenecks can be addressed by running resolver processes in
parallel, one per remote server. But I wonder whether it would be better for
backends to also issue "COMMIT/ROLLBACK PREPARED" themselves, with the resolver
process only taking charge of resolving in-doubt foreign transactions. In many
cases, I think the number of connections is much greater than the number of
remote servers. If so, per-server parallelization is not enough.

So I think it is better for backends to execute "COMMIT PREPARED" synchronously.
Citus has a 2PC feature in which the backends send "COMMIT PREPARED" from within
the extension, so this idea is not unreasonable.

Thank you for pointing it out. This idea has been proposed several
times and there were discussions. I'd like to summarize the proposed
ideas and those pros and cons before replying to your other comments.

There are 3 ideas. After the backend has prepared all foreign transactions
and committed the local transaction,

1. the backend continues attempting to commit all prepared foreign
transactions until all of them are committed.
2. the backend attempts to commit all prepared foreign transactions
once. If an error happens, leave them for the resolver.
3. the backend asks the resolver launched for each foreign server to
commit the prepared foreign transactions (and the backend waits or doesn't
wait for the commit to complete, depending on the setting).

With ideas 1 and 2, since the backend itself commits all foreign
transactions the resolver process cannot be a bottleneck, and probably
the code can get more simple as backends don't need to communicate
with resolver processes.
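
The three ideas can be contrasted with a toy model. This is only a sketch under stated assumptions, not the patch's implementation: `FlakyServer`, the three `idea*` functions, and `run_resolver()` are hypothetical names, the resolver is modeled as a simple queue drained in-process, and idea 1 is given an attempt limit purely so that the sketch terminates.

```python
from collections import deque


class FlakyServer:
    """Stand-in for a foreign server that fails its first `failures` commits."""

    def __init__(self, name, failures=0):
        self.name = name
        self.failures = failures
        self.committed = False

    def commit_prepared(self):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError(f"{self.name}: commit failed")
        self.committed = True


def idea1_retry_until_done(servers, max_attempts=10):
    """Idea 1: the backend retries until every prepared transaction commits."""
    for server in servers:
        for _ in range(max_attempts):
            try:
                server.commit_prepared()
                break
            except ConnectionError:
                continue
        else:
            raise RuntimeError(f"{server.name}: gave up (limit exists only in this sketch)")


def idea2_try_once(servers, resolver_queue):
    """Idea 2: the backend tries once and hands failures to the resolver."""
    for server in servers:
        try:
            server.commit_prepared()
        except ConnectionError:
            resolver_queue.append(server)


def idea3_delegate(servers, resolver_queue):
    """Idea 3: the backend commits nothing itself and asks the resolver."""
    resolver_queue.extend(servers)


def run_resolver(resolver_queue):
    """Resolver loop: keep retrying queued transactions until all commit."""
    while resolver_queue:
        server = resolver_queue.popleft()
        try:
            server.commit_prepared()
        except ConnectionError:
            resolver_queue.append(server)


queue = deque()
servers = [FlakyServer("a"), FlakyServer("b", failures=1)]
idea2_try_once(servers, queue)
print([s.committed for s in servers], [s.name for s in queue])
run_resolver(queue)
print([s.committed for s in servers])
```

In this model the backend does all the commit work under ideas 1 and 2, so the resolver only sees leftovers, while under idea 3 every commit passes through the resolver queue.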

However, those have two problems we need to deal with:

First, users could get an error if an error happens during the backend
committing prepared foreign transaction but the local transaction is
already committed and some foreign transactions could also be
committed, confusing users. There were two opinions to this problem:
FDW developers should be responsible for writing FDW code such that
any error doesn't happen during committing foreign transactions, and
users can accept that confusion since an error could happen after
writing the commit WAL even today without this 2PC feature. For the
former point, I'm not sure it's always doable since even palloc()
could raise an error and it seems hard to require all FDW developers
to understand all possible paths of raising an error. And for the
latter point, that's true but I think those cases are
should-not-happen cases (i.e., rare cases) whereas the likelihood of
an error during committing prepared transactions is not low (e.g., by
network connectivity problem). I think we need to assume that that is
not a rare case.

The second problem is whether we can cancel committing foreign
transactions by pg_cancel_backend() (or pressing Ctrl-C). If the
backend process commits prepared foreign transactions, it's FDW
developers' responsibility to write code that is interruptible. I’m
not sure it’s feasible for drivers for other databases.

Idea 3 was proposed to deal with those problems. By having separate
resolver processes commit the prepared foreign transactions, neither
we nor FDW developers need to worry about those two problems.

However, as the performance results Ikeda-san shared show, idea 3 is
likely to have a performance problem since resolver processes can easily
become a bottleneck. Moreover, with the current patch, since we commit
foreign prepared transactions asynchronously, if many concurrent clients
use 2PC and max_foreign_prepared_transactions is reached, transactions
end up with an error.

Through the long discussion on this thread, I had thought we reached a
consensus on idea 3, but sometimes ideas 1 and 2 are proposed again for
dealing with the performance problem. Ideas 1 and 2 are also good and
attractive, but I think we need to deal with the two problems first if
we go with one of those ideas. To be honest, I'm really not sure it's
good to make those things the FDW developers' responsibility.

As long as we commit foreign prepared transactions asynchronously and
there is a max_foreign_prepared_transactions limit, it's possible that
committing those transactions cannot keep up. Maybe the same is
true for a case where the client heavily uses 2PC and asynchronously
commits prepared transactions. If committing prepared transactions
doesn't keep up with preparing transactions, the system reaches
max_prepared_transactions.
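
The "cannot keep up" case is simple rate arithmetic. As a rough sketch: if transactions are prepared faster than they are resolved, the backlog of unresolved prepared transactions grows linearly until it reaches the configured limit. The function name and the rates used here are made-up illustrations, not measurements from this thread.

```python
def seconds_until_limit(prepare_rate, resolve_rate, limit, backlog=0):
    """Seconds until `limit` unresolved prepared transactions accumulate.

    Rates are in transactions per second. Returns None when resolution
    keeps up with preparation, i.e. the backlog never grows.
    """
    growth = prepare_rate - resolve_rate
    if growth <= 0:
        return None
    return max(limit - backlog, 0) / growth


# Hypothetical numbers: 100 prepares/s, 75 resolutions/s, 200 slots.
print(seconds_until_limit(prepare_rate=100, resolve_rate=75, limit=200))
```

Raising the limit only delays the point at which new transactions start failing; it does not change which side wins.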

With the current patch, we commit prepared foreign transactions
asynchronously. But maybe we need to compare the performance of ideas
1 (and 2) to idea 3 with synchronous foreign transaction resolution.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#240ikedamsh@oss.nttdata.com
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#239)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/06/04 12:28, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jun 3, 2021 at 1:56 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2021/05/25 21:59, Masahiko Sawada wrote:

On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2021/05/21 13:45, Masahiko Sawada wrote:

Yes. We also might need to be careful about the order of foreign
transaction resolution. I think we need to resolve foreign
transactions in arrival order at least within a foreign server.

I agree it's better.

(Although this is my interest...)
Is it necessary? Although this idea seems to be for atomic visibility,
2PC can't realize that as you know. So, I wondered that.

I think it's for fairness. If a foreign transaction arrived earlier
gets put off so often for other foreign transactions arrived later due
to its index in FdwXactCtl->xacts, it’s not understandable for users
and not fair. I think it’s better to handle foreign transactions in
FIFO manner (although this problem exists even in the current code).

OK, thanks.

On 2021/05/21 12:45, Masahiro Ikeda wrote:

On 2021/05/21 10:39, Masahiko Sawada wrote:

On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com>

wrote:

How much is the performance without those 2PC patches and with the
same workload? i.e., how fast is the current postgres_fdw that uses
XactCallback?

OK, I'll test.

The test results are as follows. However, I could not confirm a performance
improvement from the 2PC patches, though I may need to change the test conditions.

[condition]
* 1 coordinator and 3 foreign servers
* There are two custom scripts which access different two foreign servers per
transaction

``` fxact_select.pgbench
BEGIN;
SELECT * FROM part:p1 WHERE id = :id;
SELECT * FROM part:p2 WHERE id = :id;
COMMIT;
```

``` fxact_update.pgbench
BEGIN;
UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
COMMIT;
```

[results]

I have tested three times.
Performance difference seems to be within the range of errors.

# 6d0eb38557 with 2pc patches(v36) and foreign_twophase_commit = disable
- fxact_update.pgbench
72.3, 74.9, 77.5 TPS => avg 74.9 TPS
110.5, 106.8, 103.2 ms => avg 106.8 ms

- fxact_select.pgbench
1767.6, 1737.1, 1717.4 TPS => avg 1740.7 TPS
4.5, 4.6, 4.7 ms => avg 4.6 ms

# 6d0eb38557 without 2pc patches
- fxact_update.pgbench
76.5, 70.6, 69.5 TPS => avg 72.2 TPS
104.534, 113.244, 115.097 ms => avg 111.0 ms

- fxact_select.pgbench
1810.2, 1748.3, 1737.2 TPS => avg 1765.2 TPS
4.2, 4.6, 4.6 ms => avg 4.5 ms

Thank you for testing!

I think the result shows that managing foreign transactions on the
core side would not be a problem in terms of performance.

# About the bottleneck of the resolver process

I investigated the performance bottleneck of the resolver process using perf.
The main bottleneck is the following functions.

1st. 42.8% routine->CommitForeignTransaction()
2nd. 31.5% remove_fdwxact()
3rd. 10.16% CommitTransaction()

The 1st and 3rd bottlenecks can be addressed by running resolver processes in
parallel, one per remote server. But I wonder whether it would be better for
backends to also issue "COMMIT/ROLLBACK PREPARED" themselves, with the resolver
process only taking charge of resolving in-doubt foreign transactions. In many
cases, I think the number of connections is much greater than the number of
remote servers. If so, per-server parallelization is not enough.

So I think it is better for backends to execute "COMMIT PREPARED" synchronously.
Citus has a 2PC feature in which the backends send "COMMIT PREPARED" from within
the extension, so this idea is not unreasonable.

Thank you for pointing it out. This idea has been proposed several
times and there were discussions. I'd like to summarize the proposed
ideas and those pros and cons before replying to your other comments.

There are 3 ideas. After backend both prepares all foreign transaction
and commit the local transaction,

1. the backend continues attempting to commit all prepared foreign
transactions until all of them are committed.
2. the backend attempts to commit all prepared foreign transactions
once. If an error happens, leave them for the resolver.
3. the backend asks the resolver that launched per foreign server to
commit the prepared foreign transactions (and backend waits or doesn't
wait for the commit completion depending on the setting).

With ideas 1 and 2, since the backend itself commits all foreign
transactions the resolver process cannot be a bottleneck, and probably
the code can get more simple as backends don't need to communicate
with resolver processes.

However, those have two problems we need to deal with:

Thanks for sharing the summary. I understand there are problems related to
FDW implementations.

First, users could get an error if an error happens during the backend
committing prepared foreign transaction but the local transaction is
already committed and some foreign transactions could also be
committed, confusing users. There were two opinions to this problem:
FDW developers should be responsible for writing FDW code such that
any error doesn't happen during committing foreign transactions, and
users can accept that confusion since an error could happen after
writing the commit WAL even today without this 2PC feature. For the
former point, I'm not sure it's always doable since even palloc()
could raise an error and it seems hard to require all FDW developers
to understand all possible paths of raising an error. And for the
latter point, that's true but I think those cases are
should-not-happen cases (i.e., rare cases) whereas the likelihood of
an error during committing prepared transactions is not low (e.g., by
network connectivity problem). I think we need to assume that that is
not a rare case.

Hmm… Sorry, I don’t have any good ideas now.

If anything, I side with the second opinion, that users can accept the
confusion. However, it is necessary to let users know whether the error
happened before or after the local commit, because in the former case they
will execute the same query again.

The second problem is whether we can cancel committing foreign
transactions by pg_cancel_backend() (or pressing Ctrl-C). If the
backend process commits prepared foreign transactions, it's FDW
developers' responsibility to write code that is interruptible. I’m
not sure it’s feasible for drivers for other databases.

Sorry, I don't fully understand this point.

After all prepares are done, the foreign transactions will be committed.
So, does this mean that the FDW must leave the unresolved transaction to the
transaction resolver and show a message like "Since the transaction is already
committed, the transaction will be resolved in the background"?

Idea 3 is proposed to deal with those problems. By having separate
processes, resolver processes, committing prepared foreign
transactions, we and FDW developers don't need to worry about those
two problems.

However as Ikeda-san shared the performance results, idea 3 is likely
to have a performance problem since resolver processes can easily be
bottle-neck. Moreover, with the current patch, since we asynchronously
commit foreign prepared transactions, if many concurrent clients use
2PC, reaching max_foreign_prepared_transactions, transactions end up
with an error.

Through the long discussion on this thread, I've been thought we got a
consensus on idea 3 but sometimes ideas 1 and 2 are proposed again for
dealing with the performance problem. Idea 1 and 2 are also good and
attractive, but I think we need to deal with the two problems first if
we go with one of those ideas. To be honest, I'm really not sure it's
good if we make those things FDW developers responsibility.

As long as we commit foreign prepared transactions asynchronously and
there is max_foreign_prepared_transactions limit, it's possible that
committing those transactions could not keep up. Maybe the same is
true for a case where the client heavily uses 2PC and asynchronously
commits prepared transactions. If committing prepared transactions
doesn't keep up with preparing transactions, the system reaches
max_prepared_transactions.

With the current patch, we commit prepared foreign transactions
asynchronously. But maybe we need to compare the performance of ideas
1 (and 2) to idea 3 with synchronous foreign transaction resolution.

OK, I understand that the consensus is the third idea. I agree with it, since I
don't have any solutions for the problems with the first and second ideas. If I
find any, I'll share them with you.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#241tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#239)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <sawada.mshk@gmail.com>
1. the backend continues attempting to commit all prepared foreign
transactions until all of them are committed.
2. the backend attempts to commit all prepared foreign transactions
once. If an error happens, leave them for the resolver.
3. the backend asks the resolver that launched per foreign server to
commit the prepared foreign transactions (and backend waits or doesn't
wait for the commit completion depending on the setting).

With ideas 1 and 2, since the backend itself commits all foreign
transactions the resolver process cannot be a bottleneck, and probably
the code can get more simple as backends don't need to communicate
with resolver processes.

However, those have two problems we need to deal with:

First, users could get an error if an error happens during the backend
committing prepared foreign transaction but the local transaction is
already committed and some foreign transactions could also be
committed, confusing users. There were two opinions to this problem:
FDW developers should be responsible for writing FDW code such that
any error doesn't happen during committing foreign transactions, and
users can accept that confusion since an error could happen after
writing the commit WAL even today without this 2PC feature.

Why does the user have to get an error? Once the local transaction has been prepared, which means all remote ones also have been prepared, the whole transaction is determined to commit. So, the user doesn't have to receive an error as long as the local node is alive.

For the
former point, I'm not sure it's always doable since even palloc()
could raise an error and it seems hard to require all FDW developers
to understand all possible paths of raising an error.

No, this is a matter of discipline to ensure consistency, just in case we really have to return an error to the user.

And for the
latter point, that's true but I think those cases are
should-not-happen cases (i.e., rare cases) whereas the likelihood of
an error during committing prepared transactions is not low (e.g., by
network connectivity problem). I think we need to assume that that is
not a rare case.

How do non-2PC and 2PC cases differ in the rarity of the error?

The second problem is whether we can cancel committing foreign
transactions by pg_cancel_backend() (or pressing Ctrl-C). If the
backend process commits prepared foreign transactions, it's FDW
developers' responsibility to write code that is interruptible. I’m
not sure it’s feasible for drivers for other databases.

That's true not only for prepare and commit but also for other queries. Why do we have to treat prepare and commit specially?

Through the long discussion on this thread, I've been thought we got a
consensus on idea 3 but sometimes ideas 1 and 2 are proposed again for

I don't remember seeing any consensus yet?

With the current patch, we commit prepared foreign transactions
asynchronously. But maybe we need to compare the performance of ideas
1 (and 2) to idea 3 with synchronous foreign transaction resolution.

+1

Regards
Takayuki Tsunakawa

#242Masahiko Sawada
sawada.mshk@gmail.com
In reply to: ikedamsh@oss.nttdata.com (#240)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jun 4, 2021 at 3:58 PM ikedamsh@oss.nttdata.com
<ikedamsh@oss.nttdata.com> wrote:

On 2021/06/04 12:28, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for pointing it out. This idea has been proposed several
times and there were discussions. I'd like to summarize the proposed
ideas and those pros and cons before replying to your other comments.

There are 3 ideas. After backend both prepares all foreign transaction
and commit the local transaction,

1. the backend continues attempting to commit all prepared foreign
transactions until all of them are committed.
2. the backend attempts to commit all prepared foreign transactions
once. If an error happens, leave them for the resolver.
3. the backend asks the resolver that launched per foreign server to
commit the prepared foreign transactions (and backend waits or doesn't
wait for the commit completion depending on the setting).

With ideas 1 and 2, since the backend itself commits all foreign
transactions the resolver process cannot be a bottleneck, and probably
the code can get more simple as backends don't need to communicate
with resolver processes.

However, those have two problems we need to deal with:

Thanks for sharing the summary. I understand there are problems related to
FDW implementations.

First, users could get an error if an error happens during the backend
committing prepared foreign transaction but the local transaction is
already committed and some foreign transactions could also be
committed, confusing users. There were two opinions to this problem:
FDW developers should be responsible for writing FDW code such that
any error doesn't happen during committing foreign transactions, and
users can accept that confusion since an error could happen after
writing the commit WAL even today without this 2PC feature. For the
former point, I'm not sure it's always doable since even palloc()
could raise an error and it seems hard to require all FDW developers
to understand all possible paths of raising an error. And for the
latter point, that's true but I think those cases are
should-not-happen cases (i.e., rare cases) whereas the likelihood of
an error during committing prepared transactions is not low (e.g., by
network connectivity problem). I think we need to assume that that is
not a rare case.

Hmm… Sorry, I don’t have any good ideas now.

If anything, I side with the second opinion, that users can accept the
confusion. However, it is necessary to let users know whether the error
happened before or after the local commit, because in the former case they
will execute the same query again.

Yeah, users will need to remember the XID of the last executed
transaction and check if it has been committed by pg_xact_status().
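
A minimal sketch of what that client-side check could look like, assuming the client recorded the transaction ID inside the transaction (for example with `SELECT pg_current_xact_id()`) and, after receiving an error, runs `SELECT pg_xact_status(...)` for that ID on the local node. Only the decision logic is shown here so the snippet stays self-contained; `next_action()` and its return values are hypothetical names. The status strings follow the documented results of pg_xact_status(), which returns NULL for transactions too old to be looked up. The status covers the local transaction only, not the foreign ones.

```python
def next_action(xact_status):
    """Decide what a client should do after an error around COMMIT.

    `xact_status` is the text returned by pg_xact_status() for the
    remembered transaction ID, or None if the server returned NULL.
    """
    if xact_status == "committed":
        # The local commit happened; re-running would apply the work twice.
        return "do-not-retry"
    if xact_status == "aborted":
        # Nothing was committed locally; the transaction can be re-run.
        return "retry"
    if xact_status == "in progress":
        # Outcome not decided yet; check again later.
        return "check-again"
    # NULL: the transaction is too old for its status to be reported.
    return "unknown"


for status in ("committed", "aborted", "in progress", None):
    print(status, "->", next_action(status))
```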

The second problem is whether we can cancel committing foreign
transactions by pg_cancel_backend() (or pressing Ctrl-C). If the
backend process commits prepared foreign transactions, it's FDW
developers' responsibility to write code that is interruptible. I’m
not sure it’s feasible for drivers for other databases.

Sorry, I don't fully understand this point.

After all prepares are done, the foreign transactions will be committed.
So, does this mean that the FDW must leave the unresolved transaction to the
transaction resolver and show a message like "Since the transaction is already
committed, the transaction will be resolved in the background"?

I think this would happen after the backend cancels COMMIT PREPARED.
To be able to cancel an in-progress query, the backend needs to accept
the interruption and send the cancel request. postgres_fdw can do that
since libpq supports sending a query asynchronously and then waiting
for the result, but I'm not sure about other drivers.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#243Masahiko Sawada
sawada.mshk@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#241)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jun 4, 2021 at 5:04 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <sawada.mshk@gmail.com>
1. the backend continues attempting to commit all prepared foreign
transactions until all of them are committed.
2. the backend attempts to commit all prepared foreign transactions
once. If an error happens, leave them for the resolver.
3. the backend asks the resolver that launched per foreign server to
commit the prepared foreign transactions (and backend waits or doesn't
wait for the commit completion depending on the setting).

With ideas 1 and 2, since the backend itself commits all foreign
transactions the resolver process cannot be a bottleneck, and probably
the code can get more simple as backends don't need to communicate
with resolver processes.

However, those have two problems we need to deal with:

First, users could get an error if an error happens during the backend
committing prepared foreign transaction but the local transaction is
already committed and some foreign transactions could also be
committed, confusing users. There were two opinions to this problem:
FDW developers should be responsible for writing FDW code such that
any error doesn't happen during committing foreign transactions, and
users can accept that confusion since an error could happen after
writing the commit WAL even today without this 2PC feature.

Why does the user have to get an error? Once the local transaction has been prepared, which means all remote ones also have been prepared, the whole transaction is determined to commit. So, the user doesn't have to receive an error as long as the local node is alive.

I think we should neither ignore the error thrown by FDW code nor
lower the error level (e.g., ERROR to WARNING).

And for the
latter point, that's true but I think those cases are
should-not-happen cases (i.e., rare cases) whereas the likelihood of
an error during committing prepared transactions is not low (e.g., by
network connectivity problem). I think we need to assume that that is
not a rare case.

How do non-2PC and 2PC cases differ in the rarity of the error?

I think the main difference would be that in 2PC case there will be
network communications possibly with multiple servers after the local
commit.

The second problem is whether we can cancel committing foreign
transactions by pg_cancel_backend() (or pressing Ctrl-C). If the
backend process commits prepared foreign transactions, it's FDW
developers' responsibility to write code that is interruptible. I’m
not sure it’s feasible for drivers for other databases.

That's true not only for prepare and commit but also for other queries. Why do we have to treat prepare and commit specially?

Good point. This would not be a blocker for ideas 1 and 2 but is a
side benefit of idea 3.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#244tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#243)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <sawada.mshk@gmail.com>

On Fri, Jun 4, 2021 at 5:04 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Why does the user have to get an error? Once the local transaction has been
prepared, which means all remote ones also have been prepared, the whole
transaction is determined to commit. So, the user doesn't have to receive an
error as long as the local node is alive.

I think we should neither ignore the error thrown by FDW code nor
lower the error level (e.g., ERROR to WARNING).

Why? (Forgive me for asking relentlessly... by imagining me as a cute 7-year-old boy/girl asking "Why Dad?")

How do non-2PC and 2PC cases differ in the rarity of the error?

I think the main difference would be that in 2PC case there will be
network communications possibly with multiple servers after the local
commit.

Then, it's the same failure mode. That is, the same failure could occur for both cases. That doesn't require us to differentiate between them. Let's ignore this point from now on.

Regards
Takayuki Tsunakawa

#245Masahiko Sawada
sawada.mshk@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#244)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jun 4, 2021 at 5:59 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <sawada.mshk@gmail.com>

On Fri, Jun 4, 2021 at 5:04 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Why does the user have to get an error? Once the local transaction has been
prepared, which means all remote ones also have been prepared, the whole
transaction is determined to commit. So, the user doesn't have to receive an
error as long as the local node is alive.

I think we should neither ignore the error thrown by FDW code nor
lower the error level (e.g., ERROR to WARNING).

Why? (Forgive me for asking relentlessly... by imagining me as a cute 7-year-old boy/girl asking "Why Dad?")

I think we should not reinterpret the severity of the error and lower
it. In this case especially, any kind of error can be thrown, and it
could be one serious enough that the FDW developer wants to report it
to the client. Would we lower even PANIC to a lower severity such as
WARNING? That's definitely a bad idea. And if we don't lower PANIC but
do lower ERROR (and FATAL) to WARNING, why would we treat only those as
non-errors?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#246Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#242)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jun 4, 2021 at 5:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jun 4, 2021 at 3:58 PM ikedamsh@oss.nttdata.com
<ikedamsh@oss.nttdata.com> wrote:

On 2021/06/04 12:28, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for pointing it out. This idea has been proposed several
times and there were discussions. I'd like to summarize the proposed
ideas and those pros and cons before replying to your other comments.

There are 3 ideas. After backend both prepares all foreign transaction
and commit the local transaction,

1. the backend continues attempting to commit all prepared foreign
transactions until all of them are committed.
2. the backend attempts to commit all prepared foreign transactions
once. If an error happens, leave them for the resolver.
3. the backend asks the resolver that launched per foreign server to
commit the prepared foreign transactions (and backend waits or doesn't
wait for the commit completion depending on the setting).

With ideas 1 and 2, since the backend itself commits all foreign
transactions the resolver process cannot be a bottleneck, and probably
the code can get more simple as backends don't need to communicate
with resolver processes.

However, those have two problems we need to deal with:

Thanks for sharing the summary. I understand there are problems related to
FDW implementations.

First, users could get an error if an error happens during the backend
committing prepared foreign transaction but the local transaction is
already committed and some foreign transactions could also be
committed, confusing users. There were two opinions to this problem:
FDW developers should be responsible for writing FDW code such that
any error doesn't happen during committing foreign transactions, and
users can accept that confusion since an error could happen after
writing the commit WAL even today without this 2PC feature. For the
former point, I'm not sure it's always doable since even palloc()
could raise an error and it seems hard to require all FDW developers
to understand all possible paths of raising an error. And for the
latter point, that's true but I think those cases are
should-not-happen cases (i.e., rare cases) whereas the likelihood of
an error during committing prepared transactions is not low (e.g., by
network connectivity problem). I think we need to assume that that is
not a rare case.

Hmm… Sorry, I don’t have any good ideas now.

If anything, I side with the second opinion, that users can accept the
confusion. However, it is necessary to let users know whether the error
happened before or after the local commit, because in the former case they
will execute the same query again.

Yeah, users will need to remember the XID of the last executed
transaction and check if it has been committed by pg_xact_status().

As another idea, can we send something like a hint along with the
error (or send a new type of error) that indicates the error happened
after the transaction commit, so that the client can decide whether or
not to ignore the error? That way, we can deal with the confusion caused
by an error raised after the local commit, whether it comes from the
existing post-commit cleanup routines (and post-commit xact callbacks)
or from the FDW's commit-prepared routine.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#247ikedamsh@oss.nttdata.com
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#242)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/06/04 17:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jun 4, 2021 at 3:58 PM ikedamsh@oss.nttdata.com
<ikedamsh@oss.nttdata.com> wrote:

On 2021/06/04 12:28, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for pointing it out. This idea has been proposed several
times and there were discussions. I'd like to summarize the proposed
ideas and those pros and cons before replying to your other comments.

There are 3 ideas. After backend both prepares all foreign transaction
and commit the local transaction,

1. the backend continues attempting to commit all prepared foreign
transactions until all of them are committed.
2. the backend attempts to commit all prepared foreign transactions
once. If an error happens, leave them for the resolver.
3. the backend asks the resolver that launched per foreign server to
commit the prepared foreign transactions (and backend waits or doesn't
wait for the commit completion depending on the setting).

With ideas 1 and 2, since the backend itself commits all foreign
transactions the resolver process cannot be a bottleneck, and probably
the code can get more simple as backends don't need to communicate
with resolver processes.
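
As a rough illustration of ideas 1 and 2, the following Python sketch models the backend-side loop. It is not taken from the patch: commit_prepared is a hypothetical stand-in for an FDW's commit-prepared call, and max_rounds is a bound added only so the sketch terminates (idea 1 as described has no such limit).

```python
from typing import Callable, List

# Hypothetical stand-in for an FDW "commit prepared" call.
# Returns True on success, False (or raises) on failure.
CommitFn = Callable[[str], bool]


def commit_once(servers: List[str], commit_prepared: CommitFn) -> List[str]:
    """Idea 2: try each prepared foreign transaction once.

    Returns the servers whose commit failed; those are left for the resolver.
    """
    left_for_resolver = []
    for server in servers:
        try:
            ok = commit_prepared(server)
        except Exception:
            ok = False
        if not ok:
            left_for_resolver.append(server)
    return left_for_resolver


def commit_until_done(servers: List[str], commit_prepared: CommitFn,
                      max_rounds: int) -> List[str]:
    """Idea 1: the backend keeps retrying until everything is committed."""
    pending = list(servers)
    for _ in range(max_rounds):
        if not pending:
            break
        pending = commit_once(pending, commit_prepared)
    return pending
```

In both variants the backend never talks to a resolver while committing, which is where the simplicity argument comes from.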

However, those have two problems we need to deal with:

Thanks for sharing the summary. I understand that there are problems related to
the FDW implementation.

First, users could get an error if an error happens during the backend
committing prepared foreign transaction but the local transaction is
already committed and some foreign transactions could also be
committed, confusing users. There were two opinions to this problem:
FDW developers should be responsible for writing FDW code such that
any error doesn't happen during committing foreign transactions, and
users can accept that confusion since an error could happen after
writing the commit WAL even today without this 2PC feature. For the
former point, I'm not sure it's always doable since even palloc()
could raise an error and it seems hard to require all FDW developers
to understand all possible paths of raising an error. And for the
latter point, that's true but I think those cases are
should-not-happen cases (i.e., rare cases) whereas the likelihood of
an error during committing prepared transactions is not low (e.g., by
network connectivity problem). I think we need to assume that that is
not a rare case.

Hmm… Sorry, I don’t have any good ideas now.

If anything, I lean toward the second opinion, that users can accept the confusion. However, it is
necessary to let users know whether the error happened before or after the local commit,
because in the former case users will execute the same query again.

Yeah, users will need to remember the XID of the last executed
transaction and check if it has been committed by pg_xact_status().

The second problem is whether we can cancel committing foreign
transactions by pg_cancel_backend() (or pressing Ctrl-C). If the
backend process commits prepared foreign transactions, it's FDW
developers' responsibility to write code that is interruptible. I’m
not sure it’s feasible for drivers for other databases.

Sorry, my understanding is not clear.

After all prepares are done, the foreign transactions will be committed.
So, does this mean that FDW must leave the unresolved transaction to the transaction
resolver and show some messages like “Since the transaction is already committed,
the transaction will be resolved in background" ?

I think this would happen after the backend cancels COMMIT PREPARED.
To be able to cancel an in-progress query the backend needs to accept
the interruption and send the cancel request. postgres_fdw can do that
since libpq supports sending a query and waiting for the result but
I’m not sure about other drivers.

Thanks, I understand that handling this issue is not in the scope of the 2PC feature,
as Tsunakawa-san and you said.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#248ikedamsh@oss.nttdata.com
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#246)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/06/04 21:38, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jun 4, 2021 at 5:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jun 4, 2021 at 3:58 PM ikedamsh@oss.nttdata.com
<ikedamsh@oss.nttdata.com> wrote:

On 2021/06/04 12:28, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for pointing it out. This idea has been proposed several
times and there were discussions. I'd like to summarize the proposed
ideas and their pros and cons before replying to your other comments.

There are 3 ideas. After the backend has both prepared all foreign transactions
and committed the local transaction,

1. the backend continues attempting to commit all prepared foreign
transactions until all of them are committed.
2. the backend attempts to commit all prepared foreign transactions
once. If an error happens, leave them for the resolver.
3. the backend asks the resolver that is launched per foreign server to
commit the prepared foreign transactions (and backend waits or doesn't
wait for the commit completion depending on the setting).

With ideas 1 and 2, since the backend itself commits all foreign
transactions the resolver process cannot be a bottleneck, and probably
the code can get more simple as backends don't need to communicate
with resolver processes.

However, those have two problems we need to deal with:

Thanks for sharing the summary. I understand that there are problems related to
the FDW implementation.

First, users could get an error if an error happens during the backend
committing prepared foreign transaction but the local transaction is
already committed and some foreign transactions could also be
committed, confusing users. There were two opinions to this problem:
FDW developers should be responsible for writing FDW code such that
any error doesn't happen during committing foreign transactions, and
users can accept that confusion since an error could happen after
writing the commit WAL even today without this 2PC feature. For the
former point, I'm not sure it's always doable since even palloc()
could raise an error and it seems hard to require all FDW developers
to understand all possible paths of raising an error. And for the
latter point, that's true but I think those cases are
should-not-happen cases (i.e., rare cases) whereas the likelihood of
an error during committing prepared transactions is not low (e.g., by
network connectivity problem). I think we need to assume that that is
not a rare case.

Hmm… Sorry, I don’t have any good ideas now.

If anything, I lean toward the second opinion, that users can accept the confusion. However, it is
necessary to let users know whether the error happened before or after the local commit,
because in the former case users will execute the same query again.

Yeah, users will need to remember the XID of the last executed
transaction and check if it has been committed by pg_xact_status().

As the second idea, can we send something like a hint along with the
error (or send a new type of error) that indicates the error happened
after the transaction commit so that the client can decide whether or
not to ignore the error? That way, we can deal with the confusion led
by an error raised after the local commit by the existing post-commit
cleanup routines (and post-commit xact callbacks) as well as by FDW’s
commit prepared routine.

I think your second idea is better because it’s easier for users to know what
error happened and there is nothing users need to do. Since the focus of a "hint"
is how to fix the problem, would it be more appropriate to use "context"?

FWIW, I took a quick look at elog.c and found that there is "error_context_stack".
So, why don’t you add a context after the local commit that says something like
"the transaction's fate has been decided as COMMIT (or ROLLBACK), so even if an
error happens, the transaction will be resolved in the background"?

Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION

#249tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#245)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <sawada.mshk@gmail.com>

I think we should not reinterpret the severity of the error and lower
it. Especially, in this case, any kind of errors can be thrown. It
could be such a serious error that FDW developer wants to report to
the client. Do we lower even PANIC to a lower severity such as
WARNING? That's definitely a bad idea. If we don’t lower PANIC whereas
lowering ERROR (and FATAL) to WARNING, why do we regard only them as
non-error?

Why does the client have to know the error on a remote server, whereas the global transaction itself is destined to commit?

FYI, the tx_commit() in the X/Open TX interface and the UserTransaction.commit() in JTA don't return such an error, IIRC. Do TX_FAIL and SystemException serve such a purpose? I don't think so.

[Tuxedo manual (Japanese)]
https://docs.oracle.com/cd/F25597_01/document/products/tuxedo/tux80j/atmi/rf3c91.htm

[JTA]
public interface javax.transaction.UserTransaction
public void commit()
throws RollbackException, HeuristicMixedException,
HeuristicRollbackException, SecurityException,
IllegalStateException, SystemException

Throws: RollbackException
Thrown to indicate that the transaction has been rolled back rather than committed.

Throws: HeuristicMixedException
Thrown to indicate that a heuristic decision was made and that some relevant updates have been
committed while others have been rolled back.

Throws: HeuristicRollbackException
Thrown to indicate that a heuristic decision was made and that all relevant updates have been rolled
back.

Throws: SecurityException
Thrown to indicate that the thread is not allowed to commit the transaction.

Throws: IllegalStateException
Thrown if the current thread is not associated with a transaction.

Throws: SystemException
Thrown if the transaction manager encounters an unexpected error condition.

Regards
Takayuki Tsunakawa

#250Masahiko Sawada
sawada.mshk@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#249)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Jun 8, 2021 at 9:47 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <sawada.mshk@gmail.com>

I think we should not reinterpret the severity of the error and lower
it. Especially, in this case, any kind of errors can be thrown. It
could be such a serious error that FDW developer wants to report to
the client. Do we lower even PANIC to a lower severity such as
WARNING? That's definitely a bad idea. If we don’t lower PANIC whereas
lowering ERROR (and FATAL) to WARNING, why do we regard only them as
non-error?

Why does the client have to know the error on a remote server, whereas the global transaction itself is destined to commit?

It's not necessarily on a remote server. It could be a problem with
the local server.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#251Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#249)
Re: Transactions involving multiple postgres foreign servers, take 2

(I have caught up here. Sorry in advance for possible pointless
discussion by me..)

At Tue, 8 Jun 2021 00:47:08 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in

From: Masahiko Sawada <sawada.mshk@gmail.com>

I think we should not reinterpret the severity of the error and lower
it. Especially, in this case, any kind of errors can be thrown. It
could be such a serious error that FDW developer wants to report to
the client. Do we lower even PANIC to a lower severity such as
WARNING? That's definitely a bad idea. If we don’t lower PANIC whereas
lowering ERROR (and FATAL) to WARNING, why do we regard only them as
non-error?

Why does the client have to know the error on a remote server, whereas the global transaction itself is destined to commit?

I think the discussion is based on the behavior that any process that is
responsible for finishing the 2pc-commit continues retrying remote
commits until all of the remote-commits succeed.

Maybe in most cases the errors during remote-prepared-commit could be
retry-able but as Sawada-san says I'm also not sure it's always the
case. On the other hand, it could be said that we have no other way
than retrying the remote-commits if we want to get over, say, instant
network failures automatically. It is somewhat similar to
WAL-restoration that continues complaining for recovery_commands
failure without exiting.

FYI, the tx_commit() in the X/Open TX interface and the UserTransaction.commit() in JTA don't return such an error, IIRC. Do TX_FAIL and SystemException serve such a purpose? I don't think so.

I'm not sure about how JTA works in detail, but doesn't
UserTransaction.commit() throw HeuristicMixedException when some of
the relevant updates have been committed but others have not? Isn't that the same
state as the case where some of the remote servers failed on
remote-commit while others succeeded? (I guess that
UserTransaction.commit() would throw RollbackException if
remote-prepare has failed for any of the remotes.)

[Tuxedo manual (Japanese)]
https://docs.oracle.com/cd/F25597_01/document/products/tuxedo/tux80j/atmi/rf3c91.htm

[JTA]
public interface javax.transaction.UserTransaction
public void commit()
throws RollbackException, HeuristicMixedException,
HeuristicRollbackException, SecurityException,
IllegalStateException, SystemException

Throws: RollbackException
Thrown to indicate that the transaction has been rolled back rather than committed.

Throws: HeuristicMixedException
Thrown to indicate that a heuristic decision was made and that some relevant updates have been
committed while others have been rolled back.

Throws: HeuristicRollbackException
Thrown to indicate that a heuristic decision was made and that all relevant updates have been rolled
back.

Throws: SecurityException
Thrown to indicate that the thread is not allowed to commit the transaction.

Throws: IllegalStateException
Thrown if the current thread is not associated with a transaction.

Throws: SystemException
Thrown if the transaction manager encounters an unexpected error condition.

Regards
Takayuki Tsunakawa

--
Kyotaro Horiguchi
NTT Open Source Software Center

#252Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#250)
Re: Transactions involving multiple postgres foreign servers, take 2

At Tue, 8 Jun 2021 16:32:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in

On Tue, Jun 8, 2021 at 9:47 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <sawada.mshk@gmail.com>

I think we should not reinterpret the severity of the error and lower
it. Especially, in this case, any kind of errors can be thrown. It
could be such a serious error that FDW developer wants to report to
the client. Do we lower even PANIC to a lower severity such as
WARNING? That's definitely a bad idea. If we don’t lower PANIC whereas
lowering ERROR (and FATAL) to WARNING, why do we regard only them as
non-error?

Why does the client have to know the error on a remote server, whereas the global transaction itself is destined to commit?

It's not necessarily on a remote server. It could be a problem with
the local server.

Isn't it a discussion about the errors from postgres_fdw?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#253tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#250)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <sawada.mshk@gmail.com>

On Tue, Jun 8, 2021 at 9:47 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Why does the client have to know the error on a remote server, whereas the
global transaction itself is destined to commit?

It's not necessarily on a remote server. It could be a problem with
the local server.

Then, what kind of scenario are we talking about, and why is it difficult to handle, when we adopt either method 1 or 2? (I'd just like us to have the same clear picture.) For example,

1. All FDWs prepared successfully.
2. The local transaction prepared successfully, too.
3. Some FDWs committed successfully.
4. One FDW failed to send the commit request because the remote server went down.

Regards
Takayuki Tsunakawa

#254tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Kyotaro Horiguchi (#251)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

I think the discussion is based on the behavior that any process that is
responsible for finishing the 2pc-commit continues retrying remote
commits until all of the remote-commits succeed.

Thank you for coming back. We're talking about the first attempt to prepare and commit in each transaction, not the retry case.

Throws: HeuristicMixedException
Thrown to indicate that a heuristic decision was made and that some
relevant updates have been committed while others have been rolled back.

I'm not sure about how JTA works in detail, but doesn't
UserTransaction.commit() throw HeuristicMixedException when some of
the relevant updates have been committed but others have not? Isn't that the same
state as the case where some of the remote servers failed on
remote-commit while others succeeded?

No. Taking the description literally and considering the relevant XA specification, it's not about the remote commit failure. The remote server is not allowed to fail the commit once it has reported successful prepare, which is the contract of 2PC. HeuristicMixedException is about the manual resolution, typically by the DBA, using the DBMS-specific tool or the standard commit()/rollback() API.

(I guess that
UserTransaction.commit() would throw RollbackException if
remote-prepare has failed for any of the remotes.)

Correct.

Regards
Takayuki Tsunakawa

#255Masahiko Sawada
sawada.mshk@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#253)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Jun 8, 2021 at 5:28 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <sawada.mshk@gmail.com>

On Tue, Jun 8, 2021 at 9:47 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Why does the client have to know the error on a remote server, whereas the
global transaction itself is destined to commit?

It's not necessarily on a remote server. It could be a problem with
the local server.

Then, what kind of scenario are we talking about, and why is it difficult to handle, when we adopt either method 1 or 2? (I'd just like us to have the same clear picture.)

IMO, even though FDW's commit/rollback transaction code could be
simple in some cases, I think we need to think that any kind of errors
(or even FATAL or PANIC) could be thrown from the FDW code. It could
be an error due to a temporary network problem, remote server down,
driver’s unexpected error, or out of memory etc. Errors that happen
after the local transaction commit don't affect the global
transaction decision, as you mentioned. But the process or system
could be in a bad state. Also, users might expect the process to exit
on error by setting exit_on_error = on. Your idea sounds like we
have to ignore any errors happening after the local commit if they
don’t affect the transaction outcome. That's too scary to me and I think
that it's a bad idea to blindly ignore all possible errors under such
conditions. That could make things worse and will likely be a
foot-gun. It would be good if we could prove that it’s safe to ignore
those errors, but I, at least, am not sure how we could.

This situation is true even today; an error could happen after
committing the transaction. But I personally don’t want to add the
code that increases the likelihood.

Just to be clear, with your idea, we will ignore only ERROR or also
FATAL and PANIC? And if an error happens during committing one of the
prepared transactions on the foreign server, will we proceed with
committing other transactions or return OK to the client?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#256tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#255)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <sawada.mshk@gmail.com>

On Tue, Jun 8, 2021 at 5:28 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Then, what kind of scenario are we talking about, and why is
it difficult to handle, when we adopt either method 1 or 2? (I'd just like us to
have the same clear picture.)

IMO, even though FDW's commit/rollback transaction code could be
simple in some cases, I think we need to think that any kind of errors
(or even FATAL or PANIC) could be thrown from the FDW code. It could
be an error due to a temporary network problem, remote server down,
driver’s unexpected error, or out of memory etc. Errors that happen
after the local transaction commit don't affect the global
transaction decision, as you mentioned. But the process or system
could be in a bad state. Also, users might expect the process to exit
on error by setting exit_on_error = on. Your idea sounds like we
have to ignore any errors happening after the local commit if they
don’t affect the transaction outcome. That's too scary to me and I think
that it's a bad idea to blindly ignore all possible errors under such
conditions. That could make things worse and will likely be a
foot-gun. It would be good if we could prove that it’s safe to ignore
those errors, but I, at least, am not sure how we could.

This situation is true even today; an error could happen after
committing the transaction. But I personally don’t want to add the
code that increases the likelihood.

I'm not talking about the code simplicity here (actually, I haven't reviewed the code around prepare and commit in the patch yet...) Also, I don't quite understand what you're trying to assert and what realistic situations you have in mind by citing exit_on_error, FATAL, PANIC and so on. I just asked (in a different part) why the client has to know the error.

Just to be clear, I'm not saying that we should hide the error completely behind the scenes. For example, you can allow the FDW to emit a WARNING if the DBMS-specific client driver returns an error when committing. Further, if you want to allow the FDW to throw an ERROR when committing, the transaction manager in core can catch it by PG_TRY(), so that it can report back successful commit of the global transaction to the client while it leaves the handling of the failed FDW commit to the resolver. (I don't think we like to use PG_TRY() during transaction commit for performance reasons, though.)

Even if we concede a great deal and say we want to report the error of the committing FDW to the client, we can use SQLSTATE 01xxx (Warning) and attach the error message.

Just to be clear, with your idea, we will ignore only ERROR or also
FATAL and PANIC? And if an error happens during committing one of the
prepared transactions on the foreign server, will we proceed with
committing other transactions or return OK to the client?

Neither FATAL nor PANIC can be ignored. When FATAL, which means the termination of a particular session, the committing of the remote transaction should be taken over by the resolver. Not to mention PANIC; we can't do anything. Otherwise, we proceed with committing other FDWs, hand off the task of committing the failed FDW to the resolver, and report success to the client. If you're not convinced, I'd like to ask you to investigate the code of some Java EE app server, say GlassFish, and share with us how it handles an error during commit.
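
The handling described in this reply can be sketched as follows in Python. This is only a model of the proposal, not code from the patch or from PostgreSQL: FdwFailure, Severity, commit_foreign_transactions, and the result labels are hypothetical names, and handing the not-yet-attempted servers to the resolver on FATAL is an assumption of the sketch.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Severity(Enum):
    ERROR = "ERROR"
    FATAL = "FATAL"
    PANIC = "PANIC"


class FdwFailure(Exception):
    """Hypothetical failure raised by an FDW while committing a prepared transaction."""

    def __init__(self, severity: Severity, message: str):
        super().__init__(message)
        self.severity = severity


def commit_foreign_transactions(
    servers: List[str], commit_prepared: Callable[[str], None]
) -> Tuple[str, List[str]]:
    """Return (what the client is told, servers handed off to the resolver)."""
    handed_off: List[str] = []
    for i, server in enumerate(servers):
        try:
            commit_prepared(server)
        except FdwFailure as failure:
            if failure.severity is Severity.PANIC:
                # Nothing can be done; propagate.
                raise
            if failure.severity is Severity.FATAL:
                # The session ends; the resolver takes over this server and
                # (assumption) every server not yet attempted.
                handed_off.extend(servers[i:])
                return ("session-terminated", handed_off)
            # ERROR: hand this one to the resolver and keep committing the rest.
            handed_off.append(server)
    # The global transaction is reported as committed even if some
    # foreign commits were handed off.
    return ("commit-ok", handed_off)
```

The try/except plays the role that PG_TRY() would play in core, which is where the performance concern mentioned earlier in the thread applies.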

Regards
Takayuki Tsunakawa

#257Masahiko Sawada
sawada.mshk@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#256)
Re: Transactions involving multiple postgres foreign servers, take 2

On Wed, Jun 9, 2021 at 4:10 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Masahiko Sawada <sawada.mshk@gmail.com>

On Tue, Jun 8, 2021 at 5:28 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Then, what kind of scenario are we talking about, and why is
it difficult to handle, when we adopt either method 1 or 2? (I'd just like us to
have the same clear picture.)

IMO, even though FDW's commit/rollback transaction code could be
simple in some cases, I think we need to think that any kind of errors
(or even FATAL or PANIC) could be thrown from the FDW code. It could
be an error due to a temporary network problem, remote server down,
driver’s unexpected error, or out of memory etc. Errors that happen
after the local transaction commit don't affect the global
transaction decision, as you mentioned. But the process or system
could be in a bad state. Also, users might expect the process to exit
on error by setting exit_on_error = on. Your idea sounds like we
have to ignore any errors happening after the local commit if they
don’t affect the transaction outcome. That's too scary to me and I think
that it's a bad idea to blindly ignore all possible errors under such
conditions. That could make things worse and will likely be a
foot-gun. It would be good if we could prove that it’s safe to ignore
those errors, but I, at least, am not sure how we could.

This situation is true even today; an error could happen after
committing the transaction. But I personally don’t want to add the
code that increases the likelihood.

I'm not talking about the code simplicity here (actually, I haven't reviewed the code around prepare and commit in the patch yet...) Also, I don't quite understand what you're trying to assert and what realistic situations you have in mind by citing exit_on_error, FATAL, PANIC and so on. I just asked (in a different part) why the client has to know the error.

Just to be clear, I'm not saying that we should hide the error completely behind the scenes. For example, you can allow the FDW to emit a WARNING if the DBMS-specific client driver returns an error when committing. Further, if you want to allow the FDW to throw an ERROR when committing, the transaction manager in core can catch it by PG_TRY(), so that it can report back successful commit of the global transaction to the client while it leaves the handling of the failed FDW commit to the resolver. (I don't think we like to use PG_TRY() during transaction commit for performance reasons, though.)

Even if we concede a great deal and say we want to report the error of the committing FDW to the client, we can use SQLSTATE 01xxx (Warning) and attach the error message.

Maybe it's better to start a new thread to discuss this topic. If your
idea is good, we can lower all errors that happen after writing the
commit record to warnings, reducing the cases where the client gets
confused by receiving an error after the commit.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#258tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Masahiko Sawada (#257)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <sawada.mshk@gmail.com>

Maybe it's better to start a new thread to discuss this topic. If your
idea is good, we can lower all errors that happen after writing the
commit record to warnings, reducing the cases where the client gets
confused by receiving an error after the commit.

No. It's an important part because it determines the 2PC behavior and performance. This discussion had started from the concern about performance before Ikeda-san reported pathological results. Don't rush forward, hoping someone will commit the current patch. I'm afraid you just don't want to change your design and code. Let's face the real issue.

As I said before, and as Ikeda-san's performance benchmark results show, I have to say the design isn't done sufficiently. I talked with Fujii-san the other day about this patch. The patch is already huge and it's difficult to decode how the patch works, e.g., what kind of new WAL records it emits, how many disk writes it adds, how errors are handled, whether/how it's different from the textbook or other existing designs, etc. What happened to my request to add such a design description to the following page, so that reviewers can consider the design before spending much time on looking at the code? What's the situation of the new FDW API that should naturally accommodate other FDW implementations?

Atomic Commit of Distributed Transactions
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

Design should come first. I don't think it's a sincere attitude to require reviewers to spend a long time extracting the design from huge code.

Regards
Takayuki Tsunakawa

#259Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#254)
Re: Transactions involving multiple postgres foreign servers, take 2

At Tue, 8 Jun 2021 08:45:24 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

I think the discussion is based on the behavior that any process that is
responsible for finishing the 2pc-commit continues retrying remote
commits until all of the remote-commits succeed.

Thank you for coming back. We're talking about the first attempt to prepare and commit in each transaction, not the retry case.

If we allow each elementary commit (via an FDW connection) to fail,
there's no way the parent(?), i.e., the root 2pc-commit, can succeed. How can we
ignore the fdw-error in that case?

Throws: HeuristicMixedException
Thrown to indicate that a heuristic decision was made and that some
relevant updates have been committed while others have been rolled back.

I'm not sure about how JTA works in detail, but doesn't
UserTransaction.commit() throw HeuristicMixedException when some of
the relevant updates have been committed but others have not? Isn't that the same
state as the case where some of the remote servers failed on
remote-commit while others succeeded?

No. Taking the description literally and considering the relevant XA specification, it's not about the remote commit failure. The remote server is not allowed to fail the commit once it has reported successful prepare, which is the contract of 2PC. HeuristicMixedException is about the manual resolution, typically by the DBA, using the DBMS-specific tool or the standard commit()/rollback() API.

Mmm. The above reads as if 2pc-commit did not interact
with remotes. The interface contract does not cover everything that
happens in the real world. If remote-commit fails, that is just an
issue outside of the 2pc world. In reality remote-commit may fail for
all sorts of reasons.

https://www.ibm.com/docs/ja/db2-for-zos/11?topic=support-example-distributed-transaction-that-uses-jta-methods

} catch (javax.transaction.xa.XAException xae)
{ // Distributed transaction failed, so roll it back.
// Report XAException on prepare/commit.

This suggests that both XAResource.prepare() and commit() can throw an
exception.

(I guess that
UserTransaction.commit() would throw RollbackException if
remote-prepare has failed for any of the remotes.)

Correct.

So UserTransaction.commit() does not throw the same exception if
remote-commit fails. Isn't HeuristicMixedException the exception
thrown in that case?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#260tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Kyotaro Horiguchi (#259)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

If we allow each elementary commit (via an FDW connection) to fail,
there's no way the parent(?), i.e., the root 2pc-commit, can succeed. How can we
ignore the fdw-error in that case?

No, we don't ignore the error during FDW commit. As mentioned at the end of this mail, the question is how the FDW reports the error to the caller (transaction manager in Postgres core), and how we should handle it.

As below, Glassfish catches the resource manager's error during commit, retries the commit if the error is transient or communication failure, and hands off the processing of failed commit to the recovery manager. (I used all of my energy today; I'd be grateful if someone could figure out whether Glassfish reports the error to the application.)

[XATerminatorImpl.java]
public void commit(Xid xid, boolean onePhase) throws XAException {
...
} else {
coord.commit();
}

[TopCoordinator.java]
// Commit all participants. If a fatal error occurs during
// this method, then the process must be ended with a fatal error.
...
try {
participants.distributeCommit();
} catch (Throwable exc) {

[RegisteredResources.java]
void distributeCommit() throws HeuristicMixed, HeuristicHazard, NotPrepared {
...
// Browse through the participants, committing them. The following is
// intended to be done asynchronously as a group of operations.
...
// Tell the resource to commit.
// Catch any exceptions here; keep going until
// no exception is left.
...
// If the exception is neither TRANSIENT or
// COMM_FAILURE, it is unexpected, so display a
// message and give up with this Resource.
...
// For TRANSIENT or COMM_FAILURE, wait
// for a while, then retry the commit.
...
// If the retry limit has been exceeded,
// end the process with a fatal error.
...
if (!transactionCompleted) {
if (coord != null)
RecoveryManager.addToIncompleTx(coord, true);
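
The flow in the excerpts above (commit every participant, retry on transient or communication errors, and leave anything unresolved to a recovery component) can be sketched in C as follows. This is only an illustration, not GlassFish code: the names RmResult, Participant, toy_commit() and distribute_commit() are hypothetical, the retry wait is omitted, and a participant that exhausts its retries is simply flagged for recovery here rather than ending the process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one commit attempt against one participant. */
typedef enum { RM_OK, RM_TRANSIENT, RM_COMM_FAILURE, RM_FATAL } RmResult;

/* Hypothetical participant; commit() stands in for the resource manager call. */
typedef struct Participant
{
    RmResult (*commit)(struct Participant *self);
    int      attempts;        /* number of commit() calls made so far */
    int      fail_first;      /* toy knob: fail this many calls first */
    RmResult fail_code;       /* toy knob: code returned while failing */
    bool     committed;
    bool     needs_recovery;  /* handed off to a background resolver */
} Participant;

/* Toy resource manager: fails fail_first times, then succeeds. */
RmResult
toy_commit(Participant *p)
{
    p->attempts++;
    return (p->attempts <= p->fail_first) ? p->fail_code : RM_OK;
}

/*
 * Ask every participant to commit. One participant's failure never stops
 * the loop. Returns the number of participants left for recovery.
 */
int
distribute_commit(Participant *parts, size_t n, int retry_limit)
{
    int unresolved = 0;

    assert(retry_limit >= 0);
    for (size_t i = 0; i < n; i++)
    {
        Participant *p = &parts[i];
        int          retries = 0;

        for (;;)
        {
            RmResult r = p->commit(p);

            if (r == RM_OK)
            {
                p->committed = true;
                break;
            }
            if ((r == RM_TRANSIENT || r == RM_COMM_FAILURE) &&
                retries < retry_limit)
            {
                retries++;      /* real code would wait before retrying */
                continue;
            }
            /* Unexpected error or retries exhausted: give up on this one. */
            p->needs_recovery = true;
            unresolved++;
            break;
        }
    }
    return unresolved;
}
```

The point of the shape is that the commit decision is already made, so the loop only records which participants still need it delivered.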

No. Taking the description literally and considering the relevant XA
specification, it's not about the remote commit failure. The remote server is
not allowed to fail the commit once it has reported successful prepare, which is
the contract of 2PC. HeuristicMixedException is about the manual resolution,
typically by the DBA, using the DBMS-specific tool or the standard
commit()/rollback() API.

Mmm. The above seems to be saying that 2pc-commit does not interact
with remotes. The interface contract does not cover everything that
happens in the real world. If remote-commit fails, that is just an
issue outside of the 2pc world. In reality remote-commit may fail for
all reasons.

The following part of the XA specification is relevant. We're considering modeling the FDW 2PC interface on XA, because it seems to be the only standard interface and thus one that other FDWs would naturally take advantage of, aren't we? Then we need to take care of things like this. The interface design is not easy, so a proper design and its review should come first, before going deeper into the huge code patch.

2.3.3 Heuristic Branch Completion
--------------------------------------------------
Some RMs may employ heuristic decision-making: an RM that has prepared to
commit a transaction branch may decide to commit or roll back its work independently
of the TM. It could then unlock shared resources. This may leave them in an
inconsistent state. When the TM ultimately directs an RM to complete the branch, the
RM may respond that it has already done so. The RM reports whether it committed
the branch, rolled it back, or completed it with mixed results (committed some work
and rolled back other work).

An RM that reports heuristic completion to the TM must not discard its knowledge of
the transaction branch. The TM calls the RM once more to authorise it to forget the
branch. This requirement means that the RM must notify the TM of all heuristic
decisions, even those that match the decision the TM requested. The referenced
OSI DTP specifications (model) and (service) define heuristics more precisely.
--------------------------------------------------

https://www.ibm.com/docs/ja/db2-for-zos/11?topic=support-example-distributed-transaction-that-uses-jta-methods
This suggests that both XAResource.prepare() and commit() can throw an
exception.

Yes, XAResource.commit() can throw an exception:

void commit(Xid xid, boolean onePhase) throws XAException

Throws: XAException
An error has occurred. Possible XAExceptions are XA_HEURHAZ, XA_HEURCOM,
XA_HEURRB, XA_HEURMIX, XAER_RMERR, XAER_RMFAIL, XAER_NOTA,
XAER_INVAL, or XAER_PROTO.

This is equivalent to xa_commit() in the XA specification. xa_commit() can return error codes that have the same names as above.

The question we're trying to answer here is:

* How should such an error be handled?
Glassfish (and possibly other Java EE servers) catches the error, continues to commit the rest of the participants, and handles the failed resource manager's commit in the background. In Postgres, if we allow FDWs to do ereport(ERROR), how can we do something similar?

* Should we report the error to the client? If yes, should it be reported as a failure of commit, or as an informational message (WARNING) on a successful commit? Why would the client want to know about the error when the global transaction's commit has already been promised?

Regards
Takayuki Tsunakawa

#261Robert Haas
robertmhaas@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#241)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jun 4, 2021 at 4:04 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Why does the user have to get an error? Once the local transaction has been prepared, which means all remote ones also have been prepared, the whole transaction is determined to commit. So, the user doesn't have to receive an error as long as the local node is alive.

That is completely unrealistic. As Sawada-san has pointed out
repeatedly, there are tons of things that can go wrong even after the
remote side has prepared the transaction. Preparing a transaction only
promises that the remote side will let you commit the transaction upon
request. It doesn't guarantee that you'll be able to make the request.
Like Sawada-san says, network problems, out of memory issues, or many
other things could stop that from happening. Someone could come along
in another session and run "ROLLBACK PREPARED" on the remote side, and
now the "COMMIT PREPARED" will never succeed no matter how many times
you try it. At least, not unless someone goes and creates a new
prepared transaction with the same 2PC identifier, but then you won't
be committing the correct transaction anyway. Or someone could take
the remote server and drop it in a volcano. How do you propose that we
avoid giving the user an error after the remote server has been
dropped into a volcano, even though the local node is still alive?

Also, leaving aside theoretical arguments, I think it's not
realistically possible for an FDW author to write code to commit a
prepared transaction that will be safe in the context of running late
in PrepareTransaction(), after we've already done
RecordTransactionCommit(). Such code can't avoid throwing errors
because it can't avoid performing operations and allocating memory.
It's already been mentioned that, if an ERROR is thrown, it would be
reported to the user in place of the COMMIT acknowledgement that they
are expecting. Now, it has also been suggested that we could downgrade
the ERROR to a WARNING and still report the COMMIT. That doesn't sound
easy to do, because when the ERROR happens, control is going to jump
to AbortTransaction(). But even if you could hack it so it works like
that, it doesn't really solve the problem. What about all of the other
servers where the prepared transaction also needs to be committed? In
the design of PostgreSQL, in all circumstances, the way you recover
from an error is to abort the transaction. That is what brings the
system back to a clean state. You can't simply ignore the requirement
to abort the transaction and keep doing more work. It will never be
reliable, and Tom will instantaneously demand that any code that works
like that be reverted -- and for good reason.

I am not sure that it's 100% impossible to find a way to solve this
problem without just having the resolver do all the work, but I think
it's going to be extremely difficult. We tried to figure out some
vaguely similar things while working on undo, and it really didn't go
very well. The later stages of CommitTransaction() and
AbortTransaction() are places where very few kinds of code are safe to
execute, and finding a way to patch around that problem is not simple
either. If the resolver performance is poor, perhaps we could try to
find a way to improve it. I don't know. But I don't think it does any
good to say, well, no errors can occur after the remote transaction is
prepared. That's clearly incorrect.

--
Robert Haas
EDB: http://www.enterprisedb.com

#262tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Robert Haas (#261)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Robert Haas <robertmhaas@gmail.com>

That is completely unrealistic. As Sawada-san has pointed out
repeatedly, there are tons of things that can go wrong even after the
remote side has prepared the transaction. Preparing a transaction only
promises that the remote side will let you commit the transaction upon
request. It doesn't guarantee that you'll be able to make the request.
Like Sawada-san says, network problems, out of memory issues, or many
other things could stop that from happening. Someone could come along
in another session and run "ROLLBACK PREPARED" on the remote side, and
now the "COMMIT PREPARED" will never succeed no matter how many times
you try it. At least, not unless someone goes and creates a new
prepared transaction with the same 2PC identifier, but then you won't
be committing the correct transaction anyway. Or someone could take
the remote server and drop it in a volcano. How do you propose that we
avoid giving the user an error after the remote server has been
dropped into a volcano, even though the local node is still alive?

I understand that. As I cited yesterday and possibly before, that's why xa_commit() returns various return codes. So, I have never suggested that FDWs should not report an error and always report success for the commit request. They should be allowed to report an error.

The question I have been asking is how. With that said, we should only have two options; one is the return value of the FDW commit routine, and the other is via ereport(ERROR). I suggested the possibility of the former, because if the FDW does ereport(ERROR), Postgres core (transaction manager) may have difficulty in handling the rest of the participants.
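
The control-flow difference between the two options can be illustrated with a small C sketch. It assumes nothing about the patch: commit_or_throw() imitates an error raised by a non-local jump (PostgreSQL's ereport(ERROR) is built on sigsetjmp/siglongjmp), while commit_or_return() imitates a return-code interface. All function names are hypothetical, and the sketch shows only why a thrown error makes it hard for the caller to continue with the remaining participants; it does not address whether a commit routine can realistically avoid throwing.

```c
#include <assert.h>
#include <setjmp.h>

/* Stand-in for the error-recovery point that ereport(ERROR) jumps to. */
static jmp_buf error_context;

/* Error-throwing style: a failure jumps out and never returns to the caller. */
void
commit_or_throw(int server, int failing_server)
{
    if (server == failing_server)
        longjmp(error_context, 1);
}

/* Return-code style: a failure is just a value handed back to the caller. */
int
commit_or_return(int server, int failing_server)
{
    return (server == failing_server) ? -1 : 0;
}

/* Returns how many servers were committed before control was lost. */
int
commit_all_throwing(int nservers, int failing_server)
{
    volatile int committed = 0;     /* volatile: read after longjmp */

    if (setjmp(error_context) != 0)
        return committed;           /* the loop below has been abandoned */

    for (int s = 0; s < nservers; s++)
    {
        commit_or_throw(s, failing_server);
        committed++;
    }
    return committed;
}

/* Returns how many servers were committed; *failed counts the rest. */
int
commit_all_returning(int nservers, int failing_server, int *failed)
{
    int committed = 0;

    assert(failed != 0);
    *failed = 0;
    for (int s = 0; s < nservers; s++)
    {
        if (commit_or_return(s, failing_server) == 0)
            committed++;
        else
            (*failed)++;            /* remember it and keep going */
    }
    return committed;
}
```

With four servers and server 1 failing, the throwing variant stops after one commit, while the return-code variant commits the other three and reports one failure.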

Also, leaving aside theoretical arguments, I think it's not
realistically possible for an FDW author to write code to commit a
prepared transaction that will be safe in the context of running late
in PrepareTransaction(), after we've already done
RecordTransactionCommit(). Such code can't avoid throwing errors
because it can't avoid performing operations and allocating memory.

I'm not completely sure about this. I thought (and said) that the only thing the FDW does would be to send a commit request through an existing connection. So, I think it's not a severe restriction to require FDWs not to do ereport(ERROR) during commits (in the second phase of 2PC).

It's already been mentioned that, if an ERROR is thrown, it would be
reported to the user in place of the COMMIT acknowledgement that they
are expecting. Now, it has also been suggested that we could downgrade
the ERROR to a WARNING and still report the COMMIT. That doesn't sound
easy to do, because when the ERROR happens, control is going to jump
to AbortTransaction(). But even if you could hack it so it works like
that, it doesn't really solve the problem. What about all of the other
servers where the prepared transaction also needs to be committed? In
the design of PostgreSQL, in all circumstances, the way you recover
from an error is to abort the transaction. That is what brings the
system back to a clean state. You can't simply ignore the requirement
to abort the transaction and keep doing more work. It will never be
reliable, and Tom will instantaneously demand that any code that works
like that be reverted -- and for good reason.

(I took "abort" as the same as "rollback" here.) Once we've sent commit requests to some participants, we can't abort the transaction. If one FDW returned an error halfway, we need to send commit requests to the rest of participants.

It's a design question, as I repeatedly said, whether and how we should report the error of some participants to the client. For instance, how should we report the errors of multiple participants? Concatenate those error messages?

Anyway, we should design the interface first, giving much thought and respecting the ideas of predecessors (TX/XA, MS DTC, JTA/JTS). Otherwise, we may end up like "We implemented like this, so the interface is like this and it can only behave like this, although you may find it strange..." That might be a situation similar to what your comment "the design of PostgreSQL, in all circumstances, the way you recover from an error is to abort the transaction" suggests -- Postgres doesn't have statement-level rollback.

I am not sure that it's 100% impossible to find a way to solve this
problem without just having the resolver do all the work, but I think
it's going to be extremely difficult. We tried to figure out some
vaguely similar things while working on undo, and it really didn't go
very well. The later stages of CommitTransaction() and
AbortTransaction() are places where very few kinds of code are safe to
execute, and finding a way to patch around that problem is not simple
either. If the resolver performance is poor, perhaps we could try to
find a way to improve it. I don't know. But I don't think it does any
good to say, well, no errors can occur after the remote transaction is
prepared. That's clearly incorrect.

I don't think the resolver-based approach would bring us far enough. It's fundamentally a bottleneck. Such a background process should only handle commits whose requests failed to be sent due to server down.

My requests are only twofold and haven't changed for long: design the FDW interface that implementors can naturally follow, and design to ensure performance.

Regards
Takayuki Tsunakawa

#263Robert Haas
robertmhaas@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#262)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, Jun 10, 2021 at 9:58 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

I understand that. As I cited yesterday and possibly before, that's why xa_commit() returns various return codes. So, I have never suggested that FDWs should not report an error and always report success for the commit request. They should be allowed to report an error.

In the text to which I was responding it seemed like you were saying
the opposite. Perhaps I misunderstood.

The question I have been asking is how. With that said, we should only have two options; one is the return value of the FDW commit routine, and the other is via ereport(ERROR). I suggested the possibility of the former, because if the FDW does ereport(ERROR), Postgres core (transaction manager) may have difficulty in handling the rest of the participants.

I don't think that is going to work. It is very difficult to write
code that doesn't ever ERROR in PostgreSQL. It is not impossible if
the operation is trivial enough, but I think you're greatly
underestimating the complexity of committing the remote transaction.
If somebody had designed PostgreSQL so that every function returns a
return code and every time you call some other function you check that
return code and pass any error up to your own caller, then there would
be no problem here. But in fact the design was that at the first sign
of trouble you throw an ERROR. It's not easy to depart from that
programming model in just one place.

Also, leaving aside theoretical arguments, I think it's not
realistically possible for an FDW author to write code to commit a
prepared transaction that will be safe in the context of running late
in PrepareTransaction(), after we've already done
RecordTransactionCommit(). Such code can't avoid throwing errors
because it can't avoid performing operations and allocating memory.

I'm not completely sure about this. I thought (and said) that the only thing the FDW does would be to send a commit request through an existing connection. So, I think it's not a severe restriction to require FDWs not to do ereport(ERROR) during commits (in the second phase of 2PC).

To send a commit request through an existing connection, you have to
send some bytes over the network using a send() or write() system
call. That can fail. Then you have to read the response back over the
network using recv() or read(). That can also fail. You also need to
parse the result that you get from the remote side, which can also
fail, because you could get back garbage for some reason. And
depending on the details, you might first need to construct the
message you're going to send, which might be able to fail too. Also,
the data might be encrypted using SSL, so you might have to decrypt
it, which can also fail, and you might need to encrypt data before
sending it, which can fail. In fact, if you're using OpenSSL,
trying to call SSL_read() or SSL_write() can both read and write data
from the socket, even multiple times, so you have extra opportunities
to fail.
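
As a rough illustration of how many separate failure points even a "simple" commit request has, here is a C sketch over a plain socket. It is a toy: the text protocol, the CommitRequestResult codes and send_commit_request() are all made up for this example, and real postgres_fdw code goes through libpq (and possibly SSL) rather than raw send()/recv().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes, one per step that can go wrong. */
typedef enum
{
    CR_OK,
    CR_BUILD_FAILED,    /* could not construct the request message */
    CR_SEND_FAILED,     /* send() failed or was short */
    CR_RECV_FAILED,     /* recv() failed or the peer closed the connection */
    CR_BAD_REPLY        /* got bytes back, but not the expected reply */
} CommitRequestResult;

/* Send a toy "COMMIT PREPARED" request on fd and check the toy reply. */
CommitRequestResult
send_commit_request(int fd, const char *gid)
{
    char    msg[128];
    char    reply[64];
    int     len;
    ssize_t n;

    assert(gid != NULL);

    /* Step 1: build the message; this alone can fail (e.g. truncation). */
    len = snprintf(msg, sizeof(msg), "COMMIT PREPARED '%s'", gid);
    if (len < 0 || (size_t) len >= sizeof(msg))
        return CR_BUILD_FAILED;

    /* Step 2: write the bytes to the network. */
    if (send(fd, msg, (size_t) len, 0) != (ssize_t) len)
        return CR_SEND_FAILED;

    /* Step 3: read the response back. */
    n = recv(fd, reply, sizeof(reply) - 1, 0);
    if (n <= 0)
        return CR_RECV_FAILED;
    reply[n] = '\0';

    /* Step 4: parse the response; it may be garbage. */
    if (strcmp(reply, "COMMIT PREPARED") != 0)
        return CR_BAD_REPLY;

    return CR_OK;
}
```

Each early return is a place where code written in the usual PostgreSQL style would raise an ERROR instead.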

(I took "abort" as the same as "rollback" here.) Once we've sent commit requests to some participants, we can't abort the transaction. If one FDW returned an error halfway, we need to send commit requests to the rest of participants.

I understand that it's not possible to abort the local transaction
after it's been committed, but that doesn't mean that we're
going to be able to send the commit requests to the rest of the
participants. We want to be able to do that, certainly, but there's no
guarantee that it's actually possible. Again, the remote servers may
be dropped into a volcano, or less seriously, we may not be able to
access them. Also, someone may kill off our session.

It's a design question, as I repeatedly said, whether and how we should report the error of some participants to the client. For instance, how should we report the errors of multiple participants? Concatenate those error messages?

Sure, I agree that there are some questions about how to report errors.

Anyway, we should design the interface first, giving much thought and respecting the ideas of predecessors (TX/XA, MS DTC, JTA/JTS). Otherwise, we may end up like "We implemented like this, so the interface is like this and it can only behave like this, although you may find it strange..." That might be a situation similar to what your comment "the design of PostgreSQL, in all circumstances, the way you recover from an error is to abort the transaction" suggests -- Postgres doesn't have statement-level rollback.

I think that's a valid concern, but we also have to have a plan that
is realistic. Some things are indeed not possible in PostgreSQL's
design. Also, some of these problems are things everyone has to
somehow confront. There's no database doing 2PC that can't have a
situation where one of the machines disappears unexpectedly due to
some natural disaster or administrator interference. It might be the
case that our inability to do certain things safely during transaction
commit puts us out of compliance with the spec, but it can't be the
case that some other system has no possible failures during
transaction commit. The problem of the network potentially being
disconnected between one packet and the next exists in every system.

I don't think the resolver-based approach would bring us far enough. It's fundamentally a bottleneck. Such a background process should only handle commits whose requests failed to be sent due to server down.

Why is it fundamentally a bottleneck? It seems to me in some cases it
could scale better than any other approach. If we have to commit on
100 shards in only one process we can only do those commits one at a
time. If we can use resolver processes we could do all 100 at once if
the user can afford to run that many resolvers, which should be way
faster. It is true that if the resolver does not have a connection
open and must open one, that might be slow, but presumably after that
it can keep the connection open and reuse it for subsequent
distributed transactions. I don't really see why that should be
particularly slow.
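
The arithmetic behind the 100-shard argument can be sketched with a toy cost model in C. It assumes each commit costs a fixed round-trip time, ignores connection setup and any serialization inside the resolver itself, and hands each commit to whichever worker becomes free first. The function names and the 2 ms figure in the usage note are illustrative assumptions, not measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One process committing on every shard in turn: latencies add up. */
double
sequential_commit_ms(const double *latency_ms, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++)
        total += latency_ms[i];
    return total;
}

/*
 * Several workers committing concurrently: each commit is given to the
 * worker that becomes free first. Returns the time at which the last
 * commit finishes, or -1.0 if memory could not be allocated.
 */
double
parallel_commit_ms(const double *latency_ms, size_t n, size_t workers)
{
    double *busy_until;
    double  makespan = 0.0;

    assert(workers > 0);
    busy_until = calloc(workers, sizeof(double));
    if (busy_until == NULL)
        return -1.0;

    for (size_t i = 0; i < n; i++)
    {
        size_t w = 0;

        /* Find the worker that is free earliest. */
        for (size_t k = 1; k < workers; k++)
            if (busy_until[k] < busy_until[w])
                w = k;

        busy_until[w] += latency_ms[i];
        if (busy_until[w] > makespan)
            makespan = busy_until[w];
    }

    free(busy_until);
    return makespan;
}
```

Under this model, 100 shards at 2 ms each take 200 ms in one process, 20 ms with 10 workers, and 2 ms with 100 workers.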

--
Robert Haas
EDB: http://www.enterprisedb.com

#264Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#230)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/05/11 13:37, Masahiko Sawada wrote:

I've attached the updated patches that incorporated comments from
Zhihong and Ikeda-san.

Thanks for updating the patches!

I'm still reading these patches, but I'd like to share some review comments
that I found so far.

(1)
+/* Remove the foreign transaction from FdwXactParticipants */
+void
+FdwXactUnregisterXact(UserMapping *usermapping)
+{
+	Assert(IsTransactionState());
+	RemoveFdwXactEntry(usermapping->umid);
+}

Currently there is no user of FdwXactUnregisterXact().
This function should be removed?

(2)
When I ran the regression test, I got the following failure.

========= Contents of ./src/test/modules/test_fdwxact/regression.diffs
diff -U3 /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out
--- /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out	2021-06-10 02:19:43.808622747 +0000
+++ /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out	2021-06-10 02:29:53.452410462 +0000
@@ -174,7 +174,7 @@
  SELECT count(*) FROM pg_foreign_xacts;
   count
  -------
-     1
+     4
  (1 row)

(3)
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						(uint32) (lsn >> 32),
+						(uint32) lsn)));

LSN_FORMAT_ARGS() should be used?
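
For readers unfamiliar with the macro, the idea is that it expands to the two 32-bit halves of an LSN so they can be passed to a "%X/%X" format without repeating the shift and casts. The standalone C sketch below uses a simplified stand-in definition (PostgreSQL's own macro lives in access/xlogdefs.h and additionally checks the argument type); format_lsn() is a hypothetical helper written only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t XLogRecPtr;    /* 64-bit WAL position, as in PostgreSQL */

/* Simplified stand-in: expands to the high and low 32-bit halves. */
#define LSN_FORMAT_ARGS(lsn) ((uint32_t) ((lsn) >> 32)), ((uint32_t) (lsn))

/* Format an LSN in the usual high/low hexadecimal notation. */
int
format_lsn(char *buf, size_t buflen, XLogRecPtr lsn)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "%X/%X", LSN_FORMAT_ARGS(lsn));
}
```

With the macro, the errmsg() call quoted above would take LSN_FORMAT_ARGS(lsn) in place of the two hand-written casts.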

(4)
+extern void RecreateFdwXactFile(TransactionId xid, Oid umid, void *content,
+								int len);

Since RecreateFdwXactFile() is used only in fdwxact.c,
the above "extern" is not necessary?

(5)
+2. Pre-Commit phase (1st phase of two-phase commit)
+we record the corresponding WAL indicating that the foreign server is involved
+with the current transaction before doing PREPARE all foreign transactions.
+Thus, in case we loose connectivity to the foreign server or crash ourselves,
+we will remember that we might have prepared tranascation on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.

So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for
XLOG_FDWXACT_INSERT WAL record. Additionally, shouldn't we also wait there
for the WAL record to be replicated to the standby if sync replication is enabled?
Otherwise, when a failover happens, the new primary (the former standby)
might not have all the XLOG_FDWXACT_INSERT WAL records and
might fail to find some in-doubt foreign transactions.

(6)
XLogFlush() is called for each foreign transaction. So if there are many
foreign transactions, XLogFlush() is called too frequently, which might
cause unnecessary performance overhead. Instead, for example,
shouldn't we call XLogFlush() only once in FdwXactPrepareForeignTransactions()
after inserting all WAL records for all foreign transactions?
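
The suggestion in (6) can be sketched against a mock WAL in C. Nothing here is PostgreSQL code: MockWal, mock_xlog_insert() and mock_xlog_flush() are hypothetical stand-ins that only model the property being relied on, namely that flushing up to the last inserted record also covers every earlier one.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Mock WAL state: where inserts end, how far we flushed, and the cost. */
typedef struct
{
    XLogRecPtr insert_pos;
    XLogRecPtr flushed_upto;
    int        flush_calls;     /* number of real (non-no-op) flushes */
} MockWal;

#define MOCK_RECORD_LEN 64      /* arbitrary record size for the mock */

/* Append a record and return the position just past it. */
XLogRecPtr
mock_xlog_insert(MockWal *wal)
{
    wal->insert_pos += MOCK_RECORD_LEN;
    return wal->insert_pos;
}

/* Flush everything up to 'upto'; a no-op if that is already durable. */
void
mock_xlog_flush(MockWal *wal, XLogRecPtr upto)
{
    if (upto <= wal->flushed_upto)
        return;
    wal->flushed_upto = upto;
    wal->flush_calls++;
}

/* Current shape: insert and flush once per foreign transaction. */
void
prepare_flush_each(MockWal *wal, int nservers)
{
    assert(nservers >= 0);
    for (int i = 0; i < nservers; i++)
        mock_xlog_flush(wal, mock_xlog_insert(wal));
}

/* Suggested shape: insert all records, then flush once up to the last. */
void
prepare_flush_once(MockWal *wal, int nservers)
{
    XLogRecPtr last = 0;

    assert(nservers >= 0);
    for (int i = 0; i < nservers; i++)
        last = mock_xlog_insert(wal);
    mock_xlog_flush(wal, last);
}
```

Both variants end with every record durable before any remote PREPARE would be sent; the second pays for one flush instead of one per server.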

(7)
/* Open connection; report that we'll create a prepared statement. */
fmstate->conn = GetConnection(user, true, &fmstate->conn_state);
+ MarkConnectionModified(user);

MarkConnectionModified() should be called also when TRUNCATE on
a foreign table is executed?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#265tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Robert Haas (#263)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Robert Haas <robertmhaas@gmail.com>

On Thu, Jun 10, 2021 at 9:58 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

The question I have been asking is how. With that said, we should only have
two options; one is the return value of the FDW commit routine, and the other is
via ereport(ERROR). I suggested the possibility of the former, because if the
FDW does ereport(ERROR), Postgres core (transaction manager) may have
difficulty in handling the rest of the participants.

I don't think that is going to work. It is very difficult to write
code that doesn't ever ERROR in PostgreSQL. It is not impossible if
the operation is trivial enough, but I think you're greatly
underestimating the complexity of committing the remote transaction.
If somebody had designed PostgreSQL so that every function returns a
return code and every time you call some other function you check that
return code and pass any error up to your own caller, then there would
be no problem here. But in fact the design was that at the first sign
of trouble you throw an ERROR. It's not easy to depart from that
programming model in just one place.

I'm not completely sure about this. I thought (and said) that the only thing
the FDW does would be to send a commit request through an existing
connection. So, I think it's not a severe restriction to require FDWs not to do
ereport(ERROR) during commits (in the second phase of 2PC).

To send a commit request through an existing connection, you have to
send some bytes over the network using a send() or write() system
call. That can fail. Then you have to read the response back over the
network using recv() or read(). That can also fail. You also need to
parse the result that you get from the remote side, which can also
fail, because you could get back garbage for some reason. And
depending on the details, you might first need to construct the
message you're going to send, which might be able to fail too. Also,
the data might be encrypted using SSL, so you might have to decrypt
it, which can also fail, and you might need to encrypt data before
sending it, which can fail. In fact, if you're using OpenSSL,
trying to call SSL_read() or SSL_write() can both read and write data
from the socket, even multiple times, so you have extra opportunities
to fail.

I know sending a commit request may get an error from various underlying functions, but we're talking about the client side, not Postgres's server side, which could unexpectedly ereport(ERROR) somewhere. So, the new FDW commit routine won't lose control and can return an error code as its return value. For instance, the FDW commit routine for DBMS-X would typically be:

int
DBMSXCommit(...)
{
	int		ret;

	/* extract info from the argument to pass to xa_commit() */

	ret = DBMSX_xa_commit(...);
	/* This is the actual commit function which is exposed to the app server (e.g. Tuxedo) through the xa_commit() interface */

	/* map xa_commit() return values to the corresponding return values of the FDW commit routine */
	switch (ret)
	{
		case XAER_RMERR:
			ret = ...;
			break;
		...
	}

	return ret;
}

I think that's a valid concern, but we also have to have a plan that
is realistic. Some things are indeed not possible in PostgreSQL's
design. Also, some of these problems are things everyone has to
somehow confront. There's no database doing 2PC that can't have a
situation where one of the machines disappears unexpectedly due to
some natural disaster or administrator interference. It might be the
case that our inability to do certain things safely during transaction
commit puts us out of compliance with the spec, but it can't be the
case that some other system has no possible failures during
transaction commit. The problem of the network potentially being
disconnected between one packet and the next exists in every system.

So, we need to design how commit behaves from the user's perspective. That's the functional design. We should figure out what's the desirable response of commit first, and then see if we can implement it or have to compromise in some way. I think we can reference the X/Open TX standard and/or JTS (Java Transaction Service) specification (I haven't had a chance to read them yet, though.) Just in case we can't find the requested commit behavior in the volcano case from those specifications, ... (I'm hesitant to say this because it may be hard,) it's desirable to follow representative products such as Tuxedo and GlassFish (the reference implementation of Java EE specs.)

I don't think the resolver-based approach would bring us far enough. It's
fundamentally a bottleneck. Such a background process should only handle
commits whose requests failed to be sent due to server down.

Why is it fundamentally a bottleneck? It seems to me in some cases it
could scale better than any other approach. If we have to commit on
100 shards in only one process we can only do those commits one at a
time. If we can use resolver processes we could do all 100 at once if
the user can afford to run that many resolvers, which should be way
faster. It is true that if the resolver does not have a connection
open and must open one, that might be slow, but presumably after that
it can keep the connection open and reuse it for subsequent
distributed transactions. I don't really see why that should be
particularly slow.

Concurrent transactions are serialized at the resolver. I heard that the current patch handles 2PC like this: the TM (transaction manager in Postgres core) requests prepare from the resolver, the resolver sends prepare to the remote server and waits for the reply, the TM gets back control from the resolver, the TM requests commit from the resolver, the resolver sends commit to the remote server and waits for the reply, and the TM gets back control. The resolver handles one transaction at a time.

In regard to the case where one session has to commit on multiple remote servers, we're talking about the asynchronous interface just like what the XA standard provides.

Regards
Takayuki Tsunakawa

#266Robert Haas
robertmhaas@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#265)
Re: Transactions involving multiple postgres foreign servers, take 2

On Sun, Jun 13, 2021 at 10:04 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

I know sending a commit request may get an error from various underlying functions, but we're talking about the client side, not the Postgres's server side that could unexpectedly ereport(ERROR) somewhere. So, the new FDW commit routine won't lose control and can return an error code as its return value. For instance, the FDW commit routine for DBMS-X would typically be:

int
DBMSXCommit(...)
{
	int		ret;

	/* extract info from the argument to pass to xa_commit() */

	ret = DBMSX_xa_commit(...);
	/* This is the actual commit function which is exposed to the app server (e.g. Tuxedo) through the xa_commit() interface */

	/* map xa_commit() return values to the corresponding return values of the FDW commit routine */
	switch (ret)
	{
		case XAER_RMERR:
			ret = ...;
			break;
		...
	}

	return ret;
}

Well, we're talking about running this commit routine from within
CommitTransaction(), right? So I think it is in fact running in the
server. And if that's so, then you have to worry about how to make it
respond to interrupts. You can't just call some function
DBMSX_xa_commit() and wait indefinitely for it to return. Look at
pgfdw_get_result() for an example of what real code to do this looks
like.
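
The general shape of an interruptible wait can be sketched in standalone C as below. This is not how pgfdw_get_result() is written (it waits on the process latch together with the socket and calls CHECK_FOR_INTERRUPTS()); the sketch substitutes a short poll() timeout and a plain flag, and interrupt_pending, ReplyWait and wait_for_reply() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

/* Stand-in for the backend's "a cancel or terminate request arrived" flag. */
volatile sig_atomic_t interrupt_pending = 0;

typedef enum
{
    REPLY_READY,        /* data is available to read */
    REPLY_INTERRUPTED,  /* caller should stop waiting and handle the interrupt */
    REPLY_TIMED_OUT,    /* nothing arrived within the allowed slices */
    REPLY_WAIT_ERROR    /* poll() itself failed */
} ReplyWait;

/*
 * Wait for fd to become readable, but wake up every slice_ms milliseconds
 * to check for a pending interrupt instead of blocking indefinitely.
 */
ReplyWait
wait_for_reply(int fd, int slice_ms, int max_slices)
{
    assert(slice_ms > 0);

    for (int i = 0; i < max_slices; i++)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int           rc;

        if (interrupt_pending)
            return REPLY_INTERRUPTED;

        rc = poll(&pfd, 1, slice_ms);
        if (rc > 0)
            return REPLY_READY;
        if (rc < 0 && errno != EINTR)
            return REPLY_WAIT_ERROR;
        /* rc == 0 (slice elapsed) or EINTR: loop and re-check the flag */
    }
    return interrupt_pending ? REPLY_INTERRUPTED : REPLY_TIMED_OUT;
}
```

A blocking vendor call such as the DBMSX_xa_commit() in the example offers no such checkpoint, which is the concern raised here.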

So, we need to design how commit behaves from the user's perspective. That's the functional design. We should figure out what's the desirable response of commit first, and then see if we can implement it or have to compromise in some way. I think we can reference the X/Open TX standard and/or JTS (Java Transaction Service) specification (I haven't had a chance to read them yet, though.) Just in case we can't find the requested commit behavior in the volcano case from those specifications, ... (I'm hesitant to say this because it may be hard,) it's desirable to follow representative products such as Tuxedo and GlassFish (the reference implementation of Java EE specs.)

Honestly, I am not quite sure what any specification has to say about
this. We're talking about what happens when a user does something with
a foreign table and then types COMMIT. That's all about providing a set
of behaviors that are consistent with how PostgreSQL works in other
situations. You can't negotiate away the requirement to handle errors
in a way that works with PostgreSQL's infrastructure, or the
requirement that any lengthy operation handle interrupts properly, by
appealing to a specification.

Concurrent transactions are serialized at the resolver. I heard that the current patch handles 2PC like this: the TM (transaction manager in Postgres core) requests prepare from the resolver, the resolver sends prepare to the remote server and waits for the reply, the TM gets back control from the resolver, the TM requests commit from the resolver, the resolver sends commit to the remote server and waits for the reply, and the TM gets back control. The resolver handles one transaction at a time.

That sounds more like a limitation of the present implementation than
a fundamental problem. We shouldn't reject the idea of having a
resolver process handle this just because the initial implementation
might be slow. If there's no fundamental problem with the idea,
parallelism and concurrency can be improved in separate patches at a
later time. It's much more important at this stage to reject ideas
that are not theoretically sound.

--
Robert Haas
EDB: http://www.enterprisedb.com

#267tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Robert Haas (#266)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Robert Haas <robertmhaas@gmail.com>

Well, we're talking about running this commit routine from within
CommitTransaction(), right? So I think it is in fact running in the
server. And if that's so, then you have to worry about how to make it
respond to interrupts. You can't just call some function like
DBMSX_xa_commit() and wait indefinitely for it to return. Look at
pgfdw_get_result() for an example of what real code to do this looks
like.

Postgres can do that, but other implementations cannot necessarily do it, I'm afraid. But before that, the FDW interface documentation doesn't describe anything about how to handle interrupts. Actually, odbc_fdw and possibly other FDWs don't respond to interrupts.

Honestly, I am not quite sure what any specification has to say about
this. We're talking about what happens when a user does something with
a foreign table and then types COMMIT. That's all about providing a set
of behaviors that are consistent with how PostgreSQL works in other
situations. You can't negotiate away the requirement to handle errors
in a way that works with PostgreSQL's infrastructure, or the
requirement that any lengthy operation handle interrupts properly, by
appealing to a specification.

What we're talking about here is mainly whether commit should return success or failure when some participants failed to commit in the second phase of 2PC. That's new to Postgres, isn't it? Anyway, we should respect existing relevant specifications and (well-known) implementations before we conclude that we have to devise our own behavior.

That sounds more like a limitation of the present implementation than
a fundamental problem. We shouldn't reject the idea of having a
resolver process handle this just because the initial implementation
might be slow. If there's no fundamental problem with the idea,
parallelism and concurrency can be improved in separate patches at a
later time. It's much more important at this stage to reject ideas
that are not theoretically sound.

We talked about that, and unfortunately, I haven't seen a good and feasible idea to enhance the current approach that involves the resolver from the beginning of 2PC processing. Honestly, I don't understand why such a "one prepare, one commit in turn" serialization approach can be allowed in PostgreSQL, where developers pursue the best performance and even try to refrain from adding an if statement in a hot path. As I showed and Ikeda-san said, other implementations have each client session send prepare and commit requests. That's a natural way to achieve reasonable concurrency and performance.
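
A back-of-the-envelope model of this concern, using assumed numbers rather than measurements. It treats a transaction as a fixed number of sequential round trips and ignores remote-side contention, WAL flushes, and IPC cost; the function names are hypothetical.

```python
def max_tps_serial(round_trip_s, trips_per_txn):
    """Upper bound on TPS when one resolver handles one transaction at a time."""
    return 1.0 / (round_trip_s * trips_per_txn)


def max_tps_per_session(round_trip_s, trips_per_txn, sessions):
    """Upper bound when every client session drives its own prepare/commit."""
    return sessions * max_tps_serial(round_trip_s, trips_per_txn)
```

With an assumed 1 ms round trip and two round trips per transaction (prepare, then commit), the serial bound is 500 TPS regardless of the number of clients, while 8 sessions sending their own requests would have a bound of 4000 TPS. Whether a real resolver gets anywhere near either bound is exactly what would need to be measured.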

Regards
Takayuki Tsunakawa

#268Robert Haas
robertmhaas@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#267)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Jun 15, 2021 at 5:51 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Postgres can do that, but other implementations cannot necessarily do it, I'm afraid. But before that, the FDW interface documentation doesn't describe anything about how to handle interrupts. Actually, odbc_fdw and possibly other FDWs don't respond to interrupts.

Well, I'd consider that a bug.

What we're talking about here is mainly whether commit should return success or failure when some participants failed to commit in the second phase of 2PC. That's new to Postgres, isn't it? Anyway, we should respect existing relevant specifications and (well-known) implementations before we conclude that we have to devise our own behavior.

Sure ... but we can only decide to do things that the implementation
can support, and running code that might fail after we've committed
locally isn't one of them.

We talked about that, and unfortunately, I haven't seen a good and feasible idea to enhance the current approach that involves the resolver from the beginning of 2PC processing. Honestly, I don't understand why such a "one prepare, one commit in turn" serialization approach can be allowed in PostgreSQL, where developers pursue the best performance and even try to refrain from adding an if statement in a hot path. As I showed and Ikeda-san said, other implementations have each client session send prepare and commit requests. That's a natural way to achieve reasonable concurrency and performance.

I think your comparison here is quite unfair. We work hard to avoid adding
overhead in hot paths where it might cost, but the FDW case involves a
network round-trip anyway, so the cost of an if-statement would surely
be insignificant. I feel like you want to assume without any evidence
that a local resolver can never be quick enough, even though the cost
of IPC between local processes shouldn't be that high compared to a
network round trip. But you also want to suppose that we can run code
that might fail late in the commit process even though there is lots
of evidence that this will cause problems, starting with the code
comments that clearly say so.

--
Robert Haas
EDB: http://www.enterprisedb.com

#269tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Robert Haas (#268)
RE: Transactions involving multiple postgres foreign servers, take 2

From: Robert Haas <robertmhaas@gmail.com>

On Tue, Jun 15, 2021 at 5:51 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

Postgres can do that, but other implementations cannot necessarily do it, I'm
afraid. But before that, the FDW interface documentation doesn't describe
anything about how to handle interrupts. Actually, odbc_fdw and possibly
other FDWs don't respond to interrupts.

Well, I'd consider that a bug.

I kind of hesitate to call it a bug... Unlike libpq, JDBC (for jdbc_fdw) doesn't have an asynchronous interface, and the Oracle and PostgreSQL ODBC drivers don't support an asynchronous interface. Even with libpq, COMMIT (and other SQL commands) is not always cancellable, e.g., when the (NFS) storage server hangs while writing WAL.

What we're talking about here is mainly whether commit should return success or
failure when some participants failed to commit in the second phase of 2PC.
That's new to Postgres, isn't it? Anyway, we should respect existing relevant
specifications and (well-known) implementations before we conclude that we
have to devise our own behavior.

Sure ... but we can only decide to do things that the implementation
can support, and running code that might fail after we've committed
locally isn't one of them.

Yes, I understand that Postgres may not be able to conform to specifications or well-known implementations in all aspects. I'm just suggesting to take the stance "We carefully considered established industry specifications that we can base on, did our best to design the desirable behavior learned from them, but couldn't implement a few parts", rather than "We did what we like and can do."

I think your comparison here is quite unfair. We work hard to avoid adding
overhead in hot paths where it might cost, but the FDW case involves a
network round-trip anyway, so the cost of an if-statement would surely
be insignificant. I feel like you want to assume without any evidence
that a local resolver can never be quick enough, even though the cost
of IPC between local processes shouldn't be that high compared to a
network round trip. But you also want to suppose that we can run code
that might fail late in the commit process even though there is lots
of evidence that this will cause problems, starting with the code
comments that clearly say so.

There may be better examples. What I wanted to say is just that I believe it's not the PG developers' standard to allow serial prepare and commit. Let's make clear what makes it difficult to do 2PC from each client session in normal operation without going through the resolver.

Regards
Takayuki Tsunakawa

#270k.jamison@fujitsu.com
k.jamison@fujitsu.com
In reply to: Masahiko Sawada (#235)
RE: Transactions involving multiple postgres foreign servers, take 2

Hi Sawada-san,

I also tried to play a bit with the latest patches similar to Ikeda-san,
and with foreign 2PC parameter enabled/required.

b. about performance bottleneck (just share my simple benchmark
results)

The resolver process can easily become a performance bottleneck, although
I think some users want this feature even if the performance is not so
good.

I tested with a very simple workload on my laptop.

The test conditions are:
* two remote foreign partitions, and one transaction inserts an
entry into each partition.
* local connection only. If network latency were higher, the
performance would be worse.
* pgbench with 8 clients.

The test results are as follows. The performance with 2PC is only
10% of the performance without 2PC.

* with foreign_twophase_commit = required
-> If the load exceeds 10 TPS, the number of unresolved foreign
transactions keeps increasing, and the run stops with the warning
"Increase max_prepared_foreign_transactions".

What was the value of max_prepared_foreign_transactions?

Now, I tested with 200.

If each resolution finished very quickly, I thought that would be enough,
because 8 clients x 2 partitions = 16, though... But it's difficult to
know which values are stable.

While resolving one distributed transaction, the resolver needs both one
round trip and an fsync of the WAL record for each foreign transaction.
Since the client doesn’t wait for the distributed transaction to be resolved,
the resolver process can easily become a bottleneck given there are 8 clients.

If foreign transaction resolution was resolved synchronously, 16 would
suffice.

I tested the V36 patches on my 16-core machines.
I set up two foreign servers (F1, F2).
F1 has the addressbook table.
F2 has the pgbench tables (scale factor = 1).
There is also one coordinator (coor) server, where I created user mappings to access the foreign servers.
I ran the benchmark on the coordinator.
My custom scripts are set up so that queries from the coordinator
have to access both foreign servers.

Coordinator:
max_prepared_foreign_transactions = 200
max_foreign_transaction_resolvers = 1
foreign_twophase_commit = required

Other external servers 1 & 2 (F1 & F2):
max_prepared_transactions = 100

[select.sql]
\set int random(1, 100000)
BEGIN;
SELECT ad.name, ad.age, ac.abalance
FROM addressbook ad, pgbench_accounts ac
WHERE ad.id = :int AND ad.id = ac.aid;
COMMIT;

I then executed:
pgbench -r -c 2 -j 2 -T 60 -f select.sql coor

While there were no problems with 1-2 clients, I started having problems
when running the benchmark with more than 3 clients.

pgbench -r -c 4 -j 4 -T 60 -f select.sql coor

I got the following error on coordinator:

[95396] ERROR: could not prepare transaction on server F2 with ID fx_151455979_1216200_16422
[95396] STATEMENT: COMMIT;
WARNING: there is no transaction in progress
pgbench: error: client 1 script 0 aborted in command 3 query 0: ERROR: could not prepare transaction on server F2 with ID fx_151455979_1216200_16422

Here's the log on foreign server 2 <F2> matching the above error:
<F2> LOG: statement: PREPARE TRANSACTION 'fx_151455979_1216200_16422'
<F2> ERROR: maximum number of prepared transactions reached
<F2> HINT: Increase max_prepared_transactions (currently 100).
<F2> STATEMENT: PREPARE TRANSACTION 'fx_151455979_1216200_16422'

So I increased the max_prepared_transactions of <F1> and <F2> from 100 to 200.
Then I got the error:

[146926] ERROR: maximum number of foreign transactions reached
[146926] HINT: Increase max_prepared_foreign_transactions: "200".

So I increased max_prepared_foreign_transactions to "300",
and again got the error telling me to increase max_prepared_transactions on the foreign servers.

I just can't find the right tuning values for this.
It seems that we always run out of memory in FdwXactState insert_fdwxact
with multiple concurrent connections during PREPARE TRANSACTION.
I encountered this only with the SELECT benchmark;
I had no problems with multiple connections in my custom scripts for the
UPDATE and INSERT benchmarks when I tested with up to 30 clients.

Would the following possibly solve this bottleneck problem?

To speed up the foreign transaction resolution, some ideas have been
discussed. As another idea, how about launching resolvers for each
foreign server? That way, we resolve foreign transactions on each
foreign server in parallel. If foreign transactions are concentrated
on the particular server, we can have multiple resolvers for the one
foreign server. It doesn’t change the fact that all foreign
transaction resolutions are processed by resolver processes.

Awesome! There seems to be another advantage: even if a foreign server
is temporarily busy or stopped due to failover, other foreign
servers' transactions can still be resolved.

Yes. We also might need to be careful about the order of foreign transaction
resolution. I think we need to resolve foreign transactions in arrival order at
least within a foreign server.
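
A minimal sketch of the bookkeeping this idea implies, assuming one FIFO queue per foreign server so that a resolver assigned to a server always picks the oldest unresolved entry for it. The class and method names are hypothetical and do not come from the patch.

```python
from collections import defaultdict, deque


class PerServerResolverQueues:
    """One FIFO queue of unresolved foreign transactions per foreign server."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def enqueue(self, server, fdwxact_id):
        """Record a foreign transaction in arrival order for its server."""
        self._queues[server].append(fdwxact_id)

    def next_for(self, server):
        """Return the oldest unresolved entry for `server`, or None if empty."""
        queue = self._queues.get(server)
        return queue.popleft() if queue else None
```

Because the queues are independent, a stalled F1 would not delay entries waiting for F2, while entries for the same server still come out in arrival order.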

Regards,
Kirk Jamison

#271Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Fujii Masao (#264)
Re: Transactions involving multiple postgres foreign servers, take 2

On Sat, Jun 12, 2021 at 1:25 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/05/11 13:37, Masahiko Sawada wrote:

I've attached the updated patches that incorporated comments from
Zhihong and Ikeda-san.

Thanks for updating the patches!

I'm still reading these patches, but I'd like to share some review comments
that I found so far.

Thank you for the comments!

(1)
+/* Remove the foreign transaction from FdwXactParticipants */
+void
+FdwXactUnregisterXact(UserMapping *usermapping)
+{
+       Assert(IsTransactionState());
+       RemoveFdwXactEntry(usermapping->umid);
+}

Currently there is no user of FdwXactUnregisterXact().
This function should be removed?

I think that this function can be used by other FDW implementations
to unregister foreign transaction entry, although there is no use case
in postgres_fdw. This function corresponds to xa_unreg in the XA
specification.

(2)
When I ran the regression test, I got the following failure.

========= Contents of ./src/test/modules/test_fdwxact/regression.diffs
diff -U3 /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out
--- /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out     2021-06-10 02:19:43.808622747 +0000
+++ /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out      2021-06-10 02:29:53.452410462 +0000
@@ -174,7 +174,7 @@
SELECT count(*) FROM pg_foreign_xacts;
count
-------
-     1
+     4
(1 row)

Will fix.

(3)
+                                errmsg("could not read foreign transaction state from xlog at %X/%X",
+                                               (uint32) (lsn >> 32),
+                                               (uint32) lsn)));

LSN_FORMAT_ARGS() should be used?

Agreed.
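
For reference, the "%X/%X" convention splits the 64-bit LSN into its high and low 32-bit halves, which is what the two casts in the quoted errmsg() do by hand and what LSN_FORMAT_ARGS() packages up. A small Python illustration of the same arithmetic (the helper name `format_lsn` is made up for this example):

```python
def format_lsn(lsn):
    """Format a 64-bit LSN as two 32-bit hex halves, like "%X/%X" in C."""
    if not 0 <= lsn < 2**64:
        raise ValueError("LSN must fit in 64 bits")
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)
```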

(4)
+extern void RecreateFdwXactFile(TransactionId xid, Oid umid, void *content,
+                                                               int len);

Since RecreateFdwXactFile() is used only in fdwxact.c,
the above "extern" is not necessary?

Right.

(5)
+2. Pre-Commit phase (1st phase of two-phase commit)
+we record the corresponding WAL indicating that the foreign server is involved
+with the current transaction before doing PREPARE all foreign transactions.
+Thus, in case we loose connectivity to the foreign server or crash ourselves,
+we will remember that we might have prepared tranascation on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.

So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for
XLOG_FDWXACT_INSERT WAL record. Additionally we should also wait there
for WAL record to be replicated to the standby if sync replication is enabled?
Otherwise, when the failover happens, new primary (past-standby)
might not have enough XLOG_FDWXACT_INSERT WAL records and
might fail to find some in-doubt foreign transactions.

But even if we wait for the record to be replicated, this problem
isn't completely resolved, right? If the server crashes before the
standby receives the record and the failover happens then the new
master doesn't have the record. I wonder if we need to have another
FDW API in order to get the list of prepared transactions from the
foreign server (FDW). For example in postgres_fdw case, it gets the
list of prepared transactions on the foreign server by executing a
query. It seems to me that this corresponds to xa_recover in the XA
specification.
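
A rough sketch of what the caller of such an API might do with the result, under stated assumptions: for postgres_fdw the remote list could come from a query on the pg_prepared_xacts view, and the "fx_" prefix is inferred only from the GID shown in the logs in this thread. The function name is hypothetical.

```python
def find_unknown_prepared(remote_gids, known_gids, prefix="fx_"):
    """Return remote prepared-transaction GIDs that look like ours but that
    the local server has no record of, sorted for stable output."""
    known = set(known_gids)
    return sorted(
        gid for gid in remote_gids
        if gid.startswith(prefix) and gid not in known
    )
```

Entries returned here would be the in-doubt transactions that a new primary could not otherwise discover after a failover.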

(6)
XLogFlush() is called for each foreign transaction. So if there are many
foreign transactions, XLogFlush() is called too frequently. Which might
cause unnecessary performance overhead? Instead, for example,
we should call XLogFlush() only once in FdwXactPrepareForeignTransactions()
after inserting all WAL records for all foreign transactions?

Agreed.
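
To illustrate the difference in flush counts, here is a toy model that only counts calls and does no real I/O. `FakeWal` and both functions are hypothetical stand-ins, not the patch's code.

```python
class FakeWal:
    """Toy WAL that only counts inserts and flushes."""

    def __init__(self):
        self.inserted = 0
        self.flushes = 0

    def insert(self, record):
        self.inserted += 1
        return self.inserted  # position of this record

    def flush(self, upto):
        self.flushes += 1


def prepare_flush_each(wal, records):
    """Current behavior as described: one flush per foreign transaction."""
    for rec in records:
        wal.flush(wal.insert(rec))


def prepare_flush_once(wal, records):
    """Suggested behavior: insert every record, then flush up to the last."""
    last = None
    for rec in records:
        last = wal.insert(rec)
    if last is not None:
        wal.flush(last)
```

With N foreign transactions, the first variant flushes N times and the second flushes once, since flushing up to the last position covers all earlier records.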

(7)
/* Open connection; report that we'll create a prepared statement. */
fmstate->conn = GetConnection(user, true, &fmstate->conn_state);
+ MarkConnectionModified(user);

MarkConnectionModified() should be called also when TRUNCATE on
a foreign table is executed?

Good catch. Will fix.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#272Masahiko Sawada
sawada.mshk@gmail.com
In reply to: k.jamison@fujitsu.com (#270)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, Jun 24, 2021 at 9:46 PM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:

Hi Sawada-san,

I also tried to play a bit with the latest patches similar to Ikeda-san,
and with foreign 2PC parameter enabled/required.

Thank you for testing the patch!

b. about performance bottleneck (just share my simple benchmark
results)

The resolver process can easily become a performance bottleneck, although
I think some users want this feature even if the performance is not so
good.

I tested with a very simple workload on my laptop.

The test conditions are:
* two remote foreign partitions, and one transaction inserts an
entry into each partition.
* local connection only. If network latency were higher, the
performance would be worse.
* pgbench with 8 clients.

The test results are as follows. The performance with 2PC is only
10% of the performance without 2PC.

* with foreign_twophase_commit = required
-> If the load exceeds 10 TPS, the number of unresolved foreign
transactions keeps increasing, and the run stops with the warning
"Increase max_prepared_foreign_transactions".

What was the value of max_prepared_foreign_transactions?

Now, I tested with 200.

If each resolution finished very quickly, I thought that would be enough,
because 8 clients x 2 partitions = 16, though... But it's difficult to
know which values are stable.

While resolving one distributed transaction, the resolver needs both one
round trip and an fsync of the WAL record for each foreign transaction.
Since the client doesn’t wait for the distributed transaction to be resolved,
the resolver process can easily become a bottleneck given there are 8 clients.

If foreign transaction resolution was resolved synchronously, 16 would
suffice.

I tested the V36 patches on my 16-core machines.
I set up two foreign servers (F1, F2).
F1 has the addressbook table.
F2 has the pgbench tables (scale factor = 1).
There is also one coordinator (coor) server, where I created user mappings to access the foreign servers.
I ran the benchmark on the coordinator.
My custom scripts are set up so that queries from the coordinator
have to access both foreign servers.

Coordinator:
max_prepared_foreign_transactions = 200
max_foreign_transaction_resolvers = 1
foreign_twophase_commit = required

Other external servers 1 & 2 (F1 & F2):
max_prepared_transactions = 100

[select.sql]
\set int random(1, 100000)
BEGIN;
SELECT ad.name, ad.age, ac.abalance
FROM addressbook ad, pgbench_accounts ac
WHERE ad.id = :int AND ad.id = ac.aid;
COMMIT;

I then executed:
pgbench -r -c 2 -j 2 -T 60 -f select.sql coor

While there were no problems with 1-2 clients, I started having problems
when running the benchmark with more than 3 clients.

pgbench -r -c 4 -j 4 -T 60 -f select.sql coor

I got the following error on coordinator:

[95396] ERROR: could not prepare transaction on server F2 with ID fx_151455979_1216200_16422
[95396] STATEMENT: COMMIT;
WARNING: there is no transaction in progress
pgbench: error: client 1 script 0 aborted in command 3 query 0: ERROR: could not prepare transaction on server F2 with ID fx_151455979_1216200_16422

Here's the log on foreign server 2 <F2> matching the above error:
<F2> LOG: statement: PREPARE TRANSACTION 'fx_151455979_1216200_16422'
<F2> ERROR: maximum number of prepared transactions reached
<F2> HINT: Increase max_prepared_transactions (currently 100).
<F2> STATEMENT: PREPARE TRANSACTION 'fx_151455979_1216200_16422'

So I increased the max_prepared_transactions of <F1> and <F2> from 100 to 200.
Then I got the error:

[146926] ERROR: maximum number of foreign transactions reached
[146926] HINT: Increase max_prepared_foreign_transactions: "200".

So I increased max_prepared_foreign_transactions to "300",
and again got the error telling me to increase max_prepared_transactions on the foreign servers.

I just can't find the right tuning values for this.
It seems that we always run out of memory in FdwXactState insert_fdwxact
with multiple concurrent connections during PREPARE TRANSACTION.
I encountered this only with the SELECT benchmark;
I had no problems with multiple connections in my custom scripts for the
UPDATE and INSERT benchmarks when I tested with up to 30 clients.

Would the following possibly solve this bottleneck problem?

The following idea would improve performance, but it would not
completely solve the problem. The results shared by you and
Ikeda-san come from the fact that, with the patch, we asynchronously
commit the foreign prepared transactions (i.e., asynchronously
perform the second phase of 2PC), not from the architecture. As I
mentioned before, I intentionally removed the part that synchronously
commits foreign prepared transactions from the patch set, since we still
need to discuss that part. Therefore, with this version of the
patch, the backend returns OK to the client right after the local
transaction commits, neither committing foreign prepared
transactions by itself nor waiting for them to be committed by the
resolver process. As long as the backend doesn’t wait for foreign
prepared transactions to be committed and there is a limit on the
number of foreign prepared transactions that can be held, the number
could reach the upper bound if committing foreign prepared transactions
cannot keep up.
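
The effect can be sketched with simple arithmetic. This is a model with assumed rates, not a measurement of the patch: entries arrive at one rate, the resolver clears them at another, and the backlog is capped by max_prepared_foreign_transactions. The function name is hypothetical.

```python
def seconds_until_limit(arrival_per_s, resolve_per_s, limit, backlog=0):
    """Time until unresolved entries reach `limit`, or None if they never do."""
    if backlog >= limit:
        return 0.0
    growth = arrival_per_s - resolve_per_s
    if growth <= 0:
        return None  # the resolver keeps up, so the backlog does not grow
    return (limit - backlog) / growth
```

For example, with an assumed 40 new entries per second, a resolver that clears 20 per second, and a limit of 200, the limit is reached after 10 seconds; raising the limit to 300 only moves that to 15 seconds. That matches the observation that increasing the setting delays the error rather than removing it.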

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#273Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#271)
Re: Transactions involving multiple postgres foreign servers, take 2

On Thu, Jun 24, 2021 at 10:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Jun 12, 2021 at 1:25 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

(5)
+2. Pre-Commit phase (1st phase of two-phase commit)
+we record the corresponding WAL indicating that the foreign server is involved
+with the current transaction before doing PREPARE all foreign transactions.
+Thus, in case we loose connectivity to the foreign server or crash ourselves,
+we will remember that we might have prepared tranascation on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.

So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for
XLOG_FDWXACT_INSERT WAL record. Additionally we should also wait there
for WAL record to be replicated to the standby if sync replication is enabled?
Otherwise, when the failover happens, new primary (past-standby)
might not have enough XLOG_FDWXACT_INSERT WAL records and
might fail to find some in-doubt foreign transactions.

But even if we wait for the record to be replicated, this problem
isn't completely resolved, right?

Ah, I misunderstood the order of writing WAL records and preparing
foreign transactions. You're right. Combining your suggestion below,
perhaps we need to write all WAL records, call XLogFlush(), wait for
those records to be replicated, and prepare all foreign transactions.
Even in cases where the server crashes during preparing a foreign
transaction and the failover happens, the new master has all foreign
transaction entries. Some of them might not actually be prepared on
the foreign servers but it should not be a problem.
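
A small simulation of the proposed order, to check the claim that a crash at any point leaves the new primary knowing about every transaction that may have been prepared remotely. This is a toy model with hypothetical names; it folds the flush and the wait for synchronous replication into a single step.

```python
def run_until_crash(participants, crash_after):
    """Execute the proposed sequence, stopping after `crash_after` steps.

    Returns (replicated, prepared): the entries the new primary would know
    about, and the transactions actually prepared on foreign servers.
    """
    steps = [("insert", p) for p in participants]
    steps.append(("flush_and_replicate", None))
    steps += [("prepare", p) for p in participants]

    pending, replicated, prepared = [], set(), set()
    for kind, participant in steps[:crash_after]:
        if kind == "insert":
            pending.append(participant)
        elif kind == "flush_and_replicate":
            replicated.update(pending)
        else:
            prepared.add(participant)
    return replicated, prepared
```

In this model `prepared` is always a subset of `replicated`, whatever the crash point. The reverse does not hold: some replicated entries may never have been prepared, which is the harmless case mentioned above.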

(6)
XLogFlush() is called for each foreign transaction. So if there are many
foreign transactions, XLogFlush() is called too frequently. Which might
cause unnecessary performance overhead? Instead, for example,
we should call XLogFlush() only once in FdwXactPrepareForeignTransactions()
after inserting all WAL records for all foreign transactions?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#274Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#272)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/06/24 22:27, Masahiko Sawada wrote:

On Thu, Jun 24, 2021 at 9:46 PM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:

Hi Sawada-san,

I also tried to play a bit with the latest patches similar to Ikeda-san,
and with foreign 2PC parameter enabled/required.

Thank you for testing the patch!

b. about performance bottleneck (just share my simple benchmark
results)

The resolver process can easily become a performance bottleneck, although
I think some users want this feature even if the performance is not so
good.

I tested with a very simple workload on my laptop.

The test conditions are:
* two remote foreign partitions, and one transaction inserts an
entry into each partition.
* local connection only. If network latency were higher, the
performance would be worse.
* pgbench with 8 clients.

The test results are as follows. The performance with 2PC is only
10% of the performance without 2PC.

* with foreign_twophase_commit = required
-> If the load exceeds 10 TPS, the number of unresolved foreign
transactions keeps increasing, and the run stops with the warning
"Increase max_prepared_foreign_transactions".

What was the value of max_prepared_foreign_transactions?

Now, I tested with 200.

If each resolution finished very quickly, I thought that would be enough,
because 8 clients x 2 partitions = 16, though... But it's difficult to
know which values are stable.

While resolving one distributed transaction, the resolver needs both one
round trip and an fsync of the WAL record for each foreign transaction.
Since the client doesn’t wait for the distributed transaction to be resolved,
the resolver process can easily become a bottleneck given there are 8 clients.

If foreign transaction resolution was resolved synchronously, 16 would
suffice.

I tested the V36 patches on my 16-core machines.
I set up two foreign servers (F1, F2).
F1 has the addressbook table.
F2 has the pgbench tables (scale factor = 1).
There is also one coordinator (coor) server, where I created user mappings to access the foreign servers.
I ran the benchmark on the coordinator.
My custom scripts are set up so that queries from the coordinator
have to access both foreign servers.

Coordinator:
max_prepared_foreign_transactions = 200
max_foreign_transaction_resolvers = 1
foreign_twophase_commit = required

Other external servers 1 & 2 (F1 & F2):
max_prepared_transactions = 100

[select.sql]
\set int random(1, 100000)
BEGIN;
SELECT ad.name, ad.age, ac.abalance
FROM addressbook ad, pgbench_accounts ac
WHERE ad.id = :int AND ad.id = ac.aid;
COMMIT;

I then executed:
pgbench -r -c 2 -j 2 -T 60 -f select.sql coor

While there were no problems with 1-2 clients, I started having problems
when running the benchmark with more than 3 clients.

pgbench -r -c 4 -j 4 -T 60 -f select.sql coor

I got the following error on coordinator:

[95396] ERROR: could not prepare transaction on server F2 with ID fx_151455979_1216200_16422
[95396] STATEMENT: COMMIT;
WARNING: there is no transaction in progress
pgbench: error: client 1 script 0 aborted in command 3 query 0: ERROR: could not prepare transaction on server F2 with ID fx_151455979_1216200_16422

Here's the log on foreign server 2 <F2> matching the above error:
<F2> LOG: statement: PREPARE TRANSACTION 'fx_151455979_1216200_16422'
<F2> ERROR: maximum number of prepared transactions reached
<F2> HINT: Increase max_prepared_transactions (currently 100).
<F2> STATEMENT: PREPARE TRANSACTION 'fx_151455979_1216200_16422'

So I increased the max_prepared_transactions of <F1> and <F2> from 100 to 200.
Then I got the error:

[146926] ERROR: maximum number of foreign transactions reached
[146926] HINT: Increase max_prepared_foreign_transactions: "200".

So I increased max_prepared_foreign_transactions to "300",
and again got the error telling me to increase max_prepared_transactions on the foreign servers.

I just can't find the right tuning values for this.
It seems that we always run out of memory in FdwXactState insert_fdwxact
with multiple concurrent connections during PREPARE TRANSACTION.
I encountered this only with the SELECT benchmark;
I had no problems with multiple connections in my custom scripts for the
UPDATE and INSERT benchmarks when I tested with up to 30 clients.

Would the following possibly solve this bottleneck problem?

The following idea would improve performance, but it would not
completely solve the problem. The results shared by you and
Ikeda-san come from the fact that, with the patch, we asynchronously
commit the foreign prepared transactions (i.e., asynchronously
perform the second phase of 2PC), not from the architecture. As I
mentioned before, I intentionally removed the part that synchronously
commits foreign prepared transactions from the patch set, since we still
need to discuss that part. Therefore, with this version of the
patch, the backend returns OK to the client right after the local
transaction commits, neither committing foreign prepared
transactions by itself nor waiting for them to be committed by the
resolver process. As long as the backend doesn’t wait for foreign
prepared transactions to be committed and there is a limit on the
number of foreign prepared transactions that can be held, the number
could reach the upper bound if committing foreign prepared transactions
cannot keep up.

Hi Jamison-san, sawada-san,

Thanks for testing!

FWIF, I tested using pgbench with "--rate=" option to know the server
can execute transactions with stable throughput. As sawada-san said,
the latest patch resolved second phase of 2PC asynchronously. So,
it's difficult to control the stable throughput without "--rate=" option.

I also worried what I should do when the error happened because to increase
"max_prepared_foreign_transaction" doesn't work. Since too overloading may
show the error, is it better to add the case to the HINT message?

BTW, if sawada-san has already developed running the resolver processes in
parallel, why don't you measure the performance improvement? Although Robert-san,
Tunakawa-san and others are discussing which architecture is best, one
discussion point is that there is a performance risk in adopting the asynchronous
approach. If we have promising solutions, I think we can move the discussion
forward.

In my understanding, there are three improvement ideas. The first is to make
the resolver processes run in parallel. The second is to send "COMMIT/ABORT
PREPARED" to remote servers in bulk. The third is to stop syncing the WAL in
remove_fdwxact() after resolving is done, which I addressed in the mail sent
on June 3rd, 13:56. Since the third idea is not yet discussed, I may
be misunderstanding something.

--
Masahiro Ikeda
NTT DATA CORPORATION

#275Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#271)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/06/24 22:11, Masahiko Sawada wrote:

On Sat, Jun 12, 2021 at 1:25 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/05/11 13:37, Masahiko Sawada wrote:
So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for
XLOG_FDWXACT_INSERT WAL record. Additionally we should also wait there
for WAL record to be replicated to the standby if sync replication is enabled?
Otherwise, when the failover happens, new primary (past-standby)
might not have enough XLOG_FDWXACT_INSERT WAL records and
might fail to find some in-doubt foreign transactions.

But even if we wait for the record to be replicated, this problem
isn't completely resolved, right? If the server crashes before the
standby receives the record and the failover happens then the new
master doesn't have the record. I wonder if we need to have another
FDW API in order to get the list of prepared transactions from the
foreign server (FDW). For example in postgres_fdw case, it gets the
list of prepared transactions on the foreign server by executing a
query. It seems to me that this corresponds to xa_recover in the XA
specification.

FWIW, Citus implemented what sawada-san described above [1].

Since the latest patch flushes each WAL record for PREPARE, the latency
becomes too high, especially under synchronous replication. For example, a
transaction involving three foreign servers must wait for "three" WAL
records for PREPARE and "one" WAL record for the local commit to be synced
on the remote server, one by one, sequentially. So, I think that Sawada-san's
idea is good for improving the latency, although it increases the FDW
developers' work.
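
The latency argument above can be written as simple arithmetic. The
following is a minimal sketch, assuming every WAL sync (the local flush
plus, under synchronous replication, the wait for the standby) costs the
same and that the waits happen strictly one after another; commit_wait_ms
and its parameters are hypothetical names, not part of the patch.

```c
/*
 * Hypothetical model (not from the patch): total time a backend spends
 * waiting for WAL syncs when committing a distributed transaction.
 *
 * One sync is needed per PREPARE record (one per foreign server) plus one
 * for the local commit record, and they are assumed to run sequentially.
 */
double
commit_wait_ms(int n_foreign_servers, double wal_sync_ms)
{
    int n_syncs = n_foreign_servers + 1;    /* PREPAREs + local commit */

    return n_syncs * wal_sync_ms;
}
```

With three foreign servers and an assumed 2 ms per sync, the model gives
4 * 2 = 8 ms of waiting, compared with 2 ms for a purely local commit.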

[1]: SIGMOD 2021 525 Citus: Distributed PostgreSQL for Data Intensive Applications
From 12:27, it explains how to solve unresolved prepared xacts.
https://www.youtube.com/watch?v=AlF4C60FdlQ&list=PL3xUNnH4TdbsfndCMn02BqAAgGB0z7cwq

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#276Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiro Ikeda (#274)
9 attachment(s)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jun 25, 2021 at 9:53 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

Hi Jamison-san, sawada-san,

Thanks for testing!

FWIW, I tested using pgbench with the "--rate=" option to check whether the
server can execute transactions with stable throughput. As sawada-san said,
the latest patch resolves the second phase of 2PC asynchronously, so
it's difficult to keep the throughput stable without the "--rate=" option.

I also wondered what I should do when the error happens, because increasing
"max_prepared_foreign_transactions" doesn't help. Since overloading the server
may trigger the error, would it be better to mention that case in the HINT message?

BTW, if sawada-san has already developed running the resolver processes in
parallel, why don't you measure the performance improvement? Although Robert-san,
Tunakawa-san and others are discussing which architecture is best, one
discussion point is that there is a performance risk in adopting the asynchronous
approach. If we have promising solutions, I think we can move the discussion
forward.

Yeah, if we can asynchronously resolve the distributed transactions
without worrying about max_prepared_foreign_transaction error, it
would be good. But we will need synchronous resolution at some point.
I think we at least need to discuss it at this point.

I've attached the new version patch that incorporates the comments
from Fujii-san and Ikeda-san I got so far. We launch a resolver
process per foreign server, committing prepared foreign transactions
on foreign servers in parallel. To get a better performance based on
the current architecture, we can have multiple resolver processes per
foreign server, but it does not seem easy to tune in practice. Perhaps
it is better if we simply have a pool of resolver processes and
assign one resolver process to the resolution of each distributed
transaction? That way, we need to launch as many resolver processes
as there are concurrent backends using 2PC.

In my understanding, there are three improvement ideas. The first is to make
the resolver processes run in parallel. The second is to send "COMMIT/ABORT
PREPARED" to remote servers in bulk. The third is to stop syncing the WAL in
remove_fdwxact() after resolving is done, which I addressed in the mail sent
on June 3rd, 13:56. Since the third idea is not yet discussed, I may
be misunderstanding something.

Yes, those optimizations are promising. On the other hand, they could
introduce complexity to the code and APIs. I'd like to keep the first
version simple. I think we need to discuss them at this stage but can
leave the implementation of both parallel execution and batch
execution as future improvements.

For the third idea, I think the implementation was wrong; it removes
the state file then flushes the WAL record. I think these should be
performed in the reverse order. Otherwise, FdwXactState entry could be
left on the standby if the server crashes between them. I might be
missing something though.
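
The ordering concern above can be sketched with a toy model. This is an
assumption-laden illustration, not the patch's code: FdwXactModel,
RemoveOrder and crash_between_steps are hypothetical names, and the model
assumes the standby drops its entry only after the removal WAL record has
been flushed and replayed.

```c
#include <stdbool.h>

/* Hypothetical view of where a resolved foreign transaction still exists. */
typedef struct FdwXactModel
{
    bool primary_state_file;    /* state file still on the primary's disk */
    bool standby_entry;         /* FdwXactState entry still on the standby */
} FdwXactModel;

typedef enum RemoveOrder
{
    REMOVE_FILE_THEN_FLUSH,     /* order questioned in the mail above */
    FLUSH_THEN_REMOVE_FILE      /* reverse order suggested there */
} RemoveOrder;

/* State after a crash that happens between the two removal steps. */
FdwXactModel
crash_between_steps(RemoveOrder order)
{
    FdwXactModel m = {true, true};

    if (order == REMOVE_FILE_THEN_FLUSH)
        m.primary_state_file = false;   /* WAL record never reached the standby */
    else
        m.standby_entry = false;        /* record flushed; file not yet removed */

    return m;
}
```

In the remove-then-flush order the model ends with the state file gone on
the primary but the entry still present on the standby, which is the case
described above. In the flush-then-remove order the leftover is a state
file on the primary, which this sketch assumes recovery can clean up by
replaying the flushed record.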

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v37-0008-Documentation-update.patch (application/octet-stream)
From 737e5d0abff68e411cae6ca3ba9a856d1b5e7ff5 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 May 2020 15:06:38 +0900
Subject: [PATCH v37 8/9] Documentation update.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 doc/src/sgml/catalogs.sgml                | 135 ++++++++++++
 doc/src/sgml/config.sgml                  | 144 +++++++++++++
 doc/src/sgml/distributed-transaction.sgml | 158 ++++++++++++++
 doc/src/sgml/fdwhandler.sgml              | 245 ++++++++++++++++++++++
 doc/src/sgml/filelist.sgml                |   1 +
 doc/src/sgml/func.sgml                    | 147 +++++++++++++
 doc/src/sgml/monitoring.sgml              |  42 ++++
 doc/src/sgml/postgres.sgml                |   1 +
 doc/src/sgml/storage.sgml                 |   6 +
 src/backend/access/transam/README.fdwxact | 142 +++++++++++++
 10 files changed, 1021 insertions(+)
 create mode 100644 doc/src/sgml/distributed-transaction.sgml
 create mode 100644 src/backend/access/transam/README.fdwxact

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f517a7d4af..b8d6b54e7b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9388,6 +9388,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <entry>summary of configuration file contents</entry>
      </row>
 
+     <row>
+      <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry>
+      <entry>foreign transactions</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-group"><structname>pg_group</structname></link></entry>
       <entry>groups of database users</entry>
@@ -11262,6 +11267,136 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  </sect1>
 
+ <sect1 id="view-pg-foreign-xacts">
+  <title><structname>pg_foreign_xacts</structname></title>
+
+  <indexterm zone="view-pg-foreign-xacts">
+   <primary>pg_foreign_xacts</primary>
+  </indexterm>
+
+  <para>
+   The view <structname>pg_foreign_xacts</structname> displays
+   information about foreign transactions that are opened on
+   foreign servers for atomic distributed transaction commit (see
+   <xref linkend="atomic-commit"/> for details).
+  </para>
+
+  <para>
+   <structname>pg_foreign_xacts</structname> contains one row per foreign
+   transaction.  An entry is removed when the foreign transaction is
+   committed or rolled back.
+  </para>
+
+  <table>
+   <title><structname>pg_foreign_xacts</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><structfield>dbid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       OID of the database which the foreign transaction resides in
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>xid</structfield></entry>
+      <entry><type>xid</type></entry>
+      <entry></entry>
+      <entry>
+       Numeric transaction identifier with which this foreign transaction
+       is associated
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>serverid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the foreign server on which the foreign transaction is prepared
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>userid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry>
+      <entry>
+       The OID of the user that prepared this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>status</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       Status of foreign transaction. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>preparing</literal> : This foreign transaction is being prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>prepared</literal> : This foreign transaction has been prepared.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>committing</literal> : This foreign transaction has been
+          prepared to commit or is being committed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>aborting</literal> : This foreign transaction has been
+          prepared to abort or is being aborted.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>locker_pid</structfield></entry>
+      <entry><type>int</type></entry>
+      <entry></entry>
+      <entry>
+       Process ID of the process currently processing this foreign transaction.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>identifier</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>
+       The identifier of the prepared foreign transaction.
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   When the <structname>pg_foreign_xacts</structname> view is accessed, the
+   internal transaction manager data structures are momentarily locked, and
+   a copy is made for the view to display.  This ensures that the
+   view produces a consistent set of results, while not blocking
+   normal operations longer than necessary.  Nonetheless
+   there could be some impact on database performance if this view is
+   frequently accessed.
+  </para>
+
+ </sect1>
+
  <sect1 id="view-pg-publication-tables">
   <title><structname>pg_publication_tables</structname></title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6098f6b020..951fbef695 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9580,6 +9580,150 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
    </sect1>
 
+   <sect1 id="runtime-config-distributed-transaction">
+    <title>Distributed Transaction Management</title>
+
+    <sect2 id="runtime-config-distributed-transaction-settings">
+     <title>Setting</title>
+     <variablelist>
+
+      <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit">
+       <term><varname>foreign_twophase_commit</varname> (<type>enum</type>)
+        <indexterm>
+         <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies whether distributed transaction commit ensures that all
+         involved changes on foreign servers are committed. Valid
+         values are <literal>required</literal> and <literal>disabled</literal>.
+         The default setting is <literal>disabled</literal>. When set to
+         <literal>disabled</literal>, the two-phase commit protocol is not used to
+         commit or roll back distributed transactions. When set to
+         <literal>required</literal>, distributed transactions strictly require
+         that all written servers can use the two-phase commit protocol.  That is,
+         the distributed transaction cannot commit if even one server does not
+         support the prepare callback routine
+         (described in <xref linkend="fdw-callbacks-transaction-management"/>).
+         In the <literal>required</literal> case, distributed transaction commit will
+         wait for all involved foreign transactions to be committed before the
+         command returns a "success" indication to the client.
+        </para>
+
+        <para>
+         This parameter can be changed at any time; the behavior for any one
+         transaction is determined by the setting in effect when it commits.
+        </para>
+
+        <note>
+         <para>
+          When <literal>disabled</literal>, database consistency is at
+          risk if one or more foreign servers crash while committing
+          the distributed transaction.
+         </para>
+        </note>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions">
+       <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of foreign transactions that can be prepared
+         simultaneously. A single local transaction can give rise to multiple
+         foreign transactions. If a user expects <literal>N</literal> local
+         transactions and each of those involves <literal>K</literal> foreign
+         servers, this value needs to be set to <literal>N * K</literal>, not
+         just <literal>N</literal>.  This parameter can only be set at server
+         start.
+        </para>
+        <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
+       </listitem>
+      </varlistentry>
+
+     </variablelist>
+    </sect2>
+
+    <sect2 id="runtime-config-foreign-transaction-resolver">
+     <title>Foreign Transaction Resolvers</title>
+
+     <para>
+      These settings control the behavior of a foreign transaction resolver.
+     </para>
+
+     <variablelist>
+      <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers">
+       <term><varname>max_foreign_transaction_resolvers</varname> (<type>int</type>)
+        <indexterm>
+         <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies maximum number of foreign transaction resolution workers. A foreign transaction
+         resolver is responsible for foreign transaction resolution on one database.
+        </para>
+        <para>
+         Foreign transaction resolution workers are taken from the pool defined by
+         <varname>max_worker_processes</varname>.
+        </para>
+        <para>
+         The default value is 0.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval">
+       <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Specifies how long the foreign transaction resolver should wait after a failed
+         resolution before retrying to resolve the foreign transaction. This parameter can only be set in the
+         <filename>postgresql.conf</filename> file or on the server command line.
+        </para>
+        <para>
+         The default value is 10 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout">
+       <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>)
+        <indexterm>
+         <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary>
+        </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Terminate foreign transaction resolver processes that have had no foreign
+         transactions to resolve for longer than the specified number of milliseconds.
+         A value of zero disables the timeout mechanism, meaning the resolver stays connected to
+         its database until stopped manually by <function>pg_stop_foreign_xact_resolver()</function>.
+         This parameter can only be set in the <filename>postgresql.conf</filename>
+         file or on the server command line.
+        </para>
+        <para>
+         The default value is 60 seconds.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+   </sect1>
+
    <sect1 id="runtime-config-compatible">
     <title>Version and Platform Compatibility</title>
 
diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml
new file mode 100644
index 0000000000..1106fe00c9
--- /dev/null
+++ b/doc/src/sgml/distributed-transaction.sgml
@@ -0,0 +1,158 @@
+<!-- doc/src/sgml/distributed-transaction.sgml -->
+
+<chapter id="distributed-transaction">
+ <title>Distributed Transaction</title>
+
+ <para>
+  A distributed transaction is a transaction in which two or more network hosts
+  are involved. <productname>PostgreSQL</productname>'s global transaction
+  manager supports distributed transactions that access foreign servers using
+  Foreign Data Wrappers.
+ </para>
+
+ <sect1 id="atomic-commit">
+  <title>Atomic Commit</title>
+
+  <para>
+   Formerly, transactions on foreign servers were simply committed or rolled
+   back one by one. Therefore, when one foreign server had a problem during
+   commit, it was possible that transactions on only some of the foreign servers
+   were committed while the others were rolled back. This could leave
+   data in an inconsistent state across the federated database.
+   Atomic commit of distributed transactions is an operation that applies a set
+   of changes as a single operation globally. This guarantees all-or-nothing
+   results for the changes on all remote hosts involved.
+   <productname>PostgreSQL</productname> provides a way to perform read-write
+   transactions with foreign resources using foreign data wrappers.
+   Using <productname>PostgreSQL</productname>'s atomic commit ensures that
+   all changes on foreign servers are either committed or rolled back using the
+   transaction callback routines
+   (see <xref linkend="fdw-callbacks-transaction-management"/>).
+  </para>
+
+  <sect2>
+   <title>Atomic Commit Using Two-phase Commit Protocol</title>
+
+   <para>
+    To commit atomically among all foreign servers,
+    <productname>PostgreSQL</productname> employs the two-phase commit protocol,
+    which is a type of atomic commitment protocol (ACP).  Using the two-phase
+    commit protocol, the commit sequence of a distributed transaction proceeds
+    with the following steps:
+    <orderedlist>
+     <listitem>
+      <para>
+       Prepare all transactions on foreign servers.
+       <productname>PostgreSQL</productname>'s distributed transaction manager
+       prepares all transactions on the foreign servers if two-phase commit is
+       required. Two-phase commit is required when the transaction modifies
+       data on two or more servers including the local server itself and
+       <xref linkend="guc-foreign-twophase-commit"/> is
+       <literal>required</literal>. If the preparation on all foreign servers
+       is successful then go to the next step.  If there is any failure in the
+       prepare phase, the server will roll back all the transactions on both
+       local and foreign servers.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Commit the local transaction. The server commits the transaction locally.
+       Once the local transaction is committed, the involved transactions
+       are never rolled back.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Resolve all prepared transactions on foreign servers. Prepared transactions
+       are committed or rolled back according to the result of the local transaction.
+       This step is performed by a foreign transaction resolver process.
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    The above sequence is executed transparently to the users at transaction commit.
+    The transaction returns an acknowledgment of the successful commit of the
+    distributed transaction to the client after step 2.  After that, all
+    prepared transactions are resolved asynchronously by a foreign transaction
+    resolver process.
+   </para>
+
+   <para>
+    When the user executes <command>PREPARE TRANSACTION</command>, the transaction
+    prepares the local transactions as well as all involved transactions on the
+    foreign servers. Likewise, when <command>COMMIT PREPARED</command> or
+    <command>ROLLBACK PREPARED</command> is executed, all prepared transactions are resolved
+    asynchronously after committing or rolling back the local transaction.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-in-doubt-transaction">
+   <title>In-Doubt Transactions</title>
+
+   <para>
+    A distributed transaction can be in the <firstterm>in-doubt</firstterm> state
+    from the time all involved transactions are prepared until all involved
+    transactions are resolved.  If the local node crashes while
+    preparing transactions, the distributed transaction becomes
+    in-doubt.  The information about the involved foreign transactions is recovered
+    during crash recovery, and they are resolved in the background.  Until all
+    in-doubt transactions are resolved, other transactions might see
+    inconsistent results when reading from the foreign servers.
+   </para>
+
+   <para>
+    The foreign transaction resolver processes automatically resolve the
+    transactions associated with the in-doubt distributed transaction. Alternatively, you
+    can use the <function>pg_resolve_foreign_xact</function> function to resolve
+    them manually.
+   </para>
+  </sect2>
+
+  <sect2 id="atomic-commit-transaction-resolver">
+   <title>Foreign Transaction Resolver Processes</title>
+
+   <para>
+    Foreign transaction resolver processes are auxiliary processes that are
+    responsible for resolving in-doubt distributed transactions. They commit or
+    rollback prepared transactions on all foreign servers involved with the
+    distributed transaction according to the result of the corresponding local
+    transaction.
+   </para>
+
+   <para>
+    One foreign transaction resolver is responsible for transaction resolution
+    on the database to which it is connected. On failure during resolution, it
+    retries at an interval of
+    <varname>foreign_transaction_resolution_retry_interval</varname>.
+   </para>
+
+   <note>
+    <para>
+     While a foreign transaction resolver process is connected to a database,
+     that database cannot be dropped without an immediate shutdown. You can call
+     the <function>pg_stop_foreign_xact_resolver</function> function to stop the
+     particular resolver process before dropping the database.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2>
+   <title>Configuration Settings</title>
+
+   <para>
+    Atomic commit requires several configuration options to be set.
+    On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and
+    <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values,
+    and <xref linkend="guc-foreign-twophase-commit"/> must be set to <literal>required</literal>.  Additionally,
+    <varname>max_worker_processes</varname> may need to be adjusted
+    to accommodate the foreign transaction resolver workers, to at least
+    (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>).
+    Note that other <productname>PostgreSQL</productname> features such as parallel
+    queries, logical replication, etc., also take worker slots from
+    <varname>max_worker_processes</varname>.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index d1194def82..8cd3552bd3 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1657,6 +1657,117 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
     </para>
    </sect2>
 
+   <sect2 id="fdw-callbacks-transaction-management">
+    <title>FDW Routines For Transaction Management</title>
+
+    <para>
+     Transaction management callbacks are used to commit, rollback, and
+     prepare the foreign transaction. If an FDW wishes that its foreign
+     transaction is managed by <productname>PostgreSQL</productname>'s global
+     transaction manager it must provide both
+     <function>CommitForeignTransaction</function> and
+     <function>RollbackForeignTransaction</function>. In addition, if an FDW
+     wishes to support <firstterm>atomic commit</firstterm> (as described in
+     <xref linkend="fdw-transaction-managements"/>), it must provide
+     <function>PrepareForeignTransaction</function> as well and can provide
+     <function>GetPrepareId</function> callback optionally.
+    </para>
+
+    <para>
+<programlisting>
+void
+PrepareForeignTransaction(FdwXactInfo *finfo);
+</programlisting>
+    Prepare the transaction on the foreign server. This function is called at the
+    pre-commit phase of the local transactions if foreign twophase commit is
+    required. This function is used only for distributed transaction management
+    (see <xref linkend="distributed-transaction"/>).
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+    <para>
+<programlisting>
+bool
+CommitForeignTransaction(FdwXactInfo *finfo);
+</programlisting>
+    Commit the foreign transaction. This function is called either at
+    the pre-commit phase of the local transaction if the transaction
+    can be committed in one-phase or at the post-commit phase if
+    two-phase commit is required. If <literal>finfo-&gt;flags</literal> has
+    the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal> the transaction
+    can be committed in one-phase, otherwise this function must commit the prepared
+    transaction identified by <literal>finfo-&gt;identifier</literal>.
+    </para>
+
+    <para>
+     Note that in all cases except for calling <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+bool
+RollbackForeignTransaction(FdwXactInfo *finfo);
+</programlisting>
+    Rollback the foreign transaction. This function is called at
+    the end of the local transaction after it has been rolled back locally. The foreign
+    transactions are rolled back when a user requests a rollback or when
+    an error occurs during the transaction. This function must tolerate
+    being called recursively if any error occurs during rollback of the foreign
+    transaction, so you need to track recursion and prevent it from
+    recursing infinitely. If <literal>finfo-&gt;flags</literal> has the flag
+    <literal>FDW_XACT_FLAG_ONEPHASE</literal> the transaction can be rolled
+    back in one-phase, otherwise this function must rollback the prepared
+    transaction identified by <literal>finfo-&gt;identifier</literal>.
+    </para>
+
+    <para>
+     The foreign transaction identified by <literal>finfo-&gt;identifier</literal>
+     might not exist on the foreign servers. This can happen when, for instance,
+     there is a failure during preparing the foreign transaction. Therefore, this
+     function needs to tolerate the undefined object error
+     (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error.
+    </para>
+
+    <para>
+     Note that in all cases except for calling <function>pg_resolve_fdwxact</function>
+     SQL function, this callback function is executed by foreign transaction
+     resolver processes.
+    </para>
+    <para>
+<programlisting>
+char *
+GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len);
+</programlisting>
+    Return a null-terminated string that represents the prepared transaction identifier,
+    storing its length in <varname>*prep_id_len</varname>.
+    This optional function is called once per foreign server during executor
+    startup. Note that the transaction identifier must be a string literal,
+    less than <symbol>NAMEDATALEN</symbol> bytes long, and should not be the same
+    as any other concurrent prepared transaction id. If this callback routine
+    is not supported, <productname>PostgreSQL</productname>'s  distributed
+    transaction manager generates a unique identifier in the form of
+    <literal>fx_&lt;random value up to 2<superscript>31</superscript>&gt;_&lt;xid&gt;_&lt;user mapping oid&gt;</literal>.
+    </para>
+
+    <para>
+     Note that this callback function is always executed by backend processes.
+    </para>
+
+    <note>
+     <para>
+      Functions <function>PrepareForeignTransaction</function>,
+      <function>CommitForeignTransaction</function> and
+      <function>RollbackForeignTransaction</function> are called
+      outside of a valid transaction state. So please note that
+      you cannot use functions that use the system catalog cache
+      such as Foreign Data Wrapper helper functions described in
+      <xref linkend="fdw-helpers"/>.
+     </para>
+    </note>
+   </sect2>
    </sect1>
 
    <sect1 id="fdw-helpers">
@@ -2136,4 +2247,138 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
   </sect1>
 
+  <sect1 id="fdw-transaction-managements">
+   <title>Transaction Management for Foreign Data Wrappers</title>
+   <para>
+    If a server used by an FDW supports transactions, it is usually worthwhile
+    for the FDW to manage transactions opened on the foreign server. The FDW
+    callback functions <literal>CommitForeignTransaction</literal>,
+    <literal>RollbackForeignTransaction</literal> and
+    <literal>PrepareForeignTransaction</literal> are used for transaction
+    management and must fit into the workings of
+    <productname>PostgreSQL</productname> transaction processing.
+   </para>
+
+   <para>
+    The information in <literal>FdwXactInfo</literal> can be used to get
+    information about the foreign server being processed, such as its
+    <structname>ForeignServer</structname> and <structname>UserMapping</structname>.
+    The <literal>flags</literal> field contains flag bits describing the
+    foreign transaction state for transaction management.
+   </para>
+
+   <para>
+    The foreign transaction needs to be registered with the
+    <productname>PostgreSQL</productname> global transaction manager.
+    Registration and unregistration are done by calling
+    <function>FdwXactRegisterXact</function> and
+    <function>FdwXactUnregisterXact</function> respectively.
+    The FDW can pass a boolean <literal>modified</literal> along with
+    <structname>UserMapping</structname> to <function>FdwXactRegisterXact</function>
+    indicating that writes are going to happen on the foreign server.  Such
+    foreign servers are taken into account when deciding whether the two-phase
+    commit protocol is required.
+   </para>
+
+   <para>
+    The FDW callback functions <function>CommitForeignTransaction</function>
+    and <function>RollbackForeignTransaction</function> are used to commit
+    and roll back foreign transactions. During transaction commit, the global
+    transaction manager calls <function>CommitForeignTransaction</function>
+    in the pre-commit phase, and it calls
+    <function>RollbackForeignTransaction</function> in the post-rollback
+    phase.
+   </para>
+
+   <para>
+    In addition to simply committing and rolling back foreign transactions, the
+    <productname>PostgreSQL</productname> global transaction manager enables
+    distributed transactions to commit or roll back atomically across all foreign
+    servers, which is known as atomic commit in the literature. To achieve atomic
+    commit, <productname>PostgreSQL</productname> employs the two-phase commit
+    protocol, which is a type of atomic commitment protocol. Every FDW that wishes
+    to support the two-phase commit protocol is required to provide the FDW callback
+    function <function>PrepareForeignTransaction</function> and optionally
+    <function>GetPrepareId</function>, in addition to
+    <function>CommitForeignTransaction</function> and
+    <function>RollbackForeignTransaction</function>
+    (see <xref linkend="fdw-callbacks-transaction-management"/> for details).
+   </para>
+
+   <para>
+    An example of a distributed transaction is as follows:
+<programlisting>
+BEGIN;
+UPDATE ft1 SET col = 'a';
+UPDATE ft2 SET col = 'b';
+COMMIT;
+</programlisting>
+    Here ft1 and ft2 are foreign tables on different foreign servers, which may
+    be using different Foreign Data Wrappers.
+   </para>
+
+   <para>
+    When the core executor accesses the foreign servers, each foreign server whose
+    FDW supports the transaction management callback routines is registered as a
+    participant. During registration, <function>GetPrepareId</function> is called,
+    if provided, to generate a unique transaction identifier.
+   </para>
+
+   <para>
+    During the pre-commit phase of the local transaction, the foreign transaction
+    manager persists the foreign transaction information to disk and WAL, and then
+    prepares all foreign transactions by calling
+    <function>PrepareForeignTransaction</function> if the two-phase commit protocol
+    is required. Two-phase commit is required when the transaction modified data
+    on more than one server, including the local server itself, and the user requests
+    foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>).
+   </para>
+
+   <para>
+    <productname>PostgreSQL</productname> commits locally and goes on to the next
+    step if and only if all foreign transactions are prepared successfully.
+    If any failure happens or the user requests cancellation during preparation,
+    the global transaction manager switches to rollback and calls
+    <function>RollbackForeignTransaction</function>.
+   </para>
+
+   <para>
+    When switching to rollback due to a failure, the global transaction manager calls
+    <function>RollbackForeignTransaction</function> with
+    <literal>FDWXACT_FLAG_ONEPHASE</literal> for foreign transactions that are not
+    closed yet, and calls <function>RollbackForeignTransaction</function> without
+    that flag for foreign transactions that are already prepared.  For foreign
+    transactions that are being prepared, it does both because it cannot be sure
+    whether the preparation has completed on the foreign server. Therefore,
+    <function>RollbackForeignTransaction</function> needs to tolerate the undefined
+    object error.
+   </para>
+
+   <para>
+    Note that when <literal>(finfo-&gt;flags &amp; FDWXACT_FLAG_ONEPHASE)</literal>
+    is true, <literal>CommitForeignTransaction</literal> and
+    <literal>RollbackForeignTransaction</literal> should commit and
+    roll back directly, rather than processing prepared transactions. This can
+    happen when two-phase commit is not required or the foreign server was not
+    modified within the transaction.
+   </para>
+
+   <para>
+    Once all foreign transactions are prepared, the core transaction manager commits
+    locally. After that, the transaction commit waits for all prepared foreign
+    transactions to be committed before completion. After all prepared foreign
+    transactions are resolved, the transaction commit completes.
+   </para>
+
+   <para>
+    One foreign transaction resolver process is responsible for foreign
+    transaction resolution on a database. The foreign transaction resolver process
+    calls either <function>CommitForeignTransaction</function> or
+    <function>RollbackForeignTransaction</function> to resolve the foreign
+    transaction identified by <literal>finfo-&gt;identifier</literal>. If resolution
+    fails, the resolver process exits with an error message. The foreign
+    transaction launcher launches the resolver process again after the interval set by
+    <xref linkend="guc-foreign-transaction-resolution-rety-interval"/>.
+   </para>
+  </sect1>
  </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 596bfecf8e..74aa15e705 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -49,6 +49,7 @@
 <!ENTITY wal           SYSTEM "wal.sgml">
 <!ENTITY logical-replication    SYSTEM "logical-replication.sgml">
 <!ENTITY jit    SYSTEM "jit.sgml">
+<!ENTITY distributed-transaction    SYSTEM "distributed-transaction.sgml">
 
 <!-- programmer's guide -->
 <!ENTITY bgworker   SYSTEM "bgworker.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6388385edc..55bde2bdba 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27269,6 +27269,153 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-data-sanity">
+   <title>Data Sanity Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-data-sanity-table"/>
+    provide ways to check the sanity of data files in the cluster.
+   </para>
+
+   <table id="functions-data-sanity-table">
+    <title>Data Sanity Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_relation_check_pages</primary>
+        </indexterm>
+        <function>pg_relation_check_pages</function> ( <parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type> ] )
+        <returnvalue>setof record</returnvalue>
+        ( <parameter>path</parameter> <type>text</type>,
+        <parameter>failed_block_num</parameter> <type>bigint</type> )
+       </para>
+       <para>
+        Checks the pages of the specified relation to see if they are valid
+        enough to safely be loaded into the server's shared buffers.  If
+        given, <parameter>fork</parameter> specifies that only the pages of
+        the given fork are to be verified.  <parameter>fork</parameter> can
+        be <literal>main</literal> for the main data
+        fork, <literal>fsm</literal> for the free space
+        map, <literal>vm</literal> for the visibility map,
+        or <literal>init</literal> for the initialization fork.  The
+        default of <literal>NULL</literal> means that all forks of the
+        relation should be checked.  The function returns a list of block
+        numbers that appear corrupted along with the path names of their
+        files.  Use of this function is restricted to superusers by
+        default, but access may be granted to others
+        using <command>GRANT</command>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
+  <sect2 id="functions-foreign-transaction">
+   <title>Foreign Transaction Management Functions</title>
+
+   <indexterm>
+    <primary>pg_resolve_foreign_xact</primary>
+   </indexterm>
+   <indexterm>
+    <primary>pg_remove_foreign_xact</primary>
+   </indexterm>
+
+   <para>
+    <xref linkend="functions-fdw-transaction-control-table"/> shows the functions
+    available for foreign transaction management.
+    These functions cannot be executed during recovery. Use of these functions
+    is restricted to superusers.
+   </para>
+
+   <table id="functions-fdw-transaction-control-table">
+    <title>Foreign Transaction Management Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Resolve a foreign transaction. This function searches for the foreign
+        transaction matching the arguments and resolves it. Once the foreign
+        transaction is resolved successfully, this function removes the
+        corresponding entry from <xref linkend="view-pg-foreign-xacts"/>.
+        This function won't resolve a foreign transaction that is currently
+        being processed.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        This function works the same as <function>pg_resolve_foreign_xact</function>
+        except that it removes the foreign transaction entry without resolving it.
+        This function is useful to remove a foreign transaction entry whose foreign
+        server is no longer available.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/>
+   controls the foreign transaction resolvers.
+   </para>
+
+   <table id="functions-fdwxact-resolver-control-table">
+    <title>Foreign Transaction Resolver Control Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal>
+       </entry>
+       <entry><type>bool</type></entry>
+       <entry>
+        Stop the foreign transaction resolver running on the given database.
+        This function is useful for stopping a resolver process on the database
+        that you want to drop.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>pg_stop_fdwxact_resolver</function> is useful before dropping a
+    database to which a foreign transaction resolver is connected.
+   </para>
+
+  </sect2>
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dcbb10fb6f..94006d0b2a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1094,6 +1094,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>CheckpointerMain</literal></entry>
       <entry>Waiting in main loop of checkpointer process.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLauncherMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolverMain</literal></entry>
+      <entry>Waiting in main loop of foreign transaction resolution worker process.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalLauncherMain</literal></entry>
+      <entry>Waiting in main loop of logical launcher process.</entry>
+     </row>
      <row>
       <entry><literal>LogicalApplyMain</literal></entry>
       <entry>Waiting in main loop of logical replication apply process.</entry>
@@ -1318,6 +1330,18 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>DataFileWrite</literal></entry>
       <entry>Waiting for a write to a relation data file.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactFileRead</literal></entry>
+      <entry>Waiting for a read of a foreign transaction state file.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileSync</literal></entry>
+      <entry>Waiting for a foreign transaction state file to reach stable storage.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactFileWrite</literal></entry>
+      <entry>Waiting for a write of a foreign transaction state file.</entry>
+     </row>
      <row>
       <entry><literal>LockFileAddToDataDirRead</literal></entry>
       <entry>Waiting for a read while adding a line to the data directory lock
@@ -1624,6 +1648,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting for activity from a child process while
        executing a <literal>Gather</literal> plan node.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactResolution</literal></entry>
+      <entry>Waiting for all foreign transaction participants to be resolved during
+       atomic commit among foreign servers.</entry>
+     </row>
      <row>
       <entry><literal>HashBatchAllocate</literal></entry>
       <entry>Waiting for an elected Parallel Hash participant to allocate a hash
@@ -1942,6 +1971,19 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to read or update dynamic shared memory allocation
        information.</entry>
      </row>
+     <row>
+      <entry><literal>FdwXactLock</literal></entry>
+      <entry>Waiting to read or update the state of foreign transactions.</entry>
+     </row>
+     <row>
+      <entry><literal>FdwXactResolutionLock</literal></entry>
+      <entry>Waiting to read or update information of foreign transaction
+       resolution.</entry>
+     </row>
+     <row>
+      <entry><literal>LogicalRepWorkerLock</literal></entry>
+      <entry>Waiting for action on logical replication worker to finish.</entry>
+     </row>
      <row>
       <entry><literal>LockFastPath</literal></entry>
       <entry>Waiting to read or update a process' fast-path lock
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index d453be3909..eca35c4a84 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -171,6 +171,7 @@ break is not needed in a wider output rendering.
   &wal;
   &logical-replication;
   &jit;
+  &distributed-transaction;
   &regress;
 
  </part>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 7136bbe7a3..d1fc367dfc 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -83,6 +83,12 @@ Item
   subsystem</entry>
 </row>
 
+<row>
+ <entry><filename>pg_fdwxact</filename></entry>
+ <entry>Subdirectory containing files used by the distributed transaction
+  manager subsystem</entry>
+</row>
+
 <row>
  <entry><filename>pg_logical</filename></entry>
  <entry>Subdirectory containing status data for logical decoding</entry>
diff --git a/src/backend/access/transam/README.fdwxact b/src/backend/access/transam/README.fdwxact
new file mode 100644
index 0000000000..ee3a2b3614
--- /dev/null
+++ b/src/backend/access/transam/README.fdwxact
@@ -0,0 +1,142 @@
+src/backend/access/transam/README.fdwxact
+
+Atomic Commit for Distributed Transactions
+===========================================
+
+The atomic commit feature enables us to commit or roll back a transaction on
+either all of the foreign servers or none of them. This ensures that the data
+is always left in a consistent state across the federated database.
+
+
+Commit Sequence of Global Transactions
+--------------------------------
+
+We employ the two-phase commit protocol to commit atomically among all foreign
+servers. The sequence of distributed transaction commit consists
+of the following four steps:
+
+1. Foreign Transaction Registration
+An FDW implementation can register the transaction that it opened on the foreign
+server (foreign transaction) to the group of the distributed transaction by calling
+the FdwXactRegisterEntry() function.  The foreign transactions are managed until
+the end of the transaction by PostgreSQL's distributed transaction manager.
+
+2. Pre-Commit phase (1st phase of two-phase commit)
+The two-phase commit is required only if the transaction modified two or more
+servers including the local node.
+
+In this step, we write a WAL record (XLOG_FDWXACT_INSERT) for each foreign
+transaction, indicating that the foreign server is involved in the current
+transaction, before preparing any foreign transaction.  Also, we wait for
+those WAL records to be replicated to the standby if synchronous replication is
+enabled.  After that we prepare all foreign transactions on the foreign servers.
+Both writing the WAL records and waiting for synchronous replication must be
+done before actually preparing the foreign transactions.  If we instead prepared
+the foreign transactions first and only then wrote and replicated the WAL records,
+a server crash between preparing a foreign transaction and writing its WAL
+record would leave us with no trace of that foreign transaction after crash
+recovery. Similarly, if a failover happens before the replica receives the
+records, the new primary server would not know about those transactions.
+
+In cases where we could not prepare one foreign transaction for some reason,
+we error out and change to rollback.  We roll back the foreign transactions that
+are not prepared yet and leave the already-prepared foreign transactions as
+in-doubt transactions for the resolver process.
+
+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.
+
+Once we've committed the local transaction (to be exact, right after flushing
+the commit WAL record to the disk), the outcome of the distributed transaction
+is determined and must not be changed.  All foreign prepared transactions must
+be committed anyway.
+
+4. Post-Commit Phase (2nd phase of two-phase commit)
+The steps so far are done by the backend process committing the transaction but
+this step (commit or rollback) is performed asynchronously by the foreign
+transaction resolver process.
+
+
+Identifying Foreign Transactions In GTM
+---------------------------------------
+
+To identify foreign transaction participants (as well as FdwXact entries) there
+are two ways: using {server OID, user OID} and using user mapping OID. The same
+is true for FDWs to identify the connections (and transactions upon) to the
+foreign server. We need to consider the case where the way to identify the
+transactions is not matched between GTM and FDWs, because the problem might occur
+when the user modifies the same foreign server by different roles within the
+transaction. For example, consider the following execution:
+
+BEGIN;
+SET ROLE user_A;
+INSERT INTO ft1 VALUES (1);
+SET ROLE user_B;
+INSERT INTO ft1 VALUES (1);
+COMMIT;
+
+Suppose that an FDW identifies the connection by {server OID, user OID}
+and GTM identifies the transactions by user mapping OID, and user_A and user_B use
+the public user mapping to connect to server_X. In the FDW, there are two
+connections, {user_A, server_X} and {user_B, server_X}, and therefore it opens one
+transaction on each connection, while GTM has only one FdwXact entry because the two
+connections refer to the same user mapping OID. As a result, at the end of the
+transaction, GTM ends only one foreign transaction, leaving the other one open.
+
+On the other hand, suppose that an FDW identifies the connection by user mapping OID
+and GTM does that by {server OID, user OID}. The FDW uses only one connection and opens
+one transaction, since both users refer to the same user mapping OID (we expect FDWs
+not to register the foreign transaction when not starting a new transaction on the
+foreign server). Since GTM also has one entry, it can end the foreign transaction
+properly. The downside would be that the user OID of FdwXact (i.e., FdwXact->userid)
+is the user who registered the foreign transaction for the first time, not
+necessarily the user who executed COMMIT.  For example in the above case, FdwXact->userid
+will be user_A, not user_B. But it's not a big problem in practice.
+
+Therefore, in fdwxact.c, we identify the foreign transaction by
+{server OID, user OID}.
+
+Foreign Transactions Status
+----------------------------
+
+Every foreign transaction has an FdwXact entry. When preparing a foreign
+transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_PREPARING
+is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED
+after the foreign transaction is prepared. The status then changes to
+FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING before committing or
+aborting respectively. The FdwXact entry is removed with WAL logging once resolved.
+
+FdwXact entries recovered during recovery are marked as in-doubt if the
+corresponding local transaction is not a prepared transaction. The initial
+status for those entries is FDWXACT_STATUS_PREPARED if they are recovered
+from WAL. Because we write WAL only when preparing the foreign transaction, we
+cannot know the exact fate of the foreign transaction from recovery.
+
+The foreign transaction status transition is illustrated by the following
+graph describing the FdwXact->status:
+
+ +----------------------------------------------------+
+ |                      INVALID                       |
+ +----------------------------------------------------+
+    |                      |                       |
+    |                      v                       |
+    |           +---------------------+            |
+   (*1)         |      PREPARING      |           (*1)
+    |           +---------------------+            |
+    |                      |                       |
+    v                      v                       v
+ +----------------------------------------------------+
+ |                      PREPARED                      |
+ +----------------------------------------------------+
+           |                               |
+           v                               v
+ +--------------------+          +--------------------+
+ |     COMMITTING     |          |      ABORTING      |
+ +--------------------+          +--------------------+
+           |                               |
+           v                               v
+ +----------------------------------------------------+
+ |                        END                         |
+ +----------------------------------------------------+
+
+(*1) Paths for recovered FdwXact entries
-- 
2.24.3 (Apple Git-128)
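
As a reading aid for the status diagram in README.fdwxact above, the arrows
can be written down as a small transition check. This is only a sketch under
stated assumptions: the names SkFdwXactStatus and sk_transition_ok are
hypothetical and are not taken from fdwxact.c. Only the state names and the
arrows come from the diagram, and the (*1) arrow (INVALID to PREPARED) is
treated here as available only to entries recovered from WAL, which is how the
README text describes it.

```c
#include <stdbool.h>

/* Hypothetical mirror of the states drawn in the README diagram. */
typedef enum
{
	SK_INVALID,
	SK_PREPARING,
	SK_PREPARED,
	SK_COMMITTING,
	SK_ABORTING,
	SK_END
} SkFdwXactStatus;

/*
 * Return true if the diagram has an arrow from "from" to "to".
 *
 * "recovered" enables the (*1) arrow, INVALID -> PREPARED, which the README
 * describes as the path for entries recovered from WAL.
 */
bool
sk_transition_ok(SkFdwXactStatus from, SkFdwXactStatus to, bool recovered)
{
	switch (from)
	{
		case SK_INVALID:
			return to == SK_PREPARING || (recovered && to == SK_PREPARED);
		case SK_PREPARING:
			return to == SK_PREPARED;
		case SK_PREPARED:
			return to == SK_COMMITTING || to == SK_ABORTING;
		case SK_COMMITTING:
		case SK_ABORTING:
			return to == SK_END;
		case SK_END:
			return false;
	}
	return false;
}
```

A check like this could sit behind an assertion wherever the status field is
updated. Whether the patch wants such a guard is a design question for the
author; the sketch only restates the diagram.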

Attachment: v37-0006-postgres_fdw-marks-foreign-transaction-as-modifi.patch (application/octet-stream)
From 1733bad78bdf8c3cf190e262ee5d833779189c5b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Sat, 1 May 2021 09:00:01 +0900
Subject: [PATCH v37 6/9] postgres_fdw marks foreign transaction as modified on
 modification.

This commit enables postgres_fdw to execute two-phase commit protocol
on transaction commit (without explicitly executing PREPARE TRANSACTION).

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c   | 19 ++++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c |  2 ++
 contrib/postgres_fdw/postgres_fdw.h |  1 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index f8db97c641..262bf71485 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -63,6 +63,7 @@ typedef struct ConnCacheEntry
 	bool		keep_connections;	/* setting value of keep_connections
 									 * server option */
 	Oid			serverid;		/* foreign server OID used to get server name */
+	bool		modified;		/* true if data on the foreign server is modified */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 	PgFdwConnState state;		/* extra per-connection state */
@@ -311,6 +312,7 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 	entry->changing_xact_state = false;
 	entry->invalidated = false;
 	entry->serverid = server->serverid;
+	entry->modified = false;
 	entry->server_hashvalue =
 		GetSysCacheHashValue1(FOREIGNSERVEROID,
 							  ObjectIdGetDatum(server->serverid));
@@ -346,6 +348,20 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 		 entry->conn, server->servername, user->umid, user->userid);
 }
 
+void
+MarkConnectionModified(UserMapping *user)
+{
+	ConnCacheEntry *entry;
+
+	entry = GetConnectionCacheEntry(user->umid);
+
+	if (entry && !entry->modified)
+	{
+		FdwXactRegisterEntry(user, true);
+		entry->modified = true;
+	}
+}
+
 /*
  * Connect to remote server using specified server and user mapping properties.
  */
@@ -617,7 +633,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 			 entry->conn);
 
 		/* Register the foreign server to the transaction */
-		FdwXactRegisterEntry(user);
+		FdwXactRegisterEntry(user, false);
 
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
@@ -626,6 +642,7 @@ begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 		entry->changing_xact_state = true;
 		do_sql_command(entry->conn, sql);
 		entry->xact_depth = 1;
+		entry->modified = false;
 		entry->changing_xact_state = false;
 	}
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index e1c6bd9330..2e2aee47b4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2654,6 +2654,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * establish new connection if necessary.
 	 */
 	dmstate->conn = GetConnection(user, false, &dmstate->conn_state);
+	MarkConnectionModified(user);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -3968,6 +3969,7 @@ create_foreign_modify(EState *estate,
 
 	/* Open connection; report that we'll create a prepared statement. */
 	fmstate->conn = GetConnection(user, true, &fmstate->conn_state);
+	MarkConnectionModified(user);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 97e4f244db..4fedbb76c4 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -144,6 +144,7 @@ extern void process_pending_request(AsyncRequest *areq);
 extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt,
 							 PgFdwConnState **state);
 extern void ReleaseConnection(PGconn *conn);
+extern void MarkConnectionModified(UserMapping *user);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern void do_sql_command(PGconn *conn, const char *sql);
-- 
2.24.3 (Apple Git-128)
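
The patch above registers a participant as unmodified when the remote
transaction begins and registers it again as modified on the first write. How
that flag could feed the two-phase decision described in the documentation
("more than one server, including the local server itself", and only when the
user requests it) is sketched below. Everything here is an assumption made for
illustration: SkParticipant, sk_register and sk_need_two_phase are
hypothetical names, the list is a fixed-size array, and a single integer key
stands in for whatever fdwxact.c really uses to identify a participant.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_MAX_PARTICIPANTS 8

/* Hypothetical stand-in for one registered foreign transaction. */
typedef struct
{
	unsigned int key;			/* simplified participant identity */
	bool		modified;		/* true once a write has been announced */
} SkParticipant;

typedef struct
{
	SkParticipant items[SK_MAX_PARTICIPANTS];
	size_t		nitems;
} SkParticipantList;

/*
 * Register a participant, or upgrade an existing one to "modified".
 * Returns false only if the list is full.  A participant never goes back
 * from modified to unmodified.
 */
bool
sk_register(SkParticipantList *list, unsigned int key, bool modified)
{
	for (size_t i = 0; i < list->nitems; i++)
	{
		if (list->items[i].key == key)
		{
			list->items[i].modified = list->items[i].modified || modified;
			return true;
		}
	}

	if (list->nitems >= SK_MAX_PARTICIPANTS)
		return false;

	list->items[list->nitems].key = key;
	list->items[list->nitems].modified = modified;
	list->nitems++;
	return true;
}

/*
 * Two-phase commit is needed when the user asked for it and at least two
 * servers were written, counting the local server as one of them.
 */
bool
sk_need_two_phase(const SkParticipantList *list, bool local_modified,
				  bool twophase_requested)
{
	size_t		nwritten = local_modified ? 1 : 0;

	for (size_t i = 0; i < list->nitems; i++)
	{
		if (list->items[i].modified)
			nwritten++;
	}

	return twophase_requested && nwritten >= 2;
}
```

Calling sk_register twice for the same key mirrors begin_remote_xact followed
by MarkConnectionModified: the second call adds no entry and only raises the
flag, so read-only participants do not by themselves force a prepare.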

Attachment: v37-0007-Add-GetPrepareId-API.patch (application/octet-stream)
From 1b765d951aa9619dac46227f7f0a8dc0232970dc Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 4 Nov 2020 14:41:53 +0900
Subject: [PATCH v37 7/9] Add GetPrepareId API

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/fdwxact.c | 52 ++++++++++++++++++++++++----
 src/include/foreign/fdwapi.h         |  3 ++
 2 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index 9280d79a3a..c76714acbe 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -182,6 +182,7 @@ typedef struct FdwXactEntry
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
 	PrepareForeignTransaction_function prepare_foreign_xact_fn;
+	GetPrepareId_function get_prepareid_fn;
 } FdwXactEntry;
 
 /*
@@ -391,6 +392,7 @@ FdwXactRegisterEntry(UserMapping *usermapping, bool modified)
 	fdwent->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdwent->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdwent->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
+	fdwent->get_prepareid_fn = routine->GetPrepareId;
 
 	MemoryContextSwitchTo(old_ctx);
 
@@ -938,9 +940,10 @@ PrepareAllFdwXacts(TransactionId xid, bool prepare_all)
 }
 
 /*
- * Return a null-terminated foreign transaction identifier.  We generate an
- * unique identifier with in the form of
- * "fx_<random number>_<xid>_<umid> whose length is less than FDWXACT_ID_MAX_LEN.
+ * Return a null-terminated foreign transaction identifier.  If the given FDW
+ * supports the GetPrepareId callback we return the identifier returned from it.
+ * Otherwise we generate a unique identifier in the form of
+ * "fx_<random number>_<xid>_<umid>" whose length is less than FDWXACT_ID_MAX_LEN.
  *
  * Returned string value is used to identify foreign transaction. The
  * identifier should not be same as any other concurrent prepared transaction
@@ -954,12 +957,47 @@ PrepareAllFdwXacts(TransactionId xid, bool prepare_all)
 static char *
 getFdwXactIdentifier(FdwXactEntry *fdwent, TransactionId xid)
 {
-	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+	char *id;
+	int	id_len;
 
-	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u", Abs(random()),
-			 xid, fdwent->umid);
+	/*
+	 * If the FDW doesn't provide the callback function, generate a unique
+	 * identifier.
+	 */
+	if (!fdwent->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u", Abs(random()),
+				 xid, fdwent->umid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdwent->get_prepareid_fn(xid, fdwent->server->serverid,
+								  fdwent->usermapping->userid,
+								  &id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check length of foreign transaction identifier */
+	if (id_len > FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
 
-	return pstrdup(buf);
+	id[id_len] = '\0';
+	return pstrdup(id);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 05c758f869..33fd7899d7 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -195,6 +195,8 @@ typedef void (*ForeignAsyncNotify_function) (AsyncRequest *areq);
 typedef void (*PrepareForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*CommitForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*RollbackForeignTransaction_function) (FdwXactInfo *finfo);
+typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
+										Oid userid, int *prep_id_len);
 
 
 /*
@@ -289,6 +291,7 @@ typedef struct FdwRoutine
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
 	PrepareForeignTransaction_function PrepareForeignTransaction;
+	GetPrepareId_function GetPrepareId;
 } FdwRoutine;
 
 
-- 
2.24.3 (Apple Git-128)

v37-0009-Add-regression-tests-for-foreign-twophase-commit.patch (application/octet-stream)
From d2ffe301eab90fb5394e79668a00dcc6fbc1b43d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Date: Thu, 26 Mar 2020 21:41:29 +0500
Subject: [PATCH v37 9/9] Add regression tests for foreign twophase commit.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_fdwxact/.gitignore      |   4 +
 src/test/modules/test_fdwxact/Makefile        |  28 +
 .../test_fdwxact/expected/test_fdwxact.out    | 200 +++++++
 src/test/modules/test_fdwxact/fdwxact.conf    |   7 +
 .../modules/test_fdwxact/sql/test_fdwxact.sql | 185 ++++++
 src/test/modules/test_fdwxact/t/001_basic.pl  | 110 ++++
 .../test_fdwxact/test_fdwxact--1.0.sql        |  44 ++
 src/test/modules/test_fdwxact/test_fdwxact.c  | 526 ++++++++++++++++++
 .../modules/test_fdwxact/test_fdwxact.control |   4 +
 src/test/recovery/Makefile                    |   2 +-
 src/test/recovery/t/025_fdwxact.pl            | 175 ++++++
 src/test/regress/pg_regress.c                 |  13 +-
 src/tools/msvc/Mkvcbuild.pm                   |   3 +-
 14 files changed, 1296 insertions(+), 6 deletions(-)
 create mode 100644 src/test/modules/test_fdwxact/.gitignore
 create mode 100644 src/test/modules/test_fdwxact/Makefile
 create mode 100644 src/test/modules/test_fdwxact/expected/test_fdwxact.out
 create mode 100644 src/test/modules/test_fdwxact/fdwxact.conf
 create mode 100644 src/test/modules/test_fdwxact/sql/test_fdwxact.sql
 create mode 100644 src/test/modules/test_fdwxact/t/001_basic.pl
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.c
 create mode 100644 src/test/modules/test_fdwxact/test_fdwxact.control
 create mode 100644 src/test/recovery/t/025_fdwxact.pl
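
The commit-time behaviour that test_fdwxact.sql exercises can be summarised in
a small standalone sketch. It is inferred only from the comments and expected
output of the regression test below, not from the implementation in fdwxact.c,
and every name in it (SketchParticipant, sketch_plan_commit, the enums) is
hypothetical. It covers plain COMMIT only, not PREPARE TRANSACTION.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
	SKETCH_2PC_DISABLED,
	SKETCH_2PC_REQUIRED
} SketchTwophaseMode;

typedef enum
{
	SKETCH_ONE_PHASE,			/* commit each participant directly */
	SKETCH_TWO_PHASE,			/* prepare everywhere, then commit */
	SKETCH_ERROR				/* a writer cannot be prepared */
} SketchCommitPlan;

typedef struct
{
	bool		modified;		/* did the transaction write to this server? */
	bool		has_xact_api;	/* FDW provides commit/rollback callbacks */
	bool		can_prepare;	/* FDW also provides the prepare callback */
} SketchParticipant;

/*
 * Decide how a COMMIT would proceed.  Read-only servers and servers whose
 * FDW has no transaction API are ignored; a local write counts as one more
 * writer.  Two or more writers under 'required' need two-phase commit.
 */
static SketchCommitPlan
sketch_plan_commit(SketchTwophaseMode mode, bool local_modified,
				   const SketchParticipant *parts, size_t nparts)
{
	size_t		writers = local_modified ? 1 : 0;
	bool		all_can_prepare = true;

	assert(parts != NULL || nparts == 0);

	for (size_t i = 0; i < nparts; i++)
	{
		if (!parts[i].modified || !parts[i].has_xact_api)
			continue;
		writers++;
		if (!parts[i].can_prepare)
			all_can_prepare = false;
	}

	if (mode == SKETCH_2PC_DISABLED || writers < 2)
		return SKETCH_ONE_PHASE;

	return all_can_prepare ? SKETCH_TWO_PHASE : SKETCH_ERROR;
}
```

With this model, writing to ft_2pc_1 and ft_2pc_2 yields two-phase commit,
writing to ft_no2pc_1 and ft_2pc_1 yields the "does not support two-phase
commit protocol" error, and writing to ft_1 and ft_2pc_1 stays one-phase
because test_fdw registers no transaction callbacks.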

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index dffc79b2d9..6fde3e8a84 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -17,6 +17,7 @@ SUBDIRS = \
 		  test_bloomfilter \
 		  test_ddl_deparse \
 		  test_extensions \
+		  test_fdwxact \
 		  test_ginpostinglist \
 		  test_integerset \
 		  test_misc \
diff --git a/src/test/modules/test_fdwxact/.gitignore b/src/test/modules/test_fdwxact/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_fdwxact/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_fdwxact/Makefile b/src/test/modules/test_fdwxact/Makefile
new file mode 100644
index 0000000000..b3fc99aee3
--- /dev/null
+++ b/src/test/modules/test_fdwxact/Makefile
@@ -0,0 +1,28 @@
+# src/test/modules/test_fdwxact/Makefile
+
+MODULE_big = test_fdwxact
+OBJS = \
+	$(WIN32RES) \
+	test_fdwxact.o
+PGFILEDESC = "test_fdwxact - test code for src/backend/access/transam/fdwxact.c"
+
+EXTENSION = test_fdwxact
+DATA = test_fdwxact--1.0.sql
+
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/test_fdwxact/fdwxact.conf
+REGRESS = test_fdwxact
+
+NO_INSTALLCHECK = 1
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_fdwxact
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_fdwxact/expected/test_fdwxact.out b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
new file mode 100644
index 0000000000..f622543b3e
--- /dev/null
+++ b/src/test/modules/test_fdwxact/expected/test_fdwxact.out
@@ -0,0 +1,200 @@
+--
+-- Test for foreign transaction management.
+--
+CREATE EXTENSION test_fdwxact;
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution() AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = 0 INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_2pc_2;
+ i 
+---
+(0 rows)
+
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+COMMIT;
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+ i 
+---
+(0 rows)
+
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+ERROR:  cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol
+DETAIL:  foreign_twophase_commit is 'required' but the transaction has some foreign servers which are not capable of two-phase commit
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution();
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution();
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
+ERROR:  cannot PREPARE a distributed transaction that has operated on a foreign server not supporting two-phase commit protocol
diff --git a/src/test/modules/test_fdwxact/fdwxact.conf b/src/test/modules/test_fdwxact/fdwxact.conf
new file mode 100644
index 0000000000..0dc4b5fda1
--- /dev/null
+++ b/src/test/modules/test_fdwxact/fdwxact.conf
@@ -0,0 +1,7 @@
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 4
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = disabled
diff --git a/src/test/modules/test_fdwxact/sql/test_fdwxact.sql b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
new file mode 100644
index 0000000000..59802080b7
--- /dev/null
+++ b/src/test/modules/test_fdwxact/sql/test_fdwxact.sql
@@ -0,0 +1,185 @@
+--
+-- Test for foreign transaction management.
+--
+
+CREATE EXTENSION test_fdwxact;
+
+-- setup one server that don't support transaction management API
+CREATE SERVER srv_1 FOREIGN DATA WRAPPER test_fdw;
+
+-- setup two servers that support only commit and rollback API
+CREATE SERVER srv_no2pc_1 FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_no2pc_2 FOREIGN DATA WRAPPER test_no2pc_fdw;
+
+-- setup two servers that support commit, rollback and prepare API.
+-- That is, those two server support two-phase commit protocol.
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft_1 (i int) SERVER srv_1;
+CREATE FOREIGN TABLE ft_no2pc_1 (i int) SERVER srv_no2pc_1;
+CREATE FOREIGN TABLE ft_no2pc_2 (i int) SERVER srv_no2pc_2;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc_2;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+
+-- function to wait for counters to advance
+CREATE PROCEDURE wait_for_resolution() AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  resolved bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT count(*) = 0 INTO resolved FROM pg_foreign_xacts;
+
+    exit WHEN resolved;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_resolution delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+
+-- Test 'disabled' mode.
+-- Modifies one or two servers but since we don't require two-phase
+-- commit, all case should not raise an error.
+SET foreign_twophase_commit TO disabled;
+
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+ROLLBACK;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+
+
+-- Test 'required' mode.
+-- In this case, when two-phase commit is required, all servers
+-- which are involved in the and modified need to support two-phase
+-- commit protocol. Otherwise transaction will rollback.
+SET foreign_twophase_commit TO 'required';
+
+-- Ok. Writing only one server doesn't require two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Writing two servers, we require two-phase commit and success.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES (1);
+INSERT INTO ft_2pc_2 VALUES (1);
+COMMIT;
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. Only reading servers doesn't require two-phase commit.
+BEGIN;
+SELECT * FROM ft_2pc_1;
+SELECT * FROM ft_2pc_2;
+COMMIT;
+BEGIN;
+SELECT * FROM ft_1;
+SELECT * FROM ft_no2pc_1;
+COMMIT;
+
+-- Ok. Read one server and write one server.
+BEGIN;
+SELECT * FROM ft_1;
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+BEGIN;
+SELECT * FROM ft_no2pc_1;
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Ok. only ft_2pc_1 is committed in one-phase.
+BEGIN;
+INSERT INTO ft_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. ft_no2pc_1 doesn't support two-phase commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_2pc_1 VALUES (1);
+COMMIT;
+
+-- Error. Both ft_no2pc_1 and ft_no2pc_2 don't support two-phase
+-- commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+INSERT INTO ft_no2pc_2 VALUES (1);
+COMMIT;
+
+-- Error. Two-phase commit is required because of writes on two
+-- servers: local node and ft_no2pc_1. But ft_no2pc_1 doesn't support
+-- two-phase commit.
+BEGIN;
+INSERT INTO t VALUES (1);
+INSERT INTO ft_no2pc_1 VALUES (1);
+COMMIT;
+
+
+-- Tests for PREPARE.
+-- Prepare two transactions: local and foreign.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+INSERT INTO t VALUES(3);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+COMMIT PREPARED 'global_x1';
+CALL wait_for_resolution();
+
+-- Even if the transaction modified only one foreign server,
+-- we prepare foreign transaction.
+BEGIN;
+INSERT INTO ft_2pc_1 VALUES(1);
+PREPARE TRANSACTION 'global_x1';
+SELECT count(*) FROM pg_foreign_xacts;
+ROLLBACK PREPARED 'global_x1';
+CALL wait_for_resolution();
+
+-- Error. PREPARE needs all involved foreign servers to
+-- support two-phsae commit.
+BEGIN;
+INSERT INTO ft_no2pc_1 VALUES (1);
+PREPARE TRANSACTION 'global_x1';
diff --git a/src/test/modules/test_fdwxact/t/001_basic.pl b/src/test/modules/test_fdwxact/t/001_basic.pl
new file mode 100644
index 0000000000..760c54db87
--- /dev/null
+++ b/src/test/modules/test_fdwxact/t/001_basic.pl
@@ -0,0 +1,110 @@
+use File::Copy qw/copy move/;
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $node = get_new_node('main');
+$node->init;
+$node->append_conf('postgresql.conf', qq(
+shared_preload_libraries = 'test_fdwxact'
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 4
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = required
+test_fdwxact.log_api_calls = true
+				   ));
+$node->start;
+
+$node->psql(
+	'postgres', "
+CREATE EXTENSION test_fdwxact;
+CREATE SERVER srv FOREIGN DATA WRAPPER test_fdw;
+CREATE SERVER srv_no2pc FOREIGN DATA WRAPPER test_no2pc_fdw;
+CREATE SERVER srv_2pc_1 FOREIGN DATA WRAPPER test_2pc_fdw;
+CREATE SERVER srv_2pc_2 FOREIGN DATA WRAPPER test_2pc_fdw;
+
+CREATE TABLE t (i int);
+CREATE FOREIGN TABLE ft (i int) SERVER srv;
+CREATE FOREIGN TABLE ft_no2pc (i int) SERVER srv_no2pc;
+CREATE FOREIGN TABLE ft_2pc_1 (i int) SERVER srv_2pc_1;
+CREATE FOREIGN TABLE ft_2pc_2 (i int) SERVER srv_2pc_2;
+
+CREATE USER MAPPING FOR PUBLIC SERVER srv;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_no2pc;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_1;
+CREATE USER MAPPING FOR PUBLIC SERVER srv_2pc_2;
+	");
+
+sub run_transaction
+{
+	my ($node, $prepsql, $sql, $endsql, $expected) = @_;
+
+	$endsql = 'COMMIT' unless defined $endsql;
+	$expected = 0 unless defined $expected;
+
+	local $ENV{PGHOST} = $node->host;
+	local $ENV{PGPORT} = $node->port;
+
+	truncate $node->logfile, 0;
+
+	$node->safe_psql('postgres', $prepsql);
+	my ($cmdret, $stdout, $stderr) = $node->psql('postgres',
+												 "BEGIN;
+												 SELECT txid_current() as xid;
+												 $sql
+												 $endsql;
+												 ");
+	$node->poll_query_until('postgres',
+							"SELECT count(*) = $expected FROM pg_foreign_xacts");
+
+	my $log = TestLib::slurp_file($node->logfile);
+
+	return $log, $stdout;
+}
+
+my ($log, $xid);
+
+# The transaction is committed using two-phase commit.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-1");
+like($log, qr/commit prepared tx_$xid on srv_2pc_2/, "commit prepared transaction-2");
+
+# Similarly, two-phase commit is used.
+($log, $xid) = run_transaction($node, "",
+					  "INSERT INTO t VALUES(1);
+					  INSERT INTO ft_2pc_1 VALUES(1);");
+like($log, qr/commit prepared tx_$xid on srv_2pc_1/, "commit prepared transaction-3");
+
+# Test the failure case of PREPARE TRANSACTION. We prepare two distributed
+# transactions with the same identifier.  The second attempt will fail when preparing
+# the local transaction, which is performed after preparing the foreign transaction
+# on srv_2pc_1. Therefore the transaction should roll back the prepared foreign
+# transaction.
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+($log, $xid) = run_transaction($node, "",
+							   "INSERT INTO t VALUES(1);
+							   INSERT INTO ft_2pc_1 VALUES(1);",
+							   "PREPARE TRANSACTION 'tx1'", 1);
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "failure after prepare transaction");
+$node->safe_psql('postgres', "COMMIT PREPARED 'tx1'");
+
+# Inject an error into the prepare phase on srv_2pc_1. The transaction fails while
+# preparing the foreign transaction on srv_2pc_1. Then, we try to both 'rollback' and
+# 'rollback prepared' the foreign transaction, and roll back the other foreign
+# transaction.
+($log, $xid) = run_transaction($node,
+							   "SELECT test_inject_error('error', 'prepare', 'srv_2pc_1');",
+							   "INSERT INTO ft_2pc_1 VALUES(1);
+							   INSERT INTO ft_2pc_2 VALUES(1);");
+like($log, qr/rollback $xid on srv_2pc_1/, "rollback on failed server");
+like($log, qr/rollback prepared tx_$xid on srv_2pc_1/, "rollback prepared on failed server");
+like($log, qr/rollback .* on srv_2pc_2/, "rollback on another server");
diff --git a/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
new file mode 100644
index 0000000000..f676dfe04b
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/test_fdwxact/test_fdwxact--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_fdwxact" to load this file. \quit
+
+-- test_fdw doesn't use transaction API
+CREATE FUNCTION test_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_fdw
+  HANDLER test_fdw_handler;
+
+-- test_no2pc_fdw uses only COMMIT and ROLLBACK API
+CREATE FUNCTION test_no2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_no2pc_fdw
+  HANDLER test_no2pc_fdw_handler;
+
+-- test_2pc_fdw uses the PREPARE API as well
+CREATE FUNCTION test_2pc_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER test_2pc_fdw
+  HANDLER test_2pc_fdw_handler;
+
+CREATE FUNCTION test_inject_error(
+elevel text,
+phase text,
+server text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_reset_error()
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.c b/src/test/modules/test_fdwxact/test_fdwxact.c
new file mode 100644
index 0000000000..08f15de849
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.c
@@ -0,0 +1,526 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_fdwxact.c
+ *		  Test modules for foreign transaction management
+ *
+ * This module implements three types of foreign data wrapper: the first
+ * doesn't support any transaction FDW APIs, the second supports only the
+ * commit and rollback APIs, and the third supports all transaction APIs,
+ * including prepare.
+ *
+ * This module can also inject an error into the prepare callback or the
+ * commit callback using the test_inject_error() SQL function. The injected
+ * error's information is stored in shared memory so that both backend
+ * processes and resolver processes can see it.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_fdwxact/test_fdwxact.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xact.h"
+#include "commands/defrem.h"
+#include "access/reloptions.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/ipc.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+#define TEST_FDWXCT_MAX_NAME_LEN 32
+
+typedef struct testFdwXactSharedState
+{
+	char	elevel[TEST_FDWXCT_MAX_NAME_LEN];
+	char	phase[TEST_FDWXCT_MAX_NAME_LEN];
+	char	server[TEST_FDWXCT_MAX_NAME_LEN];
+	LWLock	*lock;
+} testFdwXactSharedState;
+testFdwXactSharedState *fxss = NULL;
+
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static bool log_api_calls = false;
+
+void _PG_init(void);
+void _PG_fini(void);
+PG_FUNCTION_INFO_V1(test_fdw_handler);
+PG_FUNCTION_INFO_V1(test_no2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_2pc_fdw_handler);
+PG_FUNCTION_INFO_V1(test_inject_error);
+PG_FUNCTION_INFO_V1(test_reset_error);
+
+static void test_fdwxact_shmem_startup(void);
+static bool check_event(char *servername, char *phase, int *elevel);
+static void testGetForeignRelSize(PlannerInfo *root,
+								  RelOptInfo *baserel,
+								  Oid foreigntableid);
+static void testGetForeignPaths(PlannerInfo *root,
+								RelOptInfo *baserel,
+								Oid foreigntableid);
+static ForeignScan *testGetForeignPlan(PlannerInfo *root,
+									   RelOptInfo *foreignrel,
+									   Oid foreigntableid,
+									   ForeignPath *best_path,
+									   List *tlist,
+									   List *scan_clauses,
+									   Plan *outer_plan);
+static void testBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *testIterateForeignScan(ForeignScanState *node);
+static void testReScanForeignScan(ForeignScanState *node);
+static void testEndForeignScan(ForeignScanState *node);
+static void testBeginForeignModify(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo,
+								   List *fdw_private,
+								   int subplan_index,
+								   int eflags);
+static void testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo,
+												   List *fdw_private,
+												   int subplan_index,
+												   int eflags);
+static TupleTableSlot *testExecForeignInsert(EState *estate,
+											 ResultRelInfo *resultRelInfo,
+											 TupleTableSlot *slot,
+											 TupleTableSlot *planSlot);
+static void testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+												   ResultRelInfo *resultRelInfo);
+static void testEndForeignModify(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static void testBeginForeignInsert(ModifyTableState *mtstate,
+								   ResultRelInfo *resultRelInfo);
+static void testEndForeignInsert(EState *estate,
+								 ResultRelInfo *resultRelInfo);
+static int	testIsForeignRelUpdatable(Relation rel);
+static void testPrepareForeignTransaction(FdwXactInfo *finfo);
+static void testCommitForeignTransaction(FdwXactInfo *finfo);
+static void testRollbackForeignTransaction(FdwXactInfo *finfo);
+static char *testGetPrepareId(TransactionId xid, Oid serverid,
+							  Oid userid, int *prep_id_len);
+
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("test_fdwxact.log_api_calls",
+							 "Report transaction API calls to logs.",
+							 NULL,
+							 &log_api_calls,
+							 false,
+							 PGC_USERSET,
+							 0,
+							 NULL, NULL, NULL);
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(testFdwXactSharedState)));
+	RequestNamedLWLockTranche("test_fdwxact", 1);
+
+	prev_shmem_startup_hook = shmem_startup_hook;
+	shmem_startup_hook = test_fdwxact_shmem_startup;
+}
+
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	shmem_startup_hook = prev_shmem_startup_hook;
+}
+
+static void
+test_fdwxact_shmem_startup(void)
+{
+	bool found;
+
+	if (prev_shmem_startup_hook)
+		prev_shmem_startup_hook();
+
+	fxss = ShmemInitStruct("test_fdwxact",
+						   sizeof(testFdwXactSharedState),
+						   &found);
+	if (!found)
+	{
+		memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+		fxss->lock = &(GetNamedLWLockTranche("test_fdwxact"))->lock;
+	}
+}
+
+Datum
+test_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModify;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsert;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_no2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support only COMMIT and ROLLBACK */
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+Datum
+test_2pc_fdw_handler(PG_FUNCTION_ARGS)
+{
+	FdwRoutine *routine = makeNode(FdwRoutine);
+
+	/* Functions for scanning foreign tables */
+	routine->GetForeignRelSize = testGetForeignRelSize;
+	routine->GetForeignPaths = testGetForeignPaths;
+	routine->GetForeignPlan = testGetForeignPlan;
+	routine->BeginForeignScan = testBeginForeignScan;
+	routine->IterateForeignScan = testIterateForeignScan;
+	routine->ReScanForeignScan = testReScanForeignScan;
+	routine->EndForeignScan = testEndForeignScan;
+
+	/* Functions for updating foreign tables */
+	routine->AddForeignUpdateTargets = NULL;
+	routine->PlanForeignModify = NULL;
+	routine->BeginForeignModify = testBeginForeignModifyWithRegistration;
+	routine->ExecForeignInsert = testExecForeignInsert;
+	routine->EndForeignModify = testEndForeignModify;
+	routine->BeginForeignInsert = testBeginForeignInsertWithRegistration;
+	routine->EndForeignInsert = testEndForeignInsert;
+	routine->IsForeignRelUpdatable = testIsForeignRelUpdatable;
+
+	/* Support all functions for foreign transactions */
+	routine->GetPrepareId = testGetPrepareId;
+	routine->PrepareForeignTransaction = testPrepareForeignTransaction;
+	routine->CommitForeignTransaction = testCommitForeignTransaction;
+	routine->RollbackForeignTransaction = testRollbackForeignTransaction;
+
+	PG_RETURN_POINTER(routine);
+}
+
+static void
+testGetForeignRelSize(PlannerInfo *root,
+					  RelOptInfo *baserel,
+					  Oid foreigntableid)
+{
+	baserel->pages = 10;
+	baserel->tuples = 100;
+}
+
+static void
+testGetForeignPaths(PlannerInfo *root,
+					RelOptInfo *baserel,
+					Oid foreigntableid)
+{
+	add_path(baserel, (Path *) create_foreignscan_path(root, baserel,
+													   NULL,
+													   10, 10, 10,
+													   NIL,
+													   baserel->lateral_relids,
+													   NULL, NIL));
+}
+
+static ForeignScan *
+testGetForeignPlan(PlannerInfo *root,
+				   RelOptInfo *foreignrel,
+				   Oid foreigntableid,
+				   ForeignPath *best_path,
+				   List *tlist,
+				   List *scan_clauses,
+				   Plan *outer_plan)
+{
+	return make_foreignscan(tlist,
+							NIL,
+							foreignrel->relid,
+							NIL,
+							NULL,
+							NIL,
+							NIL,
+							outer_plan);
+}
+
+static void
+testBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	return;
+}
+
+static TupleTableSlot *
+testIterateForeignScan(ForeignScanState *node)
+{
+	return ExecClearTuple(node->ss.ss_ScanTupleSlot);
+}
+
+static void
+testReScanForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+static void
+testEndForeignScan(ForeignScanState *node)
+{
+	return;
+}
+
+/* Register the foreign transaction */
+static void
+testRegisterFdwXact(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					bool modified)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	RangeTblEntry	*rte;
+	ForeignTable *table;
+	UserMapping	*usermapping;
+	Oid		userid;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
+						mtstate->ps.state);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+	table = GetForeignTable(RelationGetRelid(rel));
+	usermapping = GetUserMapping(userid, table->serverid);
+	FdwXactRegisterEntry(usermapping, modified);
+}
+
+
+static void
+testBeginForeignModify(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo,
+					   List *fdw_private,
+					   int subplan_index,
+					   int eflags)
+{
+	return;
+}
+
+static void
+testBeginForeignModifyWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo,
+									   List *fdw_private,
+									   int subplan_index,
+									   int eflags)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo,
+						(eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0);
+	return;
+}
+
+static TupleTableSlot *
+testExecForeignInsert(EState *estate,
+					  ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
+					  TupleTableSlot *planSlot)
+{
+	return slot;
+}
+
+static void
+testEndForeignModify(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsert(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static void
+testBeginForeignInsertWithRegistration(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo)
+{
+	testRegisterFdwXact(mtstate, resultRelInfo, true);
+	return;
+}
+
+static void
+testEndForeignInsert(EState *estate,
+					 ResultRelInfo *resultRelInfo)
+{
+	return;
+}
+
+static int
+testIsForeignRelUpdatable(Relation rel)
+{
+	/* allow only inserts */
+	return (1 << CMD_INSERT);
+}
+
+static char *
+testGetPrepareId(TransactionId xid, Oid serverid,
+				 Oid userid, int *prep_id_len)
+{
+	static char buf[32] = {0};
+
+	*prep_id_len = snprintf(buf, 32, "tx_%u", xid);
+
+	return buf;
+}
+
+static void
+testPrepareForeignTransaction(FdwXactInfo *finfo)
+{
+	int elevel;
+
+	if (check_event(finfo->server->servername, "prepare", &elevel))
+		elog(elevel, "injected error at prepare");
+
+	if (log_api_calls)
+		ereport(LOG, (errmsg("prepare %s on %s",
+							 finfo->identifier,
+							 finfo->server->servername)));
+}
+
+static void
+testCommitForeignTransaction(FdwXactInfo *finfo)
+{
+	int elevel;
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (check_event(finfo->server->servername, "commit", &elevel))
+		elog(elevel, "injected error at commit");
+
+	if (log_api_calls)
+	{
+		if (finfo->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("commit %u on %s",
+								 xid, finfo->server->servername)));
+		else
+			ereport(LOG, (errmsg("commit prepared %s on %s",
+								 finfo->identifier,
+								 finfo->server->servername)));
+	}
+}
+
+static void
+testRollbackForeignTransaction(FdwXactInfo *finfo)
+{
+	TransactionId xid = GetTopTransactionIdIfAny();
+
+	if (log_api_calls)
+	{
+		if (finfo->flags & FDWXACT_FLAG_ONEPHASE)
+			ereport(LOG, (errmsg("rollback %u on %s",
+								 xid, finfo->server->servername)));
+		else
+			ereport(LOG, (errmsg("rollback prepared %s on %s",
+								 finfo->identifier,
+								 finfo->server->servername)));
+	}
+}
+
+/*
+ * Check whether an injected event is set for the given phase on the given
+ * server.  If so, set *elevel and return true.
+ */
+static bool
+check_event(char *servername, char *phase, int *elevel)
+{
+	LWLockAcquire(fxss->lock, LW_SHARED);
+
+	if (pg_strcasecmp(fxss->server, servername) != 0 ||
+		pg_strcasecmp(fxss->phase, phase) != 0)
+	{
+		LWLockRelease(fxss->lock);
+		return false;
+	}
+
+	/* Currently support only error and panic */
+	if (pg_strcasecmp(fxss->elevel, "error") == 0)
+		*elevel = ERROR;
+	if (pg_strcasecmp(fxss->elevel, "panic") == 0)
+		*elevel = PANIC;
+
+	LWLockRelease(fxss->lock);
+
+	return true;
+}
+
+/* SQL function to inject an error */
+Datum
+test_inject_error(PG_FUNCTION_ARGS)
+{
+	char *elevel = text_to_cstring(PG_GETARG_TEXT_P(0));
+	char *phase = text_to_cstring(PG_GETARG_TEXT_P(1));
+	char *server = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	/* strlcpy always NUL-terminates, unlike strncpy on over-long input */
+	strlcpy(fxss->elevel, elevel, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->phase, phase, TEST_FDWXCT_MAX_NAME_LEN);
+	strlcpy(fxss->server, server, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
+
+/* SQL function to reset an error */
+Datum
+test_reset_error(PG_FUNCTION_ARGS)
+{
+	LWLockAcquire(fxss->lock, LW_EXCLUSIVE);
+	memset(fxss->elevel, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->phase, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	memset(fxss->server, 0, TEST_FDWXCT_MAX_NAME_LEN);
+	LWLockRelease(fxss->lock);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/test_fdwxact/test_fdwxact.control b/src/test/modules/test_fdwxact/test_fdwxact.control
new file mode 100644
index 0000000000..ac9945ba03
--- /dev/null
+++ b/src/test/modules/test_fdwxact/test_fdwxact.control
@@ -0,0 +1,4 @@
+comment = 'Test code for fdwxact'
+default_version = '1.0'
+module_pathname = '$libdir/test_fdwxact'
+relocatable = true
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index 96442ceb4e..0e5e05e41a 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/025_fdwxact.pl b/src/test/recovery/t/025_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/025_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transactions involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+							   has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit transaction involving foreign servers and shutdown the master node
+# immediately before checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 05296f7ee1..e7fad35196 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2391,9 +2391,12 @@ regression_main(int argc, char *argv[],
 		 * Adjust the default postgresql.conf for regression testing. The user
 		 * can specify a file to be appended; in any case we expand logging
 		 * and set max_prepared_transactions to enable testing of prepared
-		 * xacts.  (Note: to reduce the probability of unexpected shmmax
-		 * failures, don't set max_prepared_transactions any higher than
-		 * actually needed by the prepared_xacts regression test.)
+		 * xacts.  We also set max_prepared_foreign_transactions and
+		 * max_foreign_transaction_resolvers to enable testing of transactions
+		 * involving multiple foreign servers. (Note: to reduce the probability
+		 * of unexpected shmmax failures, don't set max_prepared_transactions
+		 * any higher than actually needed by the prepared_xacts regression
+		 * test.)
 		 */
 		snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
 		pg_conf = fopen(buf, "a");
@@ -2408,7 +2411,9 @@ regression_main(int argc, char *argv[],
 		fputs("log_line_prefix = '%m %b[%p] %q%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
-		fputs("max_prepared_transactions = 2\n", pg_conf);
+		fputs("max_prepared_transactions = 3\n", pg_conf);
+		fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+		fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);
 
 		for (sl = temp_configs; sl != NULL; sl = sl->next)
 		{
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 233ddbf4c2..5ae7d0d795 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -54,7 +54,8 @@ my @contrib_excludes = (
 	'pgcrypto',         'sepgsql',
 	'brin',             'test_extensions',
 	'test_misc',        'test_pg_dump',
-	'snapshot_too_old', 'unsafe_tests');
+	'snapshot_too_old', 'unsafe_tests',
+	'test_fdwxact');
 
 # Set of variables for frontend modules
 my $frontend_defines = { 'initdb' => 'FRONTEND' };
-- 
2.24.3 (Apple Git-128)
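
A note on the test FDW's commit and rollback callbacks above: which log message they emit depends on testing one bit of finfo->flags. The following is a minimal standalone sketch, not code from the patch, contrasting a bitwise test with a logical AND. The SKETCH_* names and the 0x01 value are assumptions made for illustration; only the 0x02 parallel-worker value is visible in this part of the thread.

```c
#include <stdbool.h>

/* Assumed values for illustration; only 0x02 appears in the quoted header. */
#define SKETCH_FLAG_ONEPHASE		0x01
#define SKETCH_FLAG_PARALLEL_WORKER	0x02

/* Bitwise test: true only when the one-phase bit itself is set. */
static bool
is_onephase_bitwise(int flags)
{
	return (flags & SKETCH_FLAG_ONEPHASE) != 0;
}

/*
 * Logical AND: true whenever flags is nonzero, because the constant operand
 * is always nonzero.  This misreports a parallel-worker-only flag word.
 */
static bool
is_onephase_logical(int flags)
{
	return flags && SKETCH_FLAG_ONEPHASE;
}
```

With only the parallel-worker bit set, the bitwise form reports "not one-phase" while the logical form reports "one-phase", so the two forms would pick different log messages.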

Attachment: v37-0005-Prepare-foreign-transactions-at-commit-time.patch (application/octet-stream)
From 63220a37295f98a022835e121465481fc947b3ee Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 25 Nov 2020 21:02:29 +0900
Subject: [PATCH v37 5/9] Prepare foreign transactions at commit time

With this commit, a foreign server whose data is modified within the
transaction is marked as 'modified'. On the 'modified' servers, foreign
transactions are prepared automatically at commit time if
foreign_twophase_commit is 'required'. Previously, users needed to run
PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED to use the two-phase
commit protocol. This commit enables users to use the two-phase commit
protocol transparently. Prepared foreign transactions are resolved
asynchronously by the foreign transaction resolver process.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/fdwxact.c          | 164 +++++++++++++++++-
 src/backend/utils/misc/guc.c                  |  28 +++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/access/fdwxact.h                  |  11 +-
 src/include/foreign/fdwapi.h                  |   2 +-
 5 files changed, 197 insertions(+), 10 deletions(-)
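
For readers skimming the diff, here is a rough standalone sketch of the pre-commit decision that checkForeignTwophaseCommitRequired() makes in this patch. It is not the patch code: the enum, the function name decide_commit() and its parameters are hypothetical stand-ins for the GUCs and the DistributedXactState counters, and DECISION_ERROR stands for the cases where the patch raises ERROR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the pre-commit check (not a type from the patch). */
typedef enum
{
	DECISION_ONE_PHASE,			/* commit each participant directly */
	DECISION_TWO_PHASE,			/* prepare the modified participants first */
	DECISION_ERROR				/* the patch reports ERROR here */
} CommitDecision;

static CommitDecision
decide_commit(bool twophase_requested,	/* foreign_twophase_commit enabled? */
			  int nmodified,			/* foreign participants that wrote */
			  int n_no_twophase,		/* participants without prepare support */
			  bool local_modified,		/* local node wrote WAL under an xid? */
			  int max_prepared,			/* max_prepared_foreign_transactions */
			  int max_resolvers)		/* max_foreign_transaction_resolvers */
{
	int			nserverswritten;

	assert(nmodified >= 0 && n_no_twophase >= 0);

	if (!twophase_requested)
		return DECISION_ONE_PHASE;

	/* The local node counts as one more writer when it was modified. */
	nserverswritten = nmodified + (local_modified ? 1 : 0);

	/* Fewer than two writers: atomic commit needs no prepare step. */
	if (nserverswritten < 2)
		return DECISION_ONE_PHASE;

	/* Any participant that cannot prepare makes the commit fail. */
	if (n_no_twophase > 0)
		return DECISION_ERROR;

	/* Both settings must be nonzero for two-phase commit to work. */
	if (max_prepared == 0 || max_resolvers == 0)
		return DECISION_ERROR;

	return DECISION_TWO_PHASE;
}
```

In this sketch, as in the patch, the writer count is checked before the capability check, so a transaction that wrote to only one server commits in one phase even if a participant that cannot prepare was registered.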

diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index 0c6e80a6de..9280d79a3a 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -20,6 +20,23 @@
  *
  * FOREIGN TRANSACTION RESOLUTION
  *
+ * A transaction involving multiple foreign transactions uses the two-phase
+ * commit protocol to commit the distributed transaction, if enabled.  The basic
+ * strategy is to prepare all of the remote transactions before committing
+ * locally and to commit them after committing locally.
+ *
+ * At pre-commit of the local transaction, we prepare the transactions on all
+ * foreign servers after logging the foreign transaction information.  The
+ * outcome of the distributed transaction is determined by the outcome of the
+ * corresponding local transaction.  Once the local transaction has committed
+ * successfully, all transactions on foreign servers must be committed.  If an
+ * error occurs before the local transaction commits, all of them must be
+ * aborted.  After committing or rolling back locally, we leave the foreign
+ * transactions in doubt and notify the resolver process, which resolves them
+ * asynchronously according to the outcome of the local transaction.  The user
+ * can also call the pg_resolve_foreign_xact() SQL function to resolve a
+ * foreign transaction manually.
+ *
  * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
  * PrepareForeignTransaction() API for each foreign transaction regardless of data on
  * the foreign server having been modified.  At COMMIT PREPARED and ROLLBACK PREPARED,
@@ -97,8 +114,10 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/lock.h"
+#include "storage/pmsignal.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
+#include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -119,6 +138,10 @@
 #define ServerSupportTwophaseCommit(fdwent) \
 	(((FdwXactEntry *)(fdwent))->prepare_foreign_xact_fn != NULL)
 
+/* Foreign twophase commit is enabled and requested by user */
+#define IsForeignTwophaseCommitRequested() \
+	 (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
+
 /*
  * Name of foreign prepared transaction file is 8 bytes xid and
  * user mapping OID separated by '_'.
@@ -152,6 +175,9 @@ typedef struct FdwXactEntry
 	 */
 	FdwXactState fdwxact;
 
+	/* true if the transaction modified data on the server */
+	bool		modified;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
@@ -167,10 +193,13 @@ typedef struct DistributedXactStateData
 {
 	bool		local_prepared; /* will (did) we prepare the local transaction? */
 
+	bool	twophase_commit_required;	/* use two-phase commit at commit time? */
+
 	/* Statistics of participants */
 	int			nparticipants_no_twophase;	/* how many participants doesn't
 											 * support two-phase commit
 											 * protocol? */
+	int		nparticipants_modified;		/* how many participants are modified? */
 
 	HTAB	   *participants;	/* foreign transaction participants (FdwXactEntry) */
 	List	   *serveroids_uniq;	/* list of unique server OIDs in
@@ -178,7 +207,9 @@ typedef struct DistributedXactStateData
 } DistributedXactStateData;
 static DistributedXactStateData DistributedXactState = {
 	.local_prepared = false,
+	.twophase_commit_required = false,
 	.nparticipants_no_twophase = 0,
+	.nparticipants_modified = 0,
 	.participants = NULL,
 	.serveroids_uniq = NIL,
 };
@@ -191,18 +222,19 @@ static DistributedXactStateData DistributedXactState = {
 /* Keep track of registering process exit call back. */
 static bool fdwXactExitRegistered = false;
 
+
 /* Guc parameter */
 int			max_prepared_foreign_xacts = 0;
 int			max_foreign_xact_resolvers = 0;
+int			foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED;
 
 static void RemoveFdwXactEntry(Oid umid);
 static void EndFdwXactEntry(FdwXactEntry *fdwent, bool isCommit,
 							bool is_parallel_worker);
 static char *getFdwXactIdentifier(FdwXactEntry *fdwent, TransactionId xid);
 static void ForgetAllParticipants(void);
-static void FdwXactLaunchResolvers(void);
 
-static void PrepareAllFdwXacts(TransactionId xid);
+static void PrepareAllFdwXacts(TransactionId xid, bool prepare_all);
 static XLogRecPtr FdwXactInsertEntry(TransactionId xid, FdwXactEntry *fdwent,
 									 char *identifier);
 static void AtProcExit_FdwXact(int code, Datum arg);
@@ -216,6 +248,7 @@ static char *ProcessFdwXactBuffer(TransactionId xid, Oid umid,
 static char *ReadFdwXactStateFile(TransactionId xid, Oid umid);
 static void RemoveFdwXactStateFile(TransactionId xid, Oid umid, bool giveWarning);
 static void RecreateFdwXactFile(TransactionId xid, Oid umid, void *content, int len);
+static bool checkForeignTwophaseCommitRequired(bool local_modified);
 
 static FdwXactState insert_fdwxact(Oid dbid, TransactionId xid, Oid umid, Oid serverid,
 								   Oid owner, char *identifier);
@@ -292,7 +325,7 @@ FdwXactShmemInit(void)
  * given user mapping OID as a participant of the transaction.
  */
 void
-FdwXactRegisterEntry(UserMapping *usermapping)
+FdwXactRegisterEntry(UserMapping *usermapping, bool modified)
 {
 	FdwXactEntry *fdwent;
 	FdwRoutine *routine;
@@ -318,8 +351,21 @@ FdwXactRegisterEntry(UserMapping *usermapping)
 	fdwent = hash_search(DistributedXactState.participants,
 						 (void *) &umid, HASH_ENTER, &found);
 
+	/* Already registered */
 	if (found)
+	{
+		/* Update statistics if necessary  */
+		if (fdwent->modified && !modified)
+			DistributedXactState.nparticipants_modified--;
+		else if (!fdwent->modified && modified)
+			DistributedXactState.nparticipants_modified++;
+
+		fdwent->modified = modified;
+
+		Assert(DistributedXactState.nparticipants_modified <=
+		   hash_get_num_entries(DistributedXactState.participants));
 		return;
+	}
 
 	/*
 	 * The participant information needs to live until the end of the
@@ -341,6 +387,7 @@ FdwXactRegisterEntry(UserMapping *usermapping)
 				(errmsg("cannot register foreign server not supporting transaction callback")));
 
 	fdwent->fdwxact = NULL;
+	fdwent->modified = modified;
 	fdwent->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdwent->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
 	fdwent->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
@@ -350,8 +397,12 @@ FdwXactRegisterEntry(UserMapping *usermapping)
 	/* Update statistics */
 	if (!ServerSupportTwophaseCommit(fdwent))
 		DistributedXactState.nparticipants_no_twophase++;
+	if (fdwent->modified)
+		DistributedXactState.nparticipants_modified++;
 
 	Assert(DistributedXactState.nparticipants_no_twophase <=
+			   hash_get_num_entries(DistributedXactState.participants));
+	Assert(DistributedXactState.nparticipants_modified <=
 		   hash_get_num_entries(DistributedXactState.participants));
 }
 
@@ -381,9 +432,13 @@ RemoveFdwXactEntry(Oid umid)
 		/* Update statistics */
 		if (!ServerSupportTwophaseCommit(fdwent))
 			DistributedXactState.nparticipants_no_twophase--;
+		if (fdwent->modified)
+			DistributedXactState.nparticipants_modified--;
 
 		Assert(DistributedXactState.nparticipants_no_twophase <=
 			   hash_get_num_entries(DistributedXactState.participants));
+		Assert(DistributedXactState.nparticipants_modified <=
+			   hash_get_num_entries(DistributedXactState.participants));
 	}
 }
 
@@ -454,12 +509,15 @@ AtEOXact_FdwXact(bool isCommit, bool is_parallel_worker)
 	 * transaction after preparing the foreign transactions.  In this case, we
 	 * need to rollback the prepared transaction on the foreign servers.
 	 */
-	if (DistributedXactState.local_prepared && !isCommit)
+	if (DistributedXactState.twophase_commit_required ||
+		(DistributedXactState.local_prepared && !isCommit))
 		FdwXactLaunchResolvers();
 
 	/* Reset all fields */
 	DistributedXactState.local_prepared = false;
+	DistributedXactState.twophase_commit_required = false;
 	DistributedXactState.nparticipants_no_twophase = 0;
+	DistributedXactState.nparticipants_modified = 0;
 	list_free(DistributedXactState.serveroids_uniq);
 	DistributedXactState.serveroids_uniq = NIL;
 }
@@ -533,7 +591,7 @@ AtPrepare_FdwXact(void)
 	 */
 	DistributedXactState.local_prepared = true;
 
-	PrepareAllFdwXacts(xid);
+	PrepareAllFdwXacts(xid, true);
 }
 
 /*
@@ -545,6 +603,8 @@ PreCommit_FdwXact(bool is_parallel_worker)
 {
 	HASH_SEQ_STATUS scan;
 	FdwXactEntry *fdwent;
+	TransactionId xid;
+	bool		local_modified;
 
 	/*
 	 * If there is no foreign server involved or all foreign transactions are
@@ -555,6 +615,41 @@ PreCommit_FdwXact(bool is_parallel_worker)
 
 	Assert(!RecoveryInProgress());
 
+	/*
+	 * Check if the current transaction did writes.  We need to include the
+	 * local node as a participant of the distributed transaction and regard
+	 * it as modified if the current transaction has performed WAL logging and
+	 * has been assigned an xid.  The transaction can end up not writing any
+	 * WAL, even if it has an xid, if it only wrote to temporary and/or
+	 * unlogged tables.  It can end up having written WAL without an xid if it
+	 * did HOT pruning.
+	 */
+	xid = GetTopTransactionIdIfAny();
+	local_modified = (TransactionIdIsValid(xid) && (XactLastRecEnd != 0));
+
+	/*
+	 * Perform twophase commit if required. Note that we don't support foreign
+	 * twophase commit in single user mode.
+	 */
+	if (IsUnderPostmaster && checkForeignTwophaseCommitRequired(local_modified))
+	{
+		/*
+		 * Two-phase commit is required.  Assign a transaction id to the
+		 * current transaction if it does not have one yet, because the local
+		 * transaction is needed to determine the outcome of the distributed
+		 * transaction.  Then prepare foreign transactions on the foreign
+		 * servers that support two-phase commit.  Note that we keep
+		 * FdwXactParticipants until the end of the transaction.
+		 */
+		if (!TransactionIdIsValid(xid))
+			xid = GetTopTransactionId();
+
+		DistributedXactState.twophase_commit_required = true;
+		PrepareAllFdwXacts(xid, false);
+
+		return;
+	}
+
 	/* Commit all foreign transactions in the participant list */
 	hash_seq_init(&scan, DistributedXactState.participants);
 	while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
@@ -684,6 +779,53 @@ CheckPointFdwXacts(XLogRecPtr redo_horizon)
 							   serialized_fdwxacts)));
 }
 
+/*
+ * Return true if the current transaction modified data on two or more servers,
+ * counting both the registered foreign participants and the local server.
+ */
+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)
+{
+	int		nserverswritten;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	nserverswritten = DistributedXactState.nparticipants_modified;
+
+	/* Did we modify the local non-temporary data? */
+	if (local_modified)
+		nserverswritten++;
+
+	/*
+	 * Two-phase commit is not required if the number of servers performing
+	 * writes is less than 2.
+	 */
+	if (nserverswritten < 2)
+		return false;
+
+	if (DistributedXactState.nparticipants_no_twophase > 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+				 errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
+
+	/* Two-phase commit is required. Check parameters */
+	if (max_prepared_foreign_xacts == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but prepared foreign transactions are disabled"),
+				 errhint("Set max_prepared_foreign_transactions to a nonzero value.")));
+
+	if (max_foreign_xact_resolvers == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign two-phase commit is required but foreign transaction resolvers are disabled"),
+				 errhint("Set max_foreign_transaction_resolvers to a nonzero value.")));
+
+	return true;
+}
+
 /*
  * Prepare all foreign transactions.
  *
@@ -704,9 +846,12 @@ CheckPointFdwXacts(XLogRecPtr redo_horizon)
  * able to resolve it after the server crash.  Hence  persist first then prepare.
  * Point (b) guarantees that foreign transaction information are not lost even
  * if the failover happens.
+ *
+ * If prepare_all is true, we prepare all foreign transactions regardless of
+ * whether writes have happened on the server.
  */
 static void
-PrepareAllFdwXacts(TransactionId xid)
+PrepareAllFdwXacts(TransactionId xid, bool prepare_all)
 {
 	FdwXactEntry *fdwent;
 	XLogRecPtr	flush_lsn;
@@ -725,6 +870,9 @@ PrepareAllFdwXacts(TransactionId xid)
 
 		CHECK_FOR_INTERRUPTS();
 
+		if (!prepare_all && !fdwent->modified)
+			continue;
+
 		/* Get prepared transaction identifier */
 		identifier = getFdwXactIdentifier(fdwent, xid);
 		Assert(identifier);
@@ -1094,7 +1242,7 @@ ForgetAllParticipants(void)
 	Assert(!HasFdwXactParticipant());
 }
 
-static void
+void
 FdwXactLaunchResolvers(void)
 {
 	if (list_length(DistributedXactState.serveroids_uniq) > 0)
@@ -1283,7 +1431,7 @@ FdwXactGetTransactionFate(TransactionId xid)
  *
  * Note: content and len don't include CRC.
  */
-void
+static void
 RecreateFdwXactFile(TransactionId xid, Oid umid, void *content, int len)
 {
 	char		path[MAXPGPATH];
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 65815ec047..aad92816f5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -557,6 +557,24 @@ static const struct config_enum_entry wal_compression_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "required" and "disabled" are documented, we accept all
+ * the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry foreign_twophase_commit_options[] = {
+	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false},
+	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false},
+	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true},
+	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -4824,6 +4842,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, assign_synchronous_commit, NULL
 	},
 
+	{
+		{"foreign_twophase_commit", PGC_USERSET, FOREIGN_TRANSACTION,
+		 gettext_noop("Use of foreign twophase commit for the current transaction."),
+			NULL
+		},
+		&foreign_twophase_commit,
+		FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING,
 			gettext_noop("Allows archiving of WAL files using archive_command."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cc35c7109..7619da024b 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -757,6 +757,8 @@
 							# retrying to resolve
 							# foreign transactions
 							# after a failed attempt
+#foreign_twophase_commit = disabled	# use two-phase commit for distributed transactions:
+					# disabled or required
 
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 85854864b9..8ac27a0b67 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -22,6 +22,14 @@
 											 * without preparation */
 #define FDWXACT_FLAG_PARALLEL_WORKER	0x02	/* is parallel worker? */
 
+/* Enum for foreign_twophase_commit parameter */
+typedef enum
+{
+	FOREIGN_TWOPHASE_COMMIT_DISABLED,	/* disable foreign twophase commit */
+	FOREIGN_TWOPHASE_COMMIT_REQUIRED	/* all foreign servers have to support
+										 * twophase commit */
+}			ForeignTwophaseCommitLevel;
+
 /* Enum to track the status of foreign transaction */
 typedef enum
 {
@@ -100,6 +108,7 @@ extern int	max_prepared_foreign_xacts;
 extern int	max_foreign_xact_resolvers;
 extern int	foreign_xact_resolution_retry_interval;
 extern int	foreign_xact_resolver_timeout;
+extern int	foreign_twophase_commit;
 
 /* Function declarations */
 extern void PreCommit_FdwXact(bool is_parallel_worker);
@@ -107,7 +116,7 @@ extern void AtEOXact_FdwXact(bool isCommit, bool is_parallel_worker);
 extern Size FdwXactShmemSize(void);
 extern void FdwXactShmemInit(void);
 extern void AtPrepare_FdwXact(void);
-extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern void FdwXactLaunchResolvers(void);
 extern int CountFdwXactsForUserMapping(Oid umid);
 extern int CountFdwXactsForDB(Oid dbid);
 extern void FdwXactLaunchResolversForXid(TransactionId xid);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 5338f4f2d9..05c758f869 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -303,7 +303,7 @@ extern bool IsImportableForeignTable(const char *tablename,
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
 /* Functions in transam/fdwxact.c */
-extern void FdwXactRegisterEntry(UserMapping *usermapping);
+extern void FdwXactRegisterEntry(UserMapping *usermapping, bool modified);
 extern void FdwXactUnregisterEntry(UserMapping *usermapping);
 
 #endif							/* FDWAPI_H */
-- 
2.24.3 (Apple Git-128)
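
Note on the foreign_twophase_commit option table in the patch above: the following is a minimal, self-contained sketch of how such an enum table maps user-supplied strings to levels. It is not the GUC machinery itself. The names demo_enum_entry, demo_options and demo_lookup are hypothetical stand-ins, and the case-insensitive comparison via POSIX strcasecmp is an assumption about how the lookup behaves.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>			/* strcasecmp (POSIX) */

/* Mirrors the two levels the patch adds to fdwxact.h. */
typedef enum
{
	FOREIGN_TWOPHASE_COMMIT_DISABLED,
	FOREIGN_TWOPHASE_COMMIT_REQUIRED
} ForeignTwophaseCommitLevel;

/* Hypothetical stand-in for PostgreSQL's struct config_enum_entry. */
typedef struct
{
	const char *name;
	int			val;
	int			hidden;			/* nonzero: accepted but not advertised */
} demo_enum_entry;

/* Same name/value/hidden triples as the table in the patch. */
static const demo_enum_entry demo_options[] = {
	{"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, 0},
	{"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, 0},
	{"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, 0},
	{"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, 0},
	{"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, 1},
	{"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, 1},
	{"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, 1},
	{"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, 1},
	{"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, 1},
	{"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, 1},
	{NULL, 0, 0}
};

/*
 * Hypothetical helper: look up "value" in the table.  Returns 1 and stores
 * the matching level in *val, or returns 0 if no entry matches.
 */
static int
demo_lookup(const char *value, int *val)
{
	const demo_enum_entry *e;

	for (e = demo_options; e->name != NULL; e++)
	{
		if (strcasecmp(e->name, value) == 0)
		{
			*val = e->val;
			return 1;
		}
	}
	return 0;
}
```

With this table every boolean-style spelling collapses onto one of the two levels, so a setting such as "on" behaves exactly like "required", while an unknown spelling is rejected.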

Attachment: v37-0004-postgres_fdw-supports-prepare-API.patch (application/octet-stream)
From a38558a011805608c37c7bcb7081fe5da41f3dd8 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Sep 2020 17:00:21 +0900
Subject: [PATCH v37 4/9] postgres_fdw supports prepare API.

This commit implements the PrepareForeignTransaction API in postgres_fdw,
enabling foreign transactions to be committed and rolled back using the
two-phase commit protocol.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 136 +++++++++++++++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  13 --
 contrib/postgres_fdw/postgres_fdw.c           |   1 +
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   7 -
 5 files changed, 134 insertions(+), 24 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 8ecbbae8f1..f8db97c641 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -108,6 +108,8 @@ static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 static bool UserMappingPasswordRequired(UserMapping *user);
 static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
 static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
+static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+									char *fdwxact_id, bool is_commit);
 static bool disconnect_cached_connections(Oid serverid);
 
 /*
@@ -1467,12 +1469,19 @@ void
 postgresCommitForeignTransaction(FdwXactInfo *finfo)
 {
 	ConnCacheEntry *entry;
+	bool		is_onephase = (finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	PGresult   *res;
 
-	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
 
+	if (!is_onephase)
+	{
+		/* COMMIT PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, finfo->usermapping, finfo->identifier,
+								true);
+		return;
+	}
+
 	Assert(entry->conn);
 
 	/*
@@ -1514,11 +1523,19 @@ void
 postgresRollbackForeignTransaction(FdwXactInfo *finfo)
 {
 	ConnCacheEntry *entry = NULL;
+	bool is_onephase = (finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0;
 	bool abort_cleanup_failure = false;
 
-	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
-
 	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+
+	if (!is_onephase)
+	{
+		/* ROLLBACK PREPARED the transaction and cleanup */
+		pgfdw_end_prepared_xact(entry, finfo->usermapping, finfo->identifier,
+								false);
+		return;
+	}
+
 	Assert(entry);
 
 	/*
@@ -1588,6 +1605,46 @@ cleanup:
 	pgfdw_cleanup_after_transaction(entry);
 }
 
+/*
+ * Prepare a transaction on foreign server.
+ */
+void
+postgresPrepareForeignTransaction(FdwXactInfo *finfo)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult	*res;
+	StringInfo	command;
+
+	/* The transaction should already have started; get the cache entry */
+	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+	Assert(entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", finfo->identifier);
+
+	/* Do prepare foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data, NULL);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   finfo->server->servername, finfo->identifier)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 finfo->server->servername, finfo->identifier);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
 /* Cleanup at main-transaction end */
 static void
 pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
@@ -1620,3 +1677,74 @@ pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
 	/* Also reset cursor numbering for next transaction */
 	cursor_number = 0;
 }
+
+/*
+ * Commit or rollback prepared transaction on the foreign server.
+ */
+static void
+pgfdw_end_prepared_xact(ConnCacheEntry *entry, UserMapping *usermapping,
+						char *fdwxact_id, bool is_commit)
+{
+	StringInfo	command;
+	PGresult	*res;
+
+	/*
+	 * Check the connection status for the case the previous attempt
+	 * failed.
+	 */
+	if (entry->conn && PQstatus(entry->conn) != CONNECTION_OK)
+		disconnect_pg_server(entry);
+
+	/*
+	 * In the two-phase commit case, the transaction may be resolved by a
+	 * different process than the one that prepared it, so we might not
+	 * have a connection yet.
+	 */
+	if (!entry->conn)
+		make_new_connection(entry, usermapping);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "%s PREPARED '%s'",
+					 is_commit ? "COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	/*
+	 * Once the transaction is prepared, further transaction callback is not
+	 * called even when an error occurred during resolving it.  Therefore, we
+	 * don't need to set changing_xact_state here.  On failure the new connection
+	 * will be established either when the new transaction is started or when
+	 * checking the connection status above.
+	 */
+	res = pgfdw_exec_query(entry->conn, command->data, NULL);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int		sqlstate;
+		char	*diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As core global transaction manager states, it's possible that the
+		 * given foreign transaction doesn't exist on the foreign server. So
+		 * we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback", fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 01c43d80ff..08e86b3b88 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9299,19 +9299,6 @@ DROP OWNED BY regress_nosuper;
 DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
- count 
--------
-   822
-(1 row)
-
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a distributed transaction that has operated on a foreign server not supporting two-phase commit protocol
-ROLLBACK;
-WARNING:  there is no transaction in progress
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 0015fda16a..e1c6bd9330 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -615,6 +615,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for foreign transactions */
 	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
 	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+	routine->PrepareForeignTransaction = postgresPrepareForeignTransaction;
 
 	PG_RETURN_POINTER(routine);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 09d2806618..97e4f244db 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -154,6 +154,7 @@ extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
 extern void postgresCommitForeignTransaction(FdwXactInfo *finfo);
 extern void postgresRollbackForeignTransaction(FdwXactInfo *finfo);
+extern void postgresPrepareForeignTransaction(FdwXactInfo *finfo);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 286dd99573..441256fc52 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2810,13 +2810,6 @@ DROP ROLE regress_nosuper;
 -- Clean-up
 RESET enable_partitionwise_aggregate;
 
--- Two-phase transactions are not supported.
-BEGIN;
-SELECT count(*) FROM ft1;
--- error here
-PREPARE TRANSACTION 'fdw_tpc';
-ROLLBACK;
-
 -- ===================================================================
 -- reestablish new connection
 -- ===================================================================
-- 
2.24.3 (Apple Git-128)
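
Note on the commit and rollback paths in the patch above: the sketch below restates, under stated assumptions, which SQL command each path sends and which failure the resolution step tolerates. The helper names, the flag value DEMO_FLAG_ONEPHASE and the identifier "fx_1" are hypothetical. The real flag is FDWXACT_FLAG_ONEPHASE and its value is not shown in this part of the thread. SQLSTATE 42704 is undefined_object, which the patch accepts because the prepared transaction may no longer exist on the foreign server.

```c
#include <stdio.h>
#include <string.h>

/* Assumed value; the real FDWXACT_FLAG_ONEPHASE lives in fdwxact.h. */
#define DEMO_FLAG_ONEPHASE 0x01

/*
 * Hypothetical helper: format the command that ends a foreign transaction.
 * One-phase transactions are ended directly; prepared ones are resolved
 * with COMMIT PREPARED or ROLLBACK PREPARED.  Like the patch, this does
 * not escape quotes inside the identifier.
 */
static int
demo_build_end_command(char *buf, size_t buflen, int flags,
					   const char *fdwxact_id, int is_commit)
{
	if ((flags & DEMO_FLAG_ONEPHASE) != 0)
		return snprintf(buf, buflen, "%s TRANSACTION",
						is_commit ? "COMMIT" : "ABORT");

	return snprintf(buf, buflen, "%s PREPARED '%s'",
					is_commit ? "COMMIT" : "ROLLBACK", fdwxact_id);
}

/*
 * Hypothetical helper: decide whether a failed resolution attempt should be
 * reported.  A missing SQLSTATE is treated as a connection failure, and
 * undefined_object (42704) is accepted silently.
 */
static int
demo_is_reportable(const char *sqlstate)
{
	if (sqlstate == NULL)
		return 1;
	return strcmp(sqlstate, "42704") != 0;
}
```

Accepting undefined_object makes the resolution step safe to retry: a second COMMIT PREPARED for an identifier that was already resolved is treated as success rather than as an error.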

Attachment: v37-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch (application/octet-stream)
From 46255b2377d72f2905c8bc669c015f2e011f2fc7 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 10 May 2021 20:31:30 +0900
Subject: [PATCH v37 2/9] postgres_fdw supports commit and rollback APIs.

This commit implements both the CommitForeignTransaction and
RollbackForeignTransaction APIs in postgres_fdw. Note that because
PREPARE TRANSACTION is still not supported, this commit does not add
any user-visible functionality.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 contrib/postgres_fdw/connection.c             | 459 +++++++++---------
 .../postgres_fdw/expected/postgres_fdw.out    |   2 +-
 contrib/postgres_fdw/postgres_fdw.c           |   4 +
 contrib/postgres_fdw/postgres_fdw.h           |   3 +
 4 files changed, 225 insertions(+), 243 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 82aa14a65d..8ecbbae8f1 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
 #include "funcapi.h"
+#include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,8 +93,7 @@ static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
 static void disconnect_pg_server(ConnCacheEntry *entry);
 static void check_conn_params(const char **keywords, const char **values, UserMapping *user);
 static void configure_remote_session(PGconn *conn);
-static void begin_remote_xact(ConnCacheEntry *entry);
-static void pgfdw_xact_callback(XactEvent event, void *arg);
+static void begin_remote_xact(ConnCacheEntry *entry, UserMapping *user);
 static void pgfdw_subxact_callback(SubXactEvent event,
 								   SubTransactionId mySubid,
 								   SubTransactionId parentSubid,
@@ -106,6 +106,8 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 									 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static ConnCacheEntry *GetConnectionCacheEntry(Oid umid);
+static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry);
 static bool disconnect_cached_connections(Oid serverid);
 
 /*
@@ -124,53 +126,14 @@ static bool disconnect_cached_connections(Oid serverid);
 PGconn *
 GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state)
 {
-	bool		found;
 	bool		retry = false;
 	ConnCacheEntry *entry;
-	ConnCacheKey key;
 	MemoryContext ccxt = CurrentMemoryContext;
 
-	/* First time through, initialize connection cache hashtable */
-	if (ConnectionHash == NULL)
-	{
-		HASHCTL		ctl;
-
-		ctl.keysize = sizeof(ConnCacheKey);
-		ctl.entrysize = sizeof(ConnCacheEntry);
-		ConnectionHash = hash_create("postgres_fdw connections", 8,
-									 &ctl,
-									 HASH_ELEM | HASH_BLOBS);
-
-		/*
-		 * Register some callback functions that manage connection cleanup.
-		 * This should be done just once in each backend.
-		 */
-		RegisterXactCallback(pgfdw_xact_callback, NULL);
-		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
-		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
-									  pgfdw_inval_callback, (Datum) 0);
-		CacheRegisterSyscacheCallback(USERMAPPINGOID,
-									  pgfdw_inval_callback, (Datum) 0);
-	}
-
 	/* Set flag that we did GetConnection during the current transaction */
 	xact_got_connection = true;
 
-	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
-
-	/*
-	 * Find or create cached entry for requested connection.
-	 */
-	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
-	if (!found)
-	{
-		/*
-		 * We need only clear "conn" here; remaining fields will be filled
-		 * later when "conn" is set.
-		 */
-		entry->conn = NULL;
-	}
+	entry = GetConnectionCacheEntry(user->umid);
 
 	/* Reject further use of connections which failed abort cleanup. */
 	pgfdw_reject_incomplete_xact_state_change(entry);
@@ -205,7 +168,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state)
 		if (entry->state.pendingAreq)
 			process_pending_request(entry->state.pendingAreq);
 		/* Start a new transaction or subtransaction if needed. */
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 	PG_CATCH();
 	{
@@ -266,7 +229,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state)
 		if (entry->conn == NULL)
 			make_new_connection(entry, user);
 
-		begin_remote_xact(entry);
+		begin_remote_xact(entry, user);
 	}
 
 	/* Remember if caller will prepare statements */
@@ -279,6 +242,54 @@ GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state)
 	return entry->conn;
 }
 
+/* Return ConnCacheEntry identified by the given umid */
+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+	bool		found;
+	ConnCacheEntry *entry;
+	ConnCacheKey key;
+
+	/* First time through, initialize connection cache hashtable */
+	if (ConnectionHash == NULL)
+	{
+		HASHCTL		ctl;
+
+		ctl.keysize = sizeof(ConnCacheKey);
+		ctl.entrysize = sizeof(ConnCacheEntry);
+		ConnectionHash = hash_create("postgres_fdw connections", 8,
+									 &ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+		/*
+		 * Register some callback functions that manage connection cleanup.
+		 * This should be done just once in each backend.
+		 */
+		RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
+		CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
+									  pgfdw_inval_callback, (Datum) 0);
+		CacheRegisterSyscacheCallback(USERMAPPINGOID,
+									  pgfdw_inval_callback, (Datum) 0);
+	}
+
+	/* Create hash key for the entry.  Assume no pad bytes in key struct */
+	key = umid;
+
+	/*
+	 * Find or create cached entry for requested connection.
+	 */
+	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+	if (!found)
+	{
+		/*
+		 * We need only clear "conn" here; remaining fields will be filled
+		 * later when "conn" is set.
+		 */
+		entry->conn = NULL;
+	}
+	return entry;
+}
+
 /*
  * Reset all transient state fields in the cached connection entry and
  * establish new connection to the remote server.
@@ -591,7 +602,7 @@ do_sql_command(PGconn *conn, const char *sql)
  * control which remote queries share a snapshot.
  */
 static void
-begin_remote_xact(ConnCacheEntry *entry)
+begin_remote_xact(ConnCacheEntry *entry, UserMapping *user)
 {
 	int			curlevel = GetCurrentTransactionNestLevel();
 
@@ -603,6 +614,9 @@ begin_remote_xact(ConnCacheEntry *entry)
 		elog(DEBUG3, "starting remote transaction on connection %p",
 			 entry->conn);
 
+		/* Register the foreign server to the transaction */
+		FdwXactRegisterEntry(user);
+
 		if (IsolationIsSerializable())
 			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
 		else
@@ -822,203 +836,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 	PG_END_TRY();
 }
 
-/*
- * pgfdw_xact_callback --- cleanup at main-transaction end.
- *
- * This runs just late enough that it must not enter user-defined code
- * locally.  (Entering such code on the remote side is fine.  Its remote
- * COMMIT TRANSACTION may run deferred triggers.)
- */
-static void
-pgfdw_xact_callback(XactEvent event, void *arg)
-{
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-
-	/* Quick exit if no connections were touched in this transaction. */
-	if (!xact_got_connection)
-		return;
-
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
-	{
-		PGresult   *res;
-
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			bool		abort_cleanup_failure = false;
-
-			elog(DEBUG3, "closing remote transaction on connection %p",
-				 entry->conn);
-
-			switch (event)
-			{
-				case XACT_EVENT_PARALLEL_PRE_COMMIT:
-				case XACT_EVENT_PRE_COMMIT:
-
-					/*
-					 * If abort cleanup previously failed for this connection,
-					 * we can't issue any more commands against it.
-					 */
-					pgfdw_reject_incomplete_xact_state_change(entry);
-
-					/* Commit all remote transactions during pre-commit */
-					entry->changing_xact_state = true;
-					do_sql_command(entry->conn, "COMMIT TRANSACTION");
-					entry->changing_xact_state = false;
-
-					/*
-					 * If there were any errors in subtransactions, and we
-					 * made prepared statements, do a DEALLOCATE ALL to make
-					 * sure we get rid of all prepared statements. This is
-					 * annoying and not terribly bulletproof, but it's
-					 * probably not worth trying harder.
-					 *
-					 * DEALLOCATE ALL only exists in 8.3 and later, so this
-					 * constrains how old a server postgres_fdw can
-					 * communicate with.  We intentionally ignore errors in
-					 * the DEALLOCATE, so that we can hobble along to some
-					 * extent with older servers (leaking prepared statements
-					 * as we go; but we don't really support update operations
-					 * pre-8.3 anyway).
-					 */
-					if (entry->have_prep_stmt && entry->have_error)
-					{
-						res = PQexec(entry->conn, "DEALLOCATE ALL");
-						PQclear(res);
-					}
-					entry->have_prep_stmt = false;
-					entry->have_error = false;
-					break;
-				case XACT_EVENT_PRE_PREPARE:
-
-					/*
-					 * We disallow any remote transactions, since it's not
-					 * very reasonable to hold them open until the prepared
-					 * transaction is committed.  For the moment, throw error
-					 * unconditionally; later we might allow read-only cases.
-					 * Note that the error will cause us to come right back
-					 * here with event == XACT_EVENT_ABORT, so we'll clean up
-					 * the connection state at that point.
-					 */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables")));
-					break;
-				case XACT_EVENT_PARALLEL_COMMIT:
-				case XACT_EVENT_COMMIT:
-				case XACT_EVENT_PREPARE:
-					/* Pre-commit should have closed the open transaction */
-					elog(ERROR, "missed cleaning up connection during pre-commit");
-					break;
-				case XACT_EVENT_PARALLEL_ABORT:
-				case XACT_EVENT_ABORT:
-
-					/*
-					 * Don't try to clean up the connection if we're already
-					 * in error recursion trouble.
-					 */
-					if (in_error_recursion_trouble())
-						entry->changing_xact_state = true;
-
-					/*
-					 * If connection is already unsalvageable, don't touch it
-					 * further.
-					 */
-					if (entry->changing_xact_state)
-						break;
-
-					/*
-					 * Mark this connection as in the process of changing
-					 * transaction state.
-					 */
-					entry->changing_xact_state = true;
-
-					/* Assume we might have lost track of prepared statements */
-					entry->have_error = true;
-
-					/*
-					 * If a command has been submitted to the remote server by
-					 * using an asynchronous execution function, the command
-					 * might not have yet completed.  Check to see if a
-					 * command is still being processed by the remote server,
-					 * and if so, request cancellation of the command.
-					 */
-					if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
-						!pgfdw_cancel_query(entry->conn))
-					{
-						/* Unable to cancel running query. */
-						abort_cleanup_failure = true;
-					}
-					else if (!pgfdw_exec_cleanup_query(entry->conn,
-													   "ABORT TRANSACTION",
-													   false))
-					{
-						/* Unable to abort remote transaction. */
-						abort_cleanup_failure = true;
-					}
-					else if (entry->have_prep_stmt && entry->have_error &&
-							 !pgfdw_exec_cleanup_query(entry->conn,
-													   "DEALLOCATE ALL",
-													   true))
-					{
-						/* Trouble clearing prepared statements. */
-						abort_cleanup_failure = true;
-					}
-					else
-					{
-						entry->have_prep_stmt = false;
-						entry->have_error = false;
-						/* Also reset per-connection state */
-						memset(&entry->state, 0, sizeof(entry->state));
-					}
-
-					/* Disarm changing_xact_state if it all worked. */
-					entry->changing_xact_state = abort_cleanup_failure;
-					break;
-			}
-		}
-
-		/* Reset state to show we're out of a transaction */
-		entry->xact_depth = 0;
-
-		/*
-		 * If the connection isn't in a good idle state, it is marked as
-		 * invalid or keep_connections option of its server is disabled, then
-		 * discard it to recover. Next GetConnection will open a new
-		 * connection.
-		 */
-		if (PQstatus(entry->conn) != CONNECTION_OK ||
-			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state ||
-			entry->invalidated ||
-			!entry->keep_connections)
-		{
-			elog(DEBUG3, "discarding connection %p", entry->conn);
-			disconnect_pg_server(entry);
-		}
-	}
-
-	/*
-	 * Regardless of the event type, we can now mark ourselves as out of the
-	 * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
-	 * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
-	 */
-	xact_got_connection = false;
-
-	/* Also reset cursor numbering for next transaction */
-	cursor_number = 0;
-}
-
 /*
  * pgfdw_subxact_callback --- cleanup at subtransaction end.
  */
@@ -1645,3 +1462,161 @@ disconnect_cached_connections(Oid serverid)
 
 	return result;
 }
+
+void
+postgresCommitForeignTransaction(FdwXactInfo *finfo)
+{
+	ConnCacheEntry *entry;
+	PGresult   *res;
+
+	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+
+	Assert(entry->conn);
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	do_sql_command(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	/*
+	 * If there were any errors in subtransactions, and we
+	 * made prepared statements, do a DEALLOCATE ALL to make
+	 * sure we get rid of all prepared statements. This is
+	 * annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this
+	 * constrains how old a server postgres_fdw can
+	 * communicate with.  We intentionally ignore errors in
+	 * the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements
+	 * as we go; but we don't really support update operations
+	 * pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+void
+postgresRollbackForeignTransaction(FdwXactInfo *finfo)
+{
+	ConnCacheEntry *entry = NULL;
+	bool abort_cleanup_failure = false;
+
+	Assert((finfo->flags & FDWXACT_FLAG_ONEPHASE) != 0);
+
+	entry = GetConnectionCacheEntry(finfo->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * Clean up the connection entry's transaction state if the transaction
+	 * failed before a connection was established.
+	 */
+	if (!entry->conn)
+		goto cleanup;
+
+	/*
+	 * Don't try to clean up the connection if we're already
+	 * in error recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If connection is before starting transaction or is already unsalvageable,
+	 * do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state)
+		goto cleanup;
+
+	/*
+	 * Mark this connection as in the process of changing
+	 * transaction state.
+	 */
+	entry->changing_xact_state = true;
+
+	/* Assume we might have lost track of prepared statements */
+	entry->have_error = true;
+
+	/*
+	 * If a command has been submitted to the remote server by
+	 * using an asynchronous execution function, the command
+	 * might not have yet completed.  Check to see if a
+	 * command is still being processed by the remote server,
+	 * and if so, request cancellation of the command.
+	 */
+	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
+		!pgfdw_cancel_query(entry->conn))
+	{
+		/* Unable to cancel running query. */
+		abort_cleanup_failure = true;
+	}
+	else if (!pgfdw_exec_cleanup_query(entry->conn,
+									   "ABORT TRANSACTION",
+									   false))
+	{
+		/* Unable to abort remote transaction. */
+		abort_cleanup_failure = true;
+	}
+	else if (entry->have_prep_stmt && entry->have_error &&
+			 !pgfdw_exec_cleanup_query(entry->conn,
+									   "DEALLOCATE ALL",
+									   true))
+	{
+		/* Trouble clearing prepared statements. */
+		abort_cleanup_failure = true;
+	}
+
+	/* Disarm changing_xact_state if it all worked. */
+	entry->changing_xact_state = abort_cleanup_failure;
+
+cleanup:
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover. Next GetConnection will open a new connection.
+	 */
+	if (PQstatus(entry->conn) != CONNECTION_OK ||
+		PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+		entry->changing_xact_state ||
+		entry->invalidated ||
+		!entry->keep_connections)
+	{
+		elog(DEBUG3, "discarding connection %p", entry->conn);
+		disconnect_pg_server(entry);
+	}
+
+	/*
+	 * Regardless of the event type, we can now mark ourselves as out of the
+	 * transaction.
+	 */
+	xact_got_connection = false;
+
+	/* Also reset cursor numbering for next transaction */
+	cursor_number = 0;
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 31b5de91ad..343b33473b 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9309,7 +9309,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
+ERROR:  cannot PREPARE a transaction that has operated on foreign tables
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fafbab6b02..0015fda16a 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -612,6 +612,10 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->ForeignAsyncConfigureWait = postgresForeignAsyncConfigureWait;
 	routine->ForeignAsyncNotify = postgresForeignAsyncNotify;
 
+	/* Support functions for foreign transactions */
+	routine->CommitForeignTransaction = postgresCommitForeignTransaction;
+	routine->RollbackForeignTransaction = postgresRollbackForeignTransaction;
+
 	PG_RETURN_POINTER(routine);
 }
 
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 9591c0f6c2..09d2806618 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -13,6 +13,7 @@
 #ifndef POSTGRES_FDW_H
 #define POSTGRES_FDW_H
 
+#include "access/fdwxact.h"
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
@@ -151,6 +152,8 @@ extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query,
 								  PgFdwConnState *state);
 extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 							   bool clear, const char *sql);
+extern void postgresCommitForeignTransaction(FdwXactInfo *finfo);
+extern void postgresRollbackForeignTransaction(FdwXactInfo *finfo);
 
 /* in option.c */
 extern int	ExtractConnectionOptions(List *defelems,
-- 
2.24.3 (Apple Git-128)

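The postgres_fdw hunks above fill a commit and a rollback callback into the handler's routine table. As a rough standalone sketch of that pattern (this is not PostgreSQL code; `Routine`, `XactInfo`, `demo_handler`, `routine_is_consistent`, and `end_transaction` are hypothetical names made up for illustration), a caller could require that a wrapper supplies both callbacks or neither before dispatching through the table:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-transaction info handed to callbacks. */
typedef struct XactInfo
{
	int			committed;		/* bumped by the demo commit callback */
	int			rolled_back;	/* bumped by the demo rollback callback */
} XactInfo;

typedef void (*commit_fn) (XactInfo *info);
typedef void (*rollback_fn) (XactInfo *info);

/* Hypothetical, much-reduced routine table with two optional callbacks. */
typedef struct Routine
{
	commit_fn	commit;
	rollback_fn rollback;
} Routine;

static void demo_commit(XactInfo *info) { info->committed++; }
static void demo_rollback(XactInfo *info) { info->rolled_back++; }

/* A handler that opts in fills both slots. */
static Routine
demo_handler(void)
{
	Routine		r;

	r.commit = demo_commit;
	r.rollback = demo_rollback;
	return r;
}

/* True when the table provides both callbacks or neither of them. */
static bool
routine_is_consistent(const Routine *r)
{
	return (r->commit != NULL) == (r->rollback != NULL);
}

/* End a transaction through the table; returns false if unsupported. */
static bool
end_transaction(const Routine *r, XactInfo *info, bool is_commit)
{
	assert(r != NULL && info != NULL);

	if (!routine_is_consistent(r) || r->commit == NULL)
		return false;

	if (is_commit)
		r->commit(info);
	else
		r->rollback(info);
	return true;
}
```

With a table from `demo_handler()`, `end_transaction(&r, &info, true)` runs the commit callback once. A table that sets only one of the two pointers is rejected rather than half-used.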
Attachment: v37-0001-Introduce-transaction-manager-for-foreign-transa.patch (application/octet-stream)
From e1b1bbde5874b21f4cae4cb4be90a7f2001b5033 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 28 Aug 2020 22:25:38 +0900
Subject: [PATCH v37 1/9] Introduce transaction manager for foreign
 transactions.

The global transaction manager manages the transactions initiated on
foreign servers. This commit also adds the CommitForeignTransaction
and RollbackForeignTransaction FDW APIs, which support only one-phase
commit. An FDW that implements these APIs can be managed by the global
transaction manager, so it can control its transactions through the
foreign transaction manager rather than through XactCallback.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 src/backend/access/transam/Makefile  |   1 +
 src/backend/access/transam/fdwxact.c | 223 +++++++++++++++++++++++++++
 src/backend/access/transam/xact.c    |   8 +
 src/backend/foreign/foreign.c        |   4 +
 src/include/access/fdwxact.h         |  34 ++++
 src/include/foreign/fdwapi.h         |  13 ++
 6 files changed, 283 insertions(+)
 create mode 100644 src/backend/access/transam/fdwxact.c
 create mode 100644 src/include/access/fdwxact.h

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 595e02de72..b05a88549d 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -15,6 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = \
 	clog.o \
 	commit_ts.o \
+	fdwxact.o \
 	generic_xlog.o \
 	multixact.o \
 	parallel.o \
diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
new file mode 100644
index 0000000000..ae3fdbdf83
--- /dev/null
+++ b/src/backend/access/transam/fdwxact.c
@@ -0,0 +1,223 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact.c
+ *		PostgreSQL global transaction manager for foreign servers.
+ *
+ * This module contains the code for managing transactions started on foreign
+ * servers.
+ *
+ * An FDW that implements both the commit and rollback APIs can register a
+ * foreign transaction participant with FdwXactRegisterEntry() so that it joins
+ * the group of a distributed transaction.  The registered foreign transactions
+ * are identified by user mapping OID.  On commit and rollback, the global
+ * transaction manager calls corresponding FDW API to end the foreign
+ * transactions.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/transam/fdwxact.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "catalog/pg_user_mapping.h"
+#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+
+/* Initial size of the hash table */
+#define FDWXACT_HASH_SIZE	64
+
+/* Check the FdwXactEntry supports commit (and rollback) callbacks */
+#define ServerSupportTransactionCallback(fdwent) \
+	(((FdwXactEntry *)(fdwent))->commit_foreign_xact_fn != NULL)
+
+/*
+ * Structure to bundle the foreign transaction participant.
+ *
+ * Participants are identified by user mapping OID, rather than pair of
+ * user OID and server OID. See README.fdwxact for the discussion.
+ */
+typedef struct FdwXactEntry
+{
+	/* user mapping OID, hash key (must be first) */
+	Oid			umid;
+
+	ForeignServer *server;
+	UserMapping *usermapping;
+
+	/* Callbacks for foreign transaction */
+	CommitForeignTransaction_function commit_foreign_xact_fn;
+	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+} FdwXactEntry;
+
+/*
+ * Foreign transaction participants involved in the current transaction.
+ * A member of participants must support both commit and rollback APIs
+ * (i.e., ServerSupportTransactionCallback() is true).
+ */
+static HTAB *FdwXactParticipants = NULL;
+
+/* Check the current transaction has at least one fdwxact participant */
+#define HasFdwXactParticipant() \
+	(FdwXactParticipants != NULL && \
+	 hash_get_num_entries(FdwXactParticipants) > 0)
+
+static void EndFdwXactEntry(FdwXactEntry *fdwent, bool isCommit,
+							bool is_parallel_worker);
+static void RemoveFdwXactEntry(Oid umid);
+
+/*
+ * Register the given foreign transaction participant identified by the
+ * given user mapping OID as a participant of the transaction.
+ */
+void
+FdwXactRegisterEntry(UserMapping *usermapping)
+{
+	FdwXactEntry *fdwent;
+	FdwRoutine *routine;
+	Oid			umid;
+	MemoryContext old_ctx;
+	bool		found;
+
+	Assert(IsTransactionState());
+
+	if (FdwXactParticipants == NULL)
+	{
+		HASHCTL		ctl;
+
+		ctl.keysize = sizeof(Oid);
+		ctl.entrysize = sizeof(FdwXactEntry);
+
+		FdwXactParticipants = hash_create("fdw xact participants",
+										  FDWXACT_HASH_SIZE,
+										  &ctl, HASH_ELEM | HASH_BLOBS);
+	}
+
+	umid = usermapping->umid;
+	fdwent = hash_search(FdwXactParticipants, (void *) &umid, HASH_ENTER, &found);
+
+	if (found)
+		return;
+
+	/*
+	 * The participant information must live until the end of the
+	 * transaction, when the syscache is no longer available, so we save it
+	 * in TopTransactionContext.
+	 */
+	old_ctx = MemoryContextSwitchTo(TopTransactionContext);
+
+	fdwent->usermapping = GetUserMapping(usermapping->userid, usermapping->serverid);
+	fdwent->server = GetForeignServer(usermapping->serverid);
+
+	/*
+	 * Foreign server managed by the transaction manager must implement
+	 * transaction callbacks.
+	 */
+	routine = GetFdwRoutineByServerId(usermapping->serverid);
+	if (!routine->CommitForeignTransaction)
+		ereport(ERROR,
+				(errmsg("cannot register foreign server not supporting transaction callback")));
+
+	fdwent->commit_foreign_xact_fn = routine->CommitForeignTransaction;
+	fdwent->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+
+	MemoryContextSwitchTo(old_ctx);
+}
+
+/* Remove the foreign transaction from FdwXactParticipants */
+void
+FdwXactUnregisterEntry(UserMapping *usermapping)
+{
+	Assert(IsTransactionState());
+	RemoveFdwXactEntry(usermapping->umid);
+}
+
+/*
+ * Remove an FdwXactEntry identified by the given user mapping id from the
+ * hash table.
+ */
+static void
+RemoveFdwXactEntry(Oid umid)
+{
+	(void) hash_search(FdwXactParticipants, (void *) &umid, HASH_REMOVE, NULL);
+}
+
+/*
+ * Commit or rollback all foreign transactions.
+ */
+void
+AtEOXact_FdwXact(bool isCommit, bool is_parallel_worker)
+{
+	FdwXactEntry *fdwent;
+	HASH_SEQ_STATUS scan;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (!HasFdwXactParticipant())
+		return;
+
+	hash_seq_init(&scan, FdwXactParticipants);
+	while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
+	{
+		Assert(ServerSupportTransactionCallback(fdwent));
+
+		/* Commit or rollback foreign transaction */
+		EndFdwXactEntry(fdwent, isCommit, is_parallel_worker);
+
+		/*
+		 * Remove the entry so that we don't recursively process this foreign
+		 * transaction.
+		 */
+		RemoveFdwXactEntry(fdwent->umid);
+	}
+
+	Assert(!HasFdwXactParticipant());
+}
+
+/*
+ * The routine for committing or rolling back the given transaction participant.
+ */
+static void
+EndFdwXactEntry(FdwXactEntry *fdwent, bool isCommit, bool is_parallel_worker)
+{
+	FdwXactInfo finfo;
+
+	Assert(ServerSupportTransactionCallback(fdwent));
+
+	finfo.server = fdwent->server;
+	finfo.usermapping = fdwent->usermapping;
+	finfo.flags = FDWXACT_FLAG_ONEPHASE |
+		((is_parallel_worker) ? FDWXACT_FLAG_PARALLEL_WORKER : 0);
+
+	if (isCommit)
+	{
+		fdwent->commit_foreign_xact_fn(&finfo);
+		elog(DEBUG1, "successfully committed the foreign transaction for user mapping %u",
+			 fdwent->umid);
+	}
+	else
+	{
+		fdwent->rollback_foreign_xact_fn(&finfo);
+		elog(DEBUG1, "successfully rolled back the foreign transaction for user mapping %u",
+			 fdwent->umid);
+	}
+}
+
+/*
+ * This function is called at PREPARE TRANSACTION.  Since we don't support
+ * preparing foreign transactions for now, raise an error if the local transaction
+ * has any foreign transaction.
+ */
+void
+AtPrepare_FdwXact(void)
+{
+	if (HasFdwXactParticipant())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 441445927e..1e00a3a98e 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -21,6 +21,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/subtrans.h"
@@ -2129,6 +2130,9 @@ CommitTransaction(void)
 	if (IsInParallelMode())
 		AtEOXact_Parallel(true);
 
+	/* Call foreign transaction callbacks at pre-commit phase, if any */
+	AtEOXact_FdwXact(true, is_parallel_worker);
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2369,6 +2373,9 @@ PrepareTransaction(void)
 	 * the transaction-abort path.
 	 */
 
+	/* Process foreign transactions */
+	AtPrepare_FdwXact();
+
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
 
@@ -2705,6 +2712,7 @@ AbortTransaction(void)
 	AtAbort_Notify();
 	AtEOXact_RelationMap(false, is_parallel_worker);
 	AtAbort_Twophase();
+	AtEOXact_FdwXact(false, is_parallel_worker);
 
 	/*
 	 * Advertise the fact that we aborted in pg_xact (assuming that we got as
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 5564dc3a1e..f8eb4fa215 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -328,6 +328,10 @@ GetFdwRoutine(Oid fdwhandler)
 		elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct",
 			 fdwhandler);
 
+	/* The FDW must support both or nothing */
+	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
+		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
+
 	return routine;
 }
 
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
new file mode 100644
index 0000000000..1d4a285c75
--- /dev/null
+++ b/src/include/access/fdwxact.h
@@ -0,0 +1,34 @@
+/*
+ * fdwxact.h
+ *
+ * PostgreSQL global transaction manager
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact.h
+ */
+#ifndef FDWXACT_H
+#define FDWXACT_H
+
+#include "access/xact.h"
+#include "foreign/foreign.h"
+
+/* Flag passed to FDW transaction management APIs */
+#define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
+											 * without preparation */
+#define FDWXACT_FLAG_PARALLEL_WORKER	0x02	/* is parallel worker? */
+
+/* State data for foreign transaction resolution, passed to FDW callbacks */
+typedef struct FdwXactInfo
+{
+	ForeignServer	*server;
+	UserMapping		*usermapping;
+
+	int	flags;			/* OR of FDWXACT_FLAG_xx flags */
+} FdwXactInfo;
+
+/* Function declarations */
+extern void AtEOXact_FdwXact(bool isCommit, bool is_parallel_worker);
+extern void AtPrepare_FdwXact(void);
+
+#endif /* FDWXACT_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index a801cd3057..c3539a4d73 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -13,6 +13,7 @@
 #define FDWAPI_H
 
 #include "access/parallel.h"
+#include "access/fdwxact.h"
 #include "nodes/execnodes.h"
 #include "nodes/pathnodes.h"
 
@@ -191,6 +192,10 @@ typedef void (*ForeignAsyncConfigureWait_function) (AsyncRequest *areq);
 
 typedef void (*ForeignAsyncNotify_function) (AsyncRequest *areq);
 
+typedef void (*CommitForeignTransaction_function) (FdwXactInfo *finfo);
+typedef void (*RollbackForeignTransaction_function) (FdwXactInfo *finfo);
+
+
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
  * function.  It provides pointers to the callback functions needed by the
@@ -278,6 +283,10 @@ typedef struct FdwRoutine
 	ForeignAsyncRequest_function ForeignAsyncRequest;
 	ForeignAsyncConfigureWait_function ForeignAsyncConfigureWait;
 	ForeignAsyncNotify_function ForeignAsyncNotify;
+
+	/* Support functions for transaction management */
+	CommitForeignTransaction_function CommitForeignTransaction;
+	RollbackForeignTransaction_function RollbackForeignTransaction;
 } FdwRoutine;
 
 
@@ -291,4 +300,8 @@ extern bool IsImportableForeignTable(const char *tablename,
 									 ImportForeignSchemaStmt *stmt);
 extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
 
+/* Functions in transam/fdwxact.c */
+extern void FdwXactRegisterEntry(UserMapping *usermapping);
+extern void FdwXactUnregisterEntry(UserMapping *usermapping);
+
 #endif							/* FDWAPI_H */
-- 
2.24.3 (Apple Git-128)

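The registration and end-of-transaction flow in the patch above can be sketched in isolation. This is a simplified illustration rather than the patch's implementation: a fixed array stands in for the dynahash table, an unsigned key stands in for the user mapping OID, and every identifier (`Participant`, `register_participant`, `at_end_of_xact`, the `FLAG_*` values, the demo callbacks and counters) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARTICIPANTS 8			/* hypothetical fixed capacity */
#define FLAG_ONEPHASE 0x01			/* same idea as FDWXACT_FLAG_ONEPHASE */
#define FLAG_PARALLEL_WORKER 0x02	/* same idea as FDWXACT_FLAG_PARALLEL_WORKER */

typedef void (*end_fn) (unsigned int key, int flags);

typedef struct Participant
{
	unsigned int key;			/* plays the role of the user mapping OID */
	end_fn		commit;
	end_fn		rollback;
	bool		in_use;
} Participant;

static Participant participants[MAX_PARTICIPANTS];

/* Counters updated by the demo callbacks so the effect can be observed. */
static int	n_committed = 0;
static int	n_rolled_back = 0;
static int	last_flags = 0;

static void
demo_commit(unsigned int key, int flags)
{
	(void) key;
	n_committed++;
	last_flags = flags;
}

static void
demo_rollback(unsigned int key, int flags)
{
	(void) key;
	n_rolled_back++;
	last_flags = flags;
}

static int
count_participants(void)
{
	int			n = 0;

	for (int i = 0; i < MAX_PARTICIPANTS; i++)
		if (participants[i].in_use)
			n++;
	return n;
}

/* Register a participant once per key; false means it was rejected. */
static bool
register_participant(unsigned int key, end_fn commit, end_fn rollback)
{
	int			free_slot = -1;

	if (commit == NULL || rollback == NULL)
		return false;			/* both callbacks are required */

	for (int i = 0; i < MAX_PARTICIPANTS; i++)
	{
		if (participants[i].in_use && participants[i].key == key)
			return true;		/* already registered, nothing to do */
		if (!participants[i].in_use && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return false;			/* table is full */

	participants[free_slot].key = key;
	participants[free_slot].commit = commit;
	participants[free_slot].rollback = rollback;
	participants[free_slot].in_use = true;
	return true;
}

/* End every registered participant, then forget it. */
static void
at_end_of_xact(bool is_commit, bool is_parallel_worker)
{
	int			flags = FLAG_ONEPHASE |
		(is_parallel_worker ? FLAG_PARALLEL_WORKER : 0);

	for (int i = 0; i < MAX_PARTICIPANTS; i++)
	{
		if (!participants[i].in_use)
			continue;
		if (is_commit)
			participants[i].commit(participants[i].key, flags);
		else
			participants[i].rollback(participants[i].key, flags);
		participants[i].in_use = false;
	}
	assert(count_participants() == 0);
}
```

Registering the same key twice leaves a single entry, which is what lets a caller register on every connection use without tracking whether it has already done so in the current transaction.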
Attachment: v37-0003-Support-two-phase-commit-for-foreign-transaction.patch (application/octet-stream)
From cc438a3fb50835d29ea2995e0372a70051be452d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 10 May 2021 20:32:25 +0900
Subject: [PATCH v37 3/9] Support two-phase commit for foreign transactions.

Co-authored-by: Masahiko Sawada, Ashutosh Bapat
---
 .../postgres_fdw/expected/postgres_fdw.out    |    2 +-
 src/backend/access/rmgrdesc/Makefile          |    1 +
 src/backend/access/rmgrdesc/fdwxactdesc.c     |   59 +
 src/backend/access/rmgrdesc/xlogdesc.c        |    4 +-
 src/backend/access/transam/Makefile           |    2 +
 src/backend/access/transam/fdwxact.c          | 1945 ++++++++++++++++-
 src/backend/access/transam/fdwxact_launcher.c |  587 +++++
 src/backend/access/transam/fdwxact_resolver.c |  339 +++
 src/backend/access/transam/rmgr.c             |    1 +
 src/backend/access/transam/twophase.c         |   35 +
 src/backend/access/transam/xact.c             |    4 +-
 src/backend/access/transam/xlog.c             |   41 +-
 src/backend/catalog/dependency.c              |    5 +-
 src/backend/catalog/system_views.sql          |    3 +
 src/backend/commands/dbcommands.c             |   14 +
 src/backend/commands/foreigncmds.c            |   78 +-
 src/backend/foreign/foreign.c                 |    6 +
 src/backend/nodes/copyfuncs.c                 |   15 +
 src/backend/nodes/equalfuncs.c                |   14 +
 src/backend/parser/gram.y                     |   29 +-
 src/backend/postmaster/bgworker.c             |    8 +
 src/backend/postmaster/postmaster.c           |   13 +-
 src/backend/replication/logical/decode.c      |    1 +
 src/backend/replication/syncrep.c             |   14 +-
 src/backend/storage/ipc/ipci.c                |    6 +
 src/backend/storage/ipc/procarray.c           |   42 +-
 src/backend/storage/lmgr/lwlocknames.txt      |    2 +
 src/backend/tcop/postgres.c                   |   16 +
 src/backend/tcop/utility.c                    |   12 +-
 src/backend/utils/activity/wait_event.c       |   15 +
 src/backend/utils/misc/guc.c                  |   48 +
 src/backend/utils/misc/postgresql.conf.sample |   14 +
 src/bin/initdb/initdb.c                       |    1 +
 src/bin/pg_controldata/pg_controldata.c       |    2 +
 src/bin/pg_resetwal/pg_resetwal.c             |    2 +
 src/bin/pg_waldump/rmgrdesc.c                 |    1 +
 src/include/access/fdwxact.h                  |   88 +-
 src/include/access/fdwxact_launcher.h         |   29 +
 src/include/access/fdwxact_resolver.h         |   22 +
 src/include/access/fdwxact_xlog.h             |   49 +
 src/include/access/resolver_internal.h        |   59 +
 src/include/access/rmgrlist.h                 |    1 +
 src/include/access/twophase.h                 |    1 +
 src/include/access/xlog_internal.h            |    1 +
 src/include/catalog/pg_control.h              |    1 +
 src/include/catalog/pg_proc.dat               |   23 +
 src/include/commands/defrem.h                 |    2 +
 src/include/foreign/fdwapi.h                  |    2 +
 src/include/nodes/nodes.h                     |    1 +
 src/include/nodes/parsenodes.h                |   10 +-
 src/include/replication/syncrep.h             |    2 +-
 src/include/storage/procarray.h               |    1 +
 src/include/utils/guc_tables.h                |    2 +
 src/include/utils/wait_event.h                |    7 +-
 src/test/regress/expected/rules.out           |    8 +
 55 files changed, 3623 insertions(+), 67 deletions(-)
 create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c
 create mode 100644 src/backend/access/transam/fdwxact_launcher.c
 create mode 100644 src/backend/access/transam/fdwxact_resolver.c
 create mode 100644 src/include/access/fdwxact_launcher.h
 create mode 100644 src/include/access/fdwxact_resolver.h
 create mode 100644 src/include/access/fdwxact_xlog.h
 create mode 100644 src/include/access/resolver_internal.h

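The fdwxact.c changes in this patch name each on-disk state file after the transaction ID and the user mapping OID, both printed as eight hexadecimal digits and joined by an underscore under the `pg_fdwxact` directory. A minimal standalone sketch of that naming scheme follows; `state_file_path`, `STATE_DIR`, and `STATE_FILE_NAME_LEN` are hypothetical names, and plain `unsigned int` (assumed to be 32 bits wide) stands in for `TransactionId` and `Oid`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STATE_DIR "pg_fdwxact"
#define STATE_FILE_NAME_LEN (8 + 1 + 8)	/* xid, '_', user mapping OID */

/*
 * Write "<STATE_DIR>/<xid>_<umid>" into path, with both ids as eight
 * upper-case hex digits.  Returns the path length, or -1 if the buffer
 * is too small to hold the whole path.
 */
static int
state_file_path(char *path, size_t size, unsigned int xid, unsigned int umid)
{
	int			n = snprintf(path, size, STATE_DIR "/%08X_%08X", xid, umid);

	if (n < 0 || (size_t) n >= size)
		return -1;

	/* With 32-bit ids the file name part always has a fixed length. */
	assert(n == (int) strlen(STATE_DIR) + 1 + STATE_FILE_NAME_LEN);
	return n;
}
```

For example, xid `0x2FE` and user mapping OID `0x4001` yield `pg_fdwxact/000002FE_00004001`. The fixed-width name is what makes a length check like `FDWXACT_FILE_NAME_LEN` usable when scanning the directory.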
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 343b33473b..01c43d80ff 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9309,7 +9309,7 @@ SELECT count(*) FROM ft1;
 
 -- error here
 PREPARE TRANSACTION 'fdw_tpc';
-ERROR:  cannot PREPARE a transaction that has operated on foreign tables
+ERROR:  cannot PREPARE a distributed transaction that has operated on a foreign server not supporting two-phase commit protocol
 ROLLBACK;
 WARNING:  there is no transaction in progress
 -- ===================================================================
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd86..982c1a36cc 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -13,6 +13,7 @@ OBJS = \
 	clogdesc.o \
 	committsdesc.o \
 	dbasedesc.o \
+	fdwxactdesc.o \
 	genericdesc.o \
 	gindesc.o \
 	gistdesc.o \
diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c
new file mode 100644
index 0000000000..4e97486640
--- /dev/null
+++ b/src/backend/access/rmgrdesc/fdwxactdesc.c
@@ -0,0 +1,59 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxactdesc.c
+ *		rmgr descriptor routines for access/transam/fdwxact.c
+ *
+ * This module describes the WAL records for foreign transaction manager.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/backend/access/rmgrdesc/fdwxactdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/fdwxact_xlog.h"
+
+void
+fdwxact_desc(StringInfo buf, XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		FdwXactStateOnDiskData *fdwxact_insert = (FdwXactStateOnDiskData *) rec;
+
+		appendStringInfo(buf, "xid: %u, dbid: %u, umid: %u, serverid: %u, owner: %u, identifier: %s",
+						 fdwxact_insert->xid,
+						 fdwxact_insert->dbid,
+						 fdwxact_insert->umid,
+						 fdwxact_insert->serverid,
+						 fdwxact_insert->owner,
+						 fdwxact_insert->identifier);
+	}
+	else
+	{
+		xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec;
+
+		appendStringInfo(buf, "xid: %u, umid: %u",
+						 fdwxact_remove->xid,
+						 fdwxact_remove->umid);
+	}
+
+}
+
+const char *
+fdwxact_identify(uint8 info)
+{
+	switch (info & ~XLR_INFO_MASK)
+	{
+		case XLOG_FDWXACT_INSERT:
+			return "NEW FOREIGN TRANSACTION";
+		case XLOG_FDWXACT_REMOVE:
+			return "REMOVE FOREIGN TRANSACTION";
+	}
+	/* Keep compiler happy */
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index e6090a9dad..6bd3bb7700 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -112,12 +112,14 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 
 		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
 						 "max_wal_senders=%d max_prepared_xacts=%d "
+						 "max_prepared_foreign_transactions=%d "
 						 "max_locks_per_xact=%d wal_level=%s "
-						 "wal_log_hints=%s track_commit_timestamp=%s",
+						 "wal_log_hints=%s track_commit_timestamp=%s ",
 						 xlrec.MaxConnections,
 						 xlrec.max_worker_processes,
 						 xlrec.max_wal_senders,
 						 xlrec.max_prepared_xacts,
+						 xlrec.max_prepared_foreign_xacts,
 						 xlrec.max_locks_per_xact,
 						 wal_level_str,
 						 xlrec.wal_log_hints ? "on" : "off",
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index b05a88549d..26a5ee589c 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -16,6 +16,8 @@ OBJS = \
 	clog.o \
 	commit_ts.o \
 	fdwxact.o \
+	fdwxact_launcher.o \
+	fdwxact_resolver.o \
 	generic_xlog.o \
 	multixact.o \
 	parallel.o \
diff --git a/src/backend/access/transam/fdwxact.c b/src/backend/access/transam/fdwxact.c
index ae3fdbdf83..0c6e80a6de 100644
--- a/src/backend/access/transam/fdwxact.c
+++ b/src/backend/access/transam/fdwxact.c
@@ -13,6 +13,57 @@
  * transaction manager calls corresponding FDW API to end the foreign
  * transactions.
  *
+ * To achieve commit among all foreign servers atomically, the global transaction
+ * manager supports two-phase commit protocol, which is a type of atomic commitment
+ * protocol. We WAL log the foreign transaction state so foreign transaction state
+ * is crash-safe.
+ *
+ * FOREIGN TRANSACTION RESOLUTION
+ *
+ * At PREPARE TRANSACTION, we prepare all transactions on foreign servers by executing
+ * the PrepareForeignTransaction() API for each foreign transaction, regardless of
+ * whether data on the foreign server has been modified.  At COMMIT PREPARED and
+ * ROLLBACK PREPARED, we commit or roll back only the local transaction and do nothing
+ * for the involved foreign transactions.  The prepared foreign transactions are resolved
+ * asynchronously by a resolver process.  Users can also call the
+ * pg_resolve_foreign_xact() SQL function to resolve a foreign transaction manually.
+ *
+ * LOCKING
+ *
+ * Whenever a foreign transaction is processed, the corresponding FdwXactState
+ * entry is updated.  To avoid holding the lock during transaction processing,
+ * which may take an unpredictable time, the in-memory data of a foreign
+ * transaction follows a locking model based on the following linked concepts:
+ *
+ * * A process that is going to work on the foreign transaction needs to set
+ *	 locking_backend of the FdwXactState entry, which prevents the entry from being
+ *	 updated and removed by concurrent processes.
+ * * All FdwXactState fields except for status are protected by FdwXactLock.  The
+ *   status is protected by its mutex.
+ *
+ * RECOVERY
+ *
+ * During WAL replay and replication, FdwXactCtl also holds information about
+ * active prepared foreign transactions that haven't been moved to disk yet.
+ *
+ * Replay of fdwxact records happens by the following rules:
+ *
+ * * At the beginning of recovery, pg_fdwxact is scanned once, filling FdwXactState
+ *	 with entries marked with fdwxact->inredo and fdwxact->ondisk.  FdwXactState file
+ *	 data older than the XID horizon of the redo position are discarded.
+ * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->xacts.
+ *	 We set fdwxact->inredo to true for such entries.
+ * * On Checkpoint we iterate through FdwXactCtl->xacts entries that
+ *	 have fdwxact->inredo set and are behind the redo_horizon.	We save
+ *	 them to disk and then set fdwxact->ondisk to true.
+ * * On resolution we delete the entry from FdwXactCtl->xacts.  If
+ *	 fdwxact->ondisk is true, the corresponding entry from the disk is
+ *	 additionally deleted.
+ * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through
+ *	 fdwxact->inredo entries that have not made it to disk.
+ *
+ * These replay rules are borrowed from twophase.c
+ *
  * Portions Copyright (c) 2021, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
@@ -21,15 +72,42 @@
  */
 #include "postgres.h"
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "access/twophase.h"
+#include "access/resolver_internal.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_user_mapping.h"
 #include "foreign/fdwapi.h"
 #include "foreign/foreign.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "replication/syncrep.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lock.h"
+#include "storage/procarray.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 #include "utils/syscache.h"
 
+/* Directory where the foreign prepared transaction files will reside */
+#define FDWXACTS_DIR "pg_fdwxact"
+
 /* Initial size of the hash table */
 #define FDWXACT_HASH_SIZE	64
 
@@ -37,6 +115,23 @@
 #define ServerSupportTransactionCallback(fdwent) \
 	(((FdwXactEntry *)(fdwent))->commit_foreign_xact_fn != NULL)
 
+/* Check the FdwXactEntry is capable of two-phase commit  */
+#define ServerSupportTwophaseCommit(fdwent) \
+	(((FdwXactEntry *)(fdwent))->prepare_foreign_xact_fn != NULL)
+
+/*
+ * The name of a foreign prepared transaction file is the xid and the
+ * user mapping OID, each printed as 8 hex digits, separated by '_'.
+ *
+ * Since an FdwXactState is identified by user mapping OID, which is unique
+ * within a distributed transaction, the name is sufficient to
+ * ensure uniqueness.
+ */
+#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8)
+#define FdwXactStateFilePath(path, xid, umid)	\
+	snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X", \
+			 xid, umid)
+
 /*
  * Structure to bundle the foreign transaction participant.
  *
@@ -51,26 +146,146 @@ typedef struct FdwXactEntry
 	ForeignServer *server;
 	UserMapping *usermapping;
 
+	/*
+	 * Pointer to a FdwXactState entry in the global array. NULL if the entry
+	 * is not inserted yet but this is registered as a participant.
+	 */
+	FdwXactState fdwxact;
+
 	/* Callbacks for foreign transaction */
 	CommitForeignTransaction_function commit_foreign_xact_fn;
 	RollbackForeignTransaction_function rollback_foreign_xact_fn;
+	PrepareForeignTransaction_function prepare_foreign_xact_fn;
 } FdwXactEntry;
 
 /*
- * Foreign transaction participants involved in the current transaction.
- * A member of participants must support both commit and rollback APIs
+ * The current distributed transaction state.  Members of participants
+ * must support at least both commit and rollback APIs
  * (i.e., ServerSupportTransactionCallback() is true).
  */
-static HTAB *FdwXactParticipants = NULL;
+typedef struct DistributedXactStateData
+{
+	bool		local_prepared; /* will (did) we prepare the local transaction? */
+
+	/* Statistics of participants */
+	int			nparticipants_no_twophase;	/* how many participants don't
+											 * support two-phase commit
+											 * protocol? */
+
+	HTAB	   *participants;	/* foreign transaction participants (FdwXactEntry) */
+	List	   *serveroids_uniq;	/* list of unique server OIDs in
+									 * participants */
+} DistributedXactStateData;
+static DistributedXactStateData DistributedXactState = {
+	.local_prepared = false,
+	.nparticipants_no_twophase = 0,
+	.participants = NULL,
+	.serveroids_uniq = NIL,
+};
 
 /* Check the current transaction has at least one fdwxact participant */
 #define HasFdwXactParticipant() \
-	(FdwXactParticipants != NULL && \
-	 hash_get_num_entries(FdwXactParticipants) > 0)
+	(DistributedXactState.participants != NULL && \
+	 hash_get_num_entries(DistributedXactState.participants) > 0)
+
+/* Tracks whether the process-exit callback has been registered. */
+static bool fdwXactExitRegistered = false;
+
+/* GUC parameters */
+int			max_prepared_foreign_xacts = 0;
+int			max_foreign_xact_resolvers = 0;
 
+static void RemoveFdwXactEntry(Oid umid);
 static void EndFdwXactEntry(FdwXactEntry *fdwent, bool isCommit,
 							bool is_parallel_worker);
-static void RemoveFdwXactEntry(Oid umid);
+static char *getFdwXactIdentifier(FdwXactEntry *fdwent, TransactionId xid);
+static void ForgetAllParticipants(void);
+static void FdwXactLaunchResolvers(void);
+
+static void PrepareAllFdwXacts(TransactionId xid);
+static XLogRecPtr FdwXactInsertEntry(TransactionId xid, FdwXactEntry *fdwent,
+									 char *identifier);
+static void AtProcExit_FdwXact(int code, Datum arg);
+static void FdwXactComputeRequiredXmin(void);
+static FdwXactStatus FdwXactGetTransactionFate(TransactionId xid);
+static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn);
+static void FdwXactRedoRemove(TransactionId xid, Oid umid, bool givewarning);
+static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len);
+static char *ProcessFdwXactBuffer(TransactionId xid, Oid umid,
+								  XLogRecPtr insert_start_lsn, bool fromdisk);
+static char *ReadFdwXactStateFile(TransactionId xid, Oid umid);
+static void RemoveFdwXactStateFile(TransactionId xid, Oid umid, bool giveWarning);
+static void RecreateFdwXactFile(TransactionId xid, Oid umid, void *content, int len);
+
+static FdwXactState insert_fdwxact(Oid dbid, TransactionId xid, Oid umid, Oid serverid,
+								   Oid owner, char *identifier);
+static void remove_fdwxact(FdwXactState fdwxact);
+static List *find_fdwxacts(TransactionId xid, Oid umid, Oid dbid);
+static FdwXactState get_fdwxact_with_check(TransactionId xid, Oid umid,
+										   bool check_two_phase);
+static void pg_foreign_xact_callback(int code, Datum arg);
+
+/*
+ * Calculates the size of shared memory allocated for maintaining foreign
+ * prepared transaction entries.
+ */
+Size
+FdwXactShmemSize(void)
+{
+	Size		size;
+
+	/* Size for foreign transaction information array */
+	size = offsetof(FdwXactCtlData, xacts);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactState)));
+	size = MAXALIGN(size);
+	size = add_size(size, mul_size(max_prepared_foreign_xacts,
+								   sizeof(FdwXactStateData)));
+
+	return size;
+}
+
+/*
+ * Initialization of shared memory for maintaining foreign prepared transaction
+ * entries. The shared memory layout is defined in definition of FdwXactCtlData
+ * structure.
+ */
+void
+FdwXactShmemInit(void)
+{
+	bool		found;
+
+	FdwXactCtl = ShmemInitStruct("Foreign transactions table",
+								 FdwXactShmemSize(),
+								 &found);
+	if (!IsUnderPostmaster)
+	{
+		FdwXactState fdwxacts;
+		int			cnt;
+
+		Assert(!found);
+		FdwXactCtl->free_fdwxacts = NULL;
+		FdwXactCtl->num_xacts = 0;
+
+		/* Initialize the linked list of free FDW transactions */
+		fdwxacts = (FdwXactState)
+			((char *) FdwXactCtl +
+			 MAXALIGN(offsetof(FdwXactCtlData, xacts) +
+					  sizeof(FdwXactState) * max_prepared_foreign_xacts));
+		for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++)
+		{
+			fdwxacts[cnt].status = FDWXACT_STATUS_INVALID;
+			fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+			FdwXactCtl->free_fdwxacts = &fdwxacts[cnt];
+			SpinLockInit(&fdwxacts[cnt].mutex);
+		}
+	}
+	else
+	{
+		Assert(FdwXactCtl);
+		Assert(found);
+	}
+}
 
 /*
  * Register the given foreign transaction participant identified by the
@@ -87,20 +302,21 @@ FdwXactRegisterEntry(UserMapping *usermapping)
 
 	Assert(IsTransactionState());
 
-	if (FdwXactParticipants == NULL)
+	if (DistributedXactState.participants == NULL)
 	{
 		HASHCTL		ctl;
 
 		ctl.keysize = sizeof(Oid);
 		ctl.entrysize = sizeof(FdwXactEntry);
 
-		FdwXactParticipants = hash_create("fdw xact participants",
-										  FDWXACT_HASH_SIZE,
-										  &ctl, HASH_ELEM | HASH_BLOBS);
+		DistributedXactState.participants = hash_create("fdw xact participants",
+														FDWXACT_HASH_SIZE,
+														&ctl, HASH_ELEM | HASH_BLOBS);
 	}
 
 	umid = usermapping->umid;
-	fdwent = hash_search(FdwXactParticipants, (void *) &umid, HASH_ENTER, &found);
+	fdwent = hash_search(DistributedXactState.participants,
+						 (void *) &umid, HASH_ENTER, &found);
 
 	if (found)
 		return;
@@ -124,13 +340,22 @@ FdwXactRegisterEntry(UserMapping *usermapping)
 		ereport(ERROR,
 				(errmsg("cannot register foreign server not supporting transaction callback")));
 
+	fdwent->fdwxact = NULL;
 	fdwent->commit_foreign_xact_fn = routine->CommitForeignTransaction;
 	fdwent->rollback_foreign_xact_fn = routine->RollbackForeignTransaction;
+	fdwent->prepare_foreign_xact_fn = routine->PrepareForeignTransaction;
 
 	MemoryContextSwitchTo(old_ctx);
+
+	/* Update statistics */
+	if (!ServerSupportTwophaseCommit(fdwent))
+		DistributedXactState.nparticipants_no_twophase++;
+
+	Assert(DistributedXactState.nparticipants_no_twophase <=
+		   hash_get_num_entries(DistributedXactState.participants));
 }
 
-/* Remove the foreign transaction from FdwXactParticipants */
+/* Remove the foreign transaction from the current participants */
 void
 FdwXactUnregisterEntry(UserMapping *usermapping)
 {
@@ -145,7 +370,21 @@ FdwXactUnregisterEntry(UserMapping *usermapping)
 static void
 RemoveFdwXactEntry(Oid umid)
 {
-	(void) hash_search(FdwXactParticipants, (void *) &umid, HASH_REMOVE, NULL);
+	FdwXactEntry *fdwent;
+
+	Assert(DistributedXactState.participants != NULL);
+	fdwent = hash_search(DistributedXactState.participants, (void *) &umid,
+						 HASH_REMOVE, NULL);
+
+	if (fdwent)
+	{
+		/* Update statistics */
+		if (!ServerSupportTwophaseCommit(fdwent))
+			DistributedXactState.nparticipants_no_twophase--;
+
+		Assert(DistributedXactState.nparticipants_no_twophase <=
+			   hash_get_num_entries(DistributedXactState.participants));
+	}
 }
 
 /*
@@ -154,29 +393,75 @@ RemoveFdwXactEntry(Oid umid)
 void
 AtEOXact_FdwXact(bool isCommit, bool is_parallel_worker)
 {
-	FdwXactEntry *fdwent;
-	HASH_SEQ_STATUS scan;
-
 	/* If there are no foreign servers involved, we have no business here */
 	if (!HasFdwXactParticipant())
 		return;
 
-	hash_seq_init(&scan, FdwXactParticipants);
-	while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
+	Assert(!RecoveryInProgress());
+
+	if (!isCommit)
 	{
-		Assert(ServerSupportTransactionCallback(fdwent));
+		HASH_SEQ_STATUS scan;
+		FdwXactEntry *fdwent;
 
-		/* Commit or rollback foreign transaction */
-		EndFdwXactEntry(fdwent, isCommit, is_parallel_worker);
+		/* Rollback foreign transactions in the participant list */
+		hash_seq_init(&scan, DistributedXactState.participants);
+		while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
+		{
+			FdwXactState fdwxact = fdwent->fdwxact;
+			int			status;
 
-		/*
-		 * Remove the entry so that we don't recursively process this foreign
-		 * transaction.
-		 */
-		RemoveFdwXactEntry(fdwent->umid);
+			/*
+			 * If this foreign transaction is not prepared yet, end the
+			 * foreign transaction in one-phase.
+			 */
+			if (!fdwxact)
+			{
+				Assert(ServerSupportTransactionCallback(fdwent));
+				EndFdwXactEntry(fdwent, false, is_parallel_worker);
+
+				/*
+				 * Remove FdwXactState entry to prevent processing again in a
+				 * recursive error case.
+				 */
+				RemoveFdwXactEntry(fdwent->umid);
+				continue;
+			}
+
+			/*
+			 * If the foreign transaction has an FdwXactState entry, the
+			 * foreign transaction might have been prepared.  If the status is
+			 * still preparing, we roll back the foreign transaction anyway to
+			 * end the current transaction.  Since the transaction might have
+			 * already been prepared on the foreign server, we set the status
+			 * to aborting and leave the entry in place.
+			 */
+			SpinLockAcquire(&(fdwxact->mutex));
+			status = fdwxact->status;
+			fdwxact->status = FDWXACT_STATUS_ABORTING;
+			SpinLockRelease(&(fdwxact->mutex));
+
+			if (status == FDWXACT_STATUS_PREPARING)
+				EndFdwXactEntry(fdwent, isCommit, is_parallel_worker);
+		}
 	}
 
-	Assert(!HasFdwXactParticipant());
+	/* Unlock all participants */
+	ForgetAllParticipants();
+
+	/*
+	 * Launch the resolver processes if we failed to prepare the local
+	 * transaction after preparing the foreign transactions.  In this case, we
+	 * need to rollback the prepared transaction on the foreign servers.
+	 */
+	if (DistributedXactState.local_prepared && !isCommit)
+		FdwXactLaunchResolvers();
+
+	/* Reset all fields */
+	DistributedXactState.local_prepared = false;
+	DistributedXactState.nparticipants_no_twophase = 0;
+	list_free(DistributedXactState.serveroids_uniq);
+	DistributedXactState.serveroids_uniq = NIL;
 }
 
 /*
@@ -193,6 +478,7 @@ EndFdwXactEntry(FdwXactEntry *fdwent, bool isCommit, bool is_parallel_worker)
 	finfo.usermapping = fdwent->usermapping;
 	finfo.flags = FDWXACT_FLAG_ONEPHASE |
 		((is_parallel_worker) ? FDWXACT_FLAG_PARALLEL_WORKER : 0);
+	finfo.identifier = NULL;
 
 	if (isCommit)
 	{
@@ -209,15 +495,1610 @@ EndFdwXactEntry(FdwXactEntry *fdwent, bool isCommit, bool is_parallel_worker)
 }
 
 /*
- * This function is called at PREPARE TRANSACTION.  Since we don't support
- * preparing foreign transactions for now, raise an error if the local transaction
- * has any foreign transaction.
+ * Prepare foreign transactions by PREPARE TRANSACTION command.
+ *
+ * If an error happens while preparing a foreign transaction, we change to
+ * rollback.  See AtEOXact_FdwXact() for details.
  */
 void
 AtPrepare_FdwXact(void)
 {
-	if (HasFdwXactParticipant())
+	TransactionId xid;
+
+	/* If there are no foreign servers involved, we have no business here */
+	if (!HasFdwXactParticipant())
+		return;
+
+	/*
+	 * Check if there is a server that doesn't support two-phase commit. All
+	 * involved servers need to support two-phase commit as we're going to
+	 * prepare all of them.
+	 */
+	if (DistributedXactState.nparticipants_no_twophase > 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+				 errmsg("cannot PREPARE a distributed transaction that has operated on a foreign server not supporting two-phase commit protocol")));
+
+	/*
+	 * Assign a transaction id if not yet because the local transaction id
+	 * is needed to determine the result of the distributed transaction.
+	 */
+	xid = GetTopTransactionId();
+
+	/*
+	 * Mark that the local transaction will be prepared before actually
+	 * preparing any foreign transactions so that, if an error happens while
+	 * preparing a foreign transaction or the local transaction, we can launch
+	 * a resolver to roll back already-prepared foreign transactions.
+	 */
+	DistributedXactState.local_prepared = true;
+
+	PrepareAllFdwXacts(xid);
+}
+
+/*
+ * Pre-commit processing for foreign transactions. We commit those foreign
+ * transactions with one-phase.
+ */
+void
+PreCommit_FdwXact(bool is_parallel_worker)
+{
+	HASH_SEQ_STATUS scan;
+	FdwXactEntry *fdwent;
+
+	/*
+	 * If there is no foreign server involved or all foreign transactions are
+	 * already prepared (see AtPrepare_FdwXact()), we have no business here.
+	 */
+	if (!HasFdwXactParticipant() || DistributedXactState.local_prepared)
+		return;
+
+	Assert(!RecoveryInProgress());
+
+	/* Commit all foreign transactions in the participant list */
+	hash_seq_init(&scan, DistributedXactState.participants);
+	while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
+	{
+		Assert(ServerSupportTransactionCallback(fdwent));
+
+		/*
+		 * Commit the foreign transaction and remove it from the hash table
+		 * so that we don't try to abort an already-closed transaction.
+		 */
+		EndFdwXactEntry(fdwent, true, is_parallel_worker);
+		RemoveFdwXactEntry(fdwent->umid);
+	}
+}
+
+/*
+ * Functions to count the number of FdwXactState entries associated with
+ * the given argument.
+ */
+int
+CountFdwXactsForUserMapping(Oid umid)
+{
+	List	   *res;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	res = find_fdwxacts(InvalidTransactionId, umid, InvalidOid);
+	LWLockRelease(FdwXactLock);
+
+	return list_length(res);
+}
+
+int
+CountFdwXactsForDB(Oid dbid)
+{
+	List	   *res;
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	res = find_fdwxacts(InvalidTransactionId, InvalidOid, dbid);
+	LWLockRelease(FdwXactLock);
+
+	return list_length(res);
+
+}
+
+/*
+ * We must fsync the foreign transaction state file that is valid or generated
+ * during redo and has an inserted LSN <= the checkpoint's redo horizon.
+ * The foreign transaction entries and hence the corresponding files are expected
+ * to be very short-lived. By executing this function at the end, we might have
+ * fewer files to fsync, thus reducing some I/O. This is similar to
+ * CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXactStates that were valid at checkpoint start will no longer
+ * exist if we wait a little bit. With typical checkpoint settings this
+ * will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXactStates that need to be copied to disk.
+ *
+ * If a FdwXactState remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */
+void
+CheckPointFdwXacts(XLogRecPtr redo_horizon)
+{
+	int			cnt;
+	int			serialized_fdwxacts = 0;
+
+	if (max_prepared_foreign_xacts == 0)
+		return;					/* nothing to do */
+
+	/*
+	 * We are expecting there to be zero FdwXactState that need to be copied
+	 * to disk, so we perform all I/O while holding FdwXactLock for
+	 * simplicity. This prevents any new foreign xacts from preparing while
+	 * this occurs, which shouldn't be a problem since the presence of
+	 * long-lived prepared foreign xacts indicates the transaction manager
+	 * isn't active.
+	 *
+	 * It's also possible to move I/O out of the lock, but on every error we
+	 * should check whether somebody committed our transaction in a different
+	 * backend. Let's leave this optimisation for the future, if somebody
+	 * spots that this place causes a bottleneck.
+	 *
+	 * Note that it isn't possible for there to be a FdwXactState with an
+	 * insert_end_lsn set prior to the last checkpoint yet is marked invalid,
+	 * because of the efforts with delayChkpt.
+	 */
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (cnt = 0; cnt < FdwXactCtl->num_xacts; cnt++)
+	{
+		FdwXactState fdwxact = FdwXactCtl->xacts[cnt];
+
+		if ((fdwxact->valid || fdwxact->inredo) &&
+			!fdwxact->ondisk &&
+			fdwxact->insert_end_lsn <= redo_horizon)
+		{
+			char	   *buf;
+			int			len;
+
+			XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len);
+			RecreateFdwXactFile(fdwxact->data.xid, fdwxact->data.umid, buf, len);
+			fdwxact->ondisk = true;
+			fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+			fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+			pfree(buf);
+			serialized_fdwxacts++;
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.	 FdwXactState files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created.
+	 */
+	fsync_fname(FDWXACTS_DIR, true);
+
+	if (log_checkpoints && serialized_fdwxacts > 0)
+		ereport(LOG,
+				(errmsg_plural("%u foreign transaction state file was written "
+							   "for long-running prepared transactions",
+							   "%u foreign transaction state files were written "
+							   "for long-running prepared transactions",
+							   serialized_fdwxacts,
+							   serialized_fdwxacts)));
+}
+
+/*
+ * Prepare all foreign transactions.
+ *
+ * The basic strategy is to create all FdwXactState entries with WAL-logging,
+ * wait for those WAL records to be replicated if synchronous replication is
+ * enabled, and then prepare foreign transactions by calling
+ * PrepareForeignTransaction FDW callback functions.  There are two points
+ * here: (a) writing WAL records before preparing foreign transactions, and
+ * (b) waiting for those records to be replicated before preparing foreign
+ * transactions.
+ *
+ * You might think that we can iterate over foreign transactions, preparing
+ * each foreign transaction and writing its WAL record one by one.  But this
+ * doesn't work if the server crashes in the middle of those operations. We
+ * would end up losing foreign prepared transaction information if the server
+ * crashes and/or a failover happens.  Therefore, we need point (a) because
+ * otherwise we would lose track of the prepared transaction on the foreign
+ * server and would not be able to resolve it after the server crash.  Hence
+ * persist first, then prepare.  Point (b) guarantees that foreign transaction
+ * information is not lost even if a failover happens.
+ */
+static void
+PrepareAllFdwXacts(TransactionId xid)
+{
+	FdwXactEntry *fdwent;
+	XLogRecPtr	flush_lsn;
+	bool	canceled;
+	HASH_SEQ_STATUS scan;
+
+	Assert(TransactionIdIsValid(xid));
+
+	/* Persist all foreign transaction entries */
+	hash_seq_init(&scan, DistributedXactState.participants);
+	while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
+	{
+		char	   *identifier;
+
+		Assert(ServerSupportTwophaseCommit(fdwent));
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get prepared transaction identifier */
+		identifier = getFdwXactIdentifier(fdwent, xid);
+		Assert(identifier);
+
+		/*
+		 * Insert the foreign transaction entry with the
+		 * FDWXACT_STATUS_PREPARING status. Registration persists this
+		 * information to the disk and logs it in WAL (thereby relaying it to
+		 * the standby).  Thus, in case we lose connectivity to the foreign
+		 * server or crash ourselves, we will remember that we might have a
+		 * prepared transaction on the foreign server and try to resolve it
+		 * when connectivity is restored or after crash recovery.
+		 */
+		flush_lsn = FdwXactInsertEntry(xid, fdwent, identifier);
+	}
+
+	HOLD_INTERRUPTS();
+
+	/* Wait for all WAL records to be replicated, if necessary */
+	canceled = SyncRepWaitForLSN(flush_lsn, false);
+
+	RESUME_INTERRUPTS();
+
+	/*
+	 * XXX: dirty hack to invoke an interrupt that was absorbed.
+	 * SyncRepWaitForLSN() is meant to be used at a point where we can no
+	 * longer change to rollback.  But at this stage, preparing foreign
+	 * transactions in the pre-commit phase, we can still change to rollback.
+	 * So if the wait is canceled by an interrupt, we error out. Note that
+	 * SyncRepWaitForLSN() might have set ProcDiePending too.
+	 */
+	if (canceled)
+	{
+		QueryCancelPending = true;
+		ProcessInterrupts();
+	}
+
+	/* Prepare all foreign transactions */
+	hash_seq_init(&scan, DistributedXactState.participants);
+	while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
+	{
+		FdwXactInfo finfo;
+		FdwXactState fdwxact = fdwent->fdwxact;
+
+		/*
+		 * Prepare the foreign transaction.  Between the FdwXactInsertEntry
+		 * call and the acknowledgement from the foreign server, the backend
+		 * may abort the local transaction (say, because of a signal).
+		 */
+		finfo.server = fdwent->server;
+		finfo.usermapping = fdwent->usermapping;
+		finfo.flags = 0;
+		finfo.identifier = pstrdup(fdwxact->data.identifier);
+		fdwent->prepare_foreign_xact_fn(&finfo);
+
+		/* succeeded, update status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = FDWXACT_STATUS_PREPARED;
+		SpinLockRelease(&fdwxact->mutex);
+
+		pfree(finfo.identifier);
+	}
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier.  We generate a
+ * unique identifier in the form of
+ * "fx_<random number>_<xid>_<umid>", whose length is less than FDWXACT_ID_MAX_LEN.
+ *
+ * Returned string value is used to identify foreign transaction. The
+ * identifier should not be same as any other concurrent prepared transaction
+ * identifier.
+ *
+ * To make the foreign transaction id unique, we should ideally use something
+ * like UUID, which gives unique ids with high probability, but that may be
+ * expensive here and the UUID extension which provides the function to generate
+ * UUIDs is not part of the core code.
+ */
+static char *
+getFdwXactIdentifier(FdwXactEntry *fdwent, TransactionId xid)
+{
+	char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+	snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u", Abs(random()),
+			 xid, fdwent->umid);
+
+	return pstrdup(buf);
+}
+
+/*
+ * This function inserts a new FdwXactState entry into the global array with
+ * WAL-logging. The new entry is held by the backend that inserted it.
+ */
+static XLogRecPtr
+FdwXactInsertEntry(TransactionId xid, FdwXactEntry *fdwent, char *identifier)
+{
+	FdwXactStateOnDiskData *fdwxact_file_data;
+	FdwXactState fdwxact;
+	Oid			owner;
+	int			data_len;
+
+	/* on first call, register the exit hook */
+	if (!fdwXactExitRegistered)
+	{
+		before_shmem_exit(AtProcExit_FdwXact, 0);
+		fdwXactExitRegistered = true;
+	}
+
+	/*
+	 * Enter the foreign transaction into the shared memory structure.
+	 */
+	owner = GetUserId();
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	fdwxact = insert_fdwxact(MyDatabaseId, xid, fdwent->umid,
+							 fdwent->usermapping->serverid, owner, identifier);
+	fdwxact->locking_backend = MyBackendId;
+	LWLockRelease(FdwXactLock);
+
+	/* Update FdwXactEntry */
+	fdwent->fdwxact = fdwxact;
+
+	/* Remember server's oid where we prepared the transaction */
+	DistributedXactState.serveroids_uniq =
+		list_append_unique_oid(DistributedXactState.serveroids_uniq,
+							   fdwent->usermapping->serverid);
+
+	/*
+	 * Prepare to write the entry to a file. Also add xlog entry. The contents
+	 * of the xlog record are same as what is written to the file.
+	 */
+	data_len = offsetof(FdwXactStateOnDiskData, identifier);
+	data_len = data_len + strlen(identifier) + 1;
+	data_len = MAXALIGN(data_len);
+	fdwxact_file_data = (FdwXactStateOnDiskData *) palloc0(data_len);
+	memcpy(fdwxact_file_data, &(fdwxact->data), data_len);
+
+	START_CRIT_SECTION();
+
+	/* See note in RecordTransactionCommit */
+	MyProc->delayChkpt = true;
+
+	/* Add the entry in the xlog and save LSN for checkpointer */
+	XLogBeginInsert();
+	XLogRegisterData((char *) fdwxact_file_data, data_len);
+	fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
+
+	/* If we crash now, we have prepared: WAL replay will fix things */
+
+	/* Store record's start location to read that later on CheckPoint */
+	fdwxact->insert_start_lsn = ProcLastRecPtr;
+
+	/* File is written completely, checkpoint can proceed with syncing */
+	fdwxact->valid = true;
+
+	/* Checkpoint can process now */
+	MyProc->delayChkpt = false;
+
+	END_CRIT_SECTION();
+
+	pfree(fdwxact_file_data);
+	return fdwxact->insert_end_lsn;
+}
+
+/*
+ * Insert a new entry for a given foreign transaction identified by transaction
+ * id, foreign server and user mapping, into the shared memory array. Caller
+ * must hold FdwXactLock in exclusive mode.
+ *
+ * If the entry already exists, the function raises an error.
+ */
+static FdwXactState
+insert_fdwxact(Oid dbid, TransactionId xid, Oid umid, Oid serverid, Oid owner,
+			   char *identifier)
+{
+	FdwXactState fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Check for duplicated foreign transaction entry */
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		fdwxact = FdwXactCtl->xacts[i];
+		if (fdwxact->valid &&
+			fdwxact->data.xid == xid &&
+			fdwxact->data.umid == umid)
+			ereport(ERROR,
+					(errmsg("could not insert a foreign transaction entry"),
+					 errdetail("Duplicate entry with transaction id %u, user mapping id %u exists.",
+							   xid, umid)));
+	}
+
+	/*
+	 * Get a next free foreign transaction entry. Raise error if there are
+	 * none left.
+	 */
+	if (!FdwXactCtl->free_fdwxacts)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("maximum number of foreign transactions reached"),
+				 errhint("Increase max_prepared_foreign_transactions (currently %d).",
+						 max_prepared_foreign_xacts)));
+	}
+	fdwxact = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next;
+
+	/* Insert the entry to shared memory array */
+	Assert(FdwXactCtl->num_xacts < max_prepared_foreign_xacts);
+	FdwXactCtl->xacts[FdwXactCtl->num_xacts++] = fdwxact;
+
+	fdwxact->status = FDWXACT_STATUS_PREPARING;
+	fdwxact->data.xid = xid;
+	fdwxact->data.dbid = dbid;
+	fdwxact->data.umid = umid;
+	fdwxact->data.serverid = serverid;
+	fdwxact->data.owner = owner;
+	/* strlcpy() always NUL-terminates the destination, truncating if needed */
+	strlcpy(fdwxact->data.identifier, identifier, FDWXACT_ID_MAX_LEN);
+
+	fdwxact->insert_start_lsn = InvalidXLogRecPtr;
+	fdwxact->insert_end_lsn = InvalidXLogRecPtr;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+
+	return fdwxact;
+}
+
+/*
+ * Remove the foreign prepared transaction entry from shared memory.
+ * Caller must hold FdwXactLock in exclusive mode.
+ */
+static void
+remove_fdwxact(FdwXactState fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	elog(DEBUG2, "remove fdwxact entry id %s", fdwxact->data.identifier);
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.xid = fdwxact->data.xid;
+		record.umid = fdwxact->data.umid;
+
+		/*
+		 * Now writing FdwXactState data to WAL. We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the WAL
+		 * record is inserted could complete without fsync'ing our state file.
+		 * (This is essentially the same kind of race condition as the
+		 * COMMIT-to-clog-write case that RecordTransactionCommit uses
+		 * delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyProc->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and remove
+		 * the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+
+		/* Always flush, since we're about to remove the FdwXact state file */
+		XLogFlush(recptr);
+
+		/* Now we can mark ourselves as out of the commit critical section */
+		MyProc->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+
+	/* Search the slot where this entry resided */
+	for (i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		if (FdwXactCtl->xacts[i] == fdwxact)
+			break;
+	}
+
+	Assert(i < FdwXactCtl->num_xacts);
+
+	/* Clean up any files we may have left */
+	if (fdwxact->ondisk)
+		RemoveFdwXactStateFile(fdwxact->data.xid, fdwxact->data.umid, true);
+
+	/* Remove the entry from active array */
+	FdwXactCtl->num_xacts--;
+	FdwXactCtl->xacts[i] = FdwXactCtl->xacts[FdwXactCtl->num_xacts];
+
+	/* Put it back into free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->locking_backend = InvalidBackendId;
+	fdwxact->valid = false;
+	fdwxact->ondisk = false;
+	fdwxact->inredo = false;
+}
+
+/*
+ * When the process exits, unlock all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllParticipants();
+	FdwXactLaunchResolvers();
+}
+
+/*
+ * Unlock all foreign transaction participants.  If we leave any foreign
+ * transaction unresolved, update the oldest xmin of unresolved transactions to
+ * prevent the local transaction id of such an unresolved foreign transaction
+ * from being truncated.  This function returns nothing; the unique list of
+ * foreign servers' oids that the caller uses to launch the resolver is kept in
+ * DistributedXactState.serveroids_uniq.
+ */
+static void
+ForgetAllParticipants(void)
+{
+	FdwXactEntry *fdwent;
+	HASH_SEQ_STATUS scan;
+	int			nremaining = 0;
+
+	if (!HasFdwXactParticipant())
+		return;
+
+	hash_seq_init(&scan, DistributedXactState.participants);
+	while ((fdwent = (FdwXactEntry *) hash_seq_search(&scan)))
+	{
+		FdwXactState fdwxact = fdwent->fdwxact;
+
+		if (fdwxact)
+		{
+			Assert(fdwxact->locking_backend == MyBackendId);
+
+			/* Unlock the foreign transaction entry */
+			LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+			fdwxact->locking_backend = InvalidBackendId;
+			LWLockRelease(FdwXactLock);
+
+			nremaining++;
+		}
+
+		/* Remove from the participants list */
+		RemoveFdwXactEntry(fdwent->umid);
+	}
+
+	/*
+	 * If we leave any FdwXactState entries, update the oldest local
+	 * transaction of unresolved distributed transaction.
+	 */
+	if (nremaining > 0)
+		FdwXactComputeRequiredXmin();
+
+	Assert(!HasFdwXactParticipant());
+}
+
+static void
+FdwXactLaunchResolvers(void)
+{
+	if (list_length(DistributedXactState.serveroids_uniq) > 0)
+		LaunchOrWakeupFdwXactResolver(DistributedXactState.serveroids_uniq);
+}
+
+/*
+ * Launch or wake up the resolver to resolve the given transaction.
+ */
+void
+FdwXactLaunchResolversForXid(TransactionId xid)
+{
+	List	   *serveroids = NIL;
+
+	if (max_prepared_foreign_xacts == 0)
+		return;					/* nothing to do */
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState fdwxact = FdwXactCtl->xacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* Collect server oids associated with the given transaction */
+		if (fdwxact->data.xid == xid)
+			serveroids = list_append_unique_oid(serveroids, fdwxact->data.serverid);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* Exit if there are no servers to launch resolvers for */
+	if (serveroids == NIL)
+		return;
+
+	LaunchOrWakeupFdwXactResolver(serveroids);
+	list_free(serveroids);
+}
+
+/*
+ * Commit or rollback one prepared foreign transaction, and remove FdwXactState
+ * entry.
+ */
+void
+ResolveOneFdwXact(FdwXactState fdwxact)
+{
+	FdwXactInfo finfo;
+	FdwRoutine *routine;
+
+	/* The FdwXactState entry must be held by me */
+	Assert(fdwxact != NULL);
+	Assert(fdwxact->locking_backend == MyBackendId);
+	Assert(fdwxact->status == FDWXACT_STATUS_PREPARED ||
+		   fdwxact->status == FDWXACT_STATUS_COMMITTING ||
+		   fdwxact->status == FDWXACT_STATUS_ABORTING);
+
+	/* Set whether we do commit or abort if not set yet */
+	if (fdwxact->status == FDWXACT_STATUS_PREPARED)
+	{
+		FdwXactStatus new_status;
+
+		new_status = FdwXactGetTransactionFate(fdwxact->data.xid);
+		Assert(new_status == FDWXACT_STATUS_COMMITTING ||
+			   new_status == FDWXACT_STATUS_ABORTING);
+
+		/* Update the status */
+		SpinLockAcquire(&fdwxact->mutex);
+		fdwxact->status = new_status;
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	routine = GetFdwRoutineByServerId(fdwxact->data.serverid);
+
+	/* Prepare the foreign transaction information to pass to API */
+	finfo.server = GetForeignServer(fdwxact->data.serverid);
+	finfo.usermapping = GetUserMapping(fdwxact->data.owner, fdwxact->data.serverid);
+	finfo.flags = 0;
+	finfo.identifier = fdwxact->data.identifier;
+
+	if (fdwxact->status == FDWXACT_STATUS_COMMITTING)
+	{
+		routine->CommitForeignTransaction(&finfo);
+		elog(DEBUG1, "successfully committed the prepared foreign transaction %s",
+			 fdwxact->data.identifier);
+	}
+	else
+	{
+		routine->RollbackForeignTransaction(&finfo);
+		elog(DEBUG1, "successfully rolled back the prepared foreign transaction %s",
+			 fdwxact->data.identifier);
+	}
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	remove_fdwxact(fdwxact);
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	TransactionId agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState fdwxact = FdwXactCtl->xacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->data.xid));
+
+		/*
+		 * We can exclude entries that are marked as either committing or
+		 * aborting and whose state file is on disk, since such entries no
+		 * longer need to look up their transaction status in the commit log.
+		 */
+		if (!(fdwxact->ondisk &&
+			  (fdwxact->status == FDWXACT_STATUS_COMMITTING ||
+			   fdwxact->status == FDWXACT_STATUS_ABORTING)) &&
+			(!TransactionIdIsValid(agg_xmin) ||
+			 TransactionIdPrecedes(fdwxact->data.xid, agg_xmin)))
+			agg_xmin = fdwxact->data.xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+
+/*
+ * Return whether the foreign transaction associated with the given transaction
+ * id should be committed or rolled back according to the result of the local
+ * transaction.
+ */
+static FdwXactStatus
+FdwXactGetTransactionFate(TransactionId xid)
+{
+	/*
+	 * If the local transaction is already committed, commit prepared foreign
+	 * transaction.
+	 */
+	if (TransactionIdDidCommit(xid))
+		return FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort prepared foreign
+	 * transactions.
+	 */
+	else if (TransactionIdDidAbort(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is not in progress but the foreign transaction is
+	 * not prepared on the foreign server. This can happen when the transaction
+	 * failed after registering this entry but before actually preparing on the
+	 * foreign server. So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(xid))
+		return FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted.  Raise an error anyway since we cannot
+	 * determine the fate of this foreign transaction from the local
+	 * transaction, whose fate is also not determined.
+	 */
+	elog(ERROR,
+		 "cannot resolve the foreign transaction associated with in-progress transaction");
+
+	pg_unreachable();
+}
+
+
+/*
+ * Recreates a foreign transaction state file. This is used in WAL replay
+ * and during checkpoint creation.
+ *
+ * Note: content and len don't include CRC.
+ */
+static void
+RecreateFdwXactFile(TransactionId xid, Oid umid, void *content, int len)
+{
+	char		path[MAXPGPATH];
+	pg_crc32c	statefile_crc;
+	int			fd;
+
+	/* Recompute CRC */
+	INIT_CRC32C(statefile_crc);
+	COMP_CRC32C(statefile_crc, content, len);
+	FIN_CRC32C(statefile_crc);
+
+	FdwXactStateFilePath(path, xid, umid);
+
+	fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not recreate foreign transaction state file \"%s\": %m",
+						path)));
+
+	/* Write content and CRC */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE);
+	if (write(fd, content, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
+	{
+		if (errno == 0)
+			errno = ENOSPC;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write foreign transaction state file: %m")));
+	}
+	pgstat_report_wait_end();
+
+	/*
+	 * We must fsync the file because the end-of-replay checkpoint will not do
+	 * so, there being no FDWXACT in shared memory yet to tell it to.
+	 */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC);
+	if (pg_fsync(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync foreign transaction state file: %m")));
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd) != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close foreign transaction state file: %m")));
+}
+
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add fdwxact entry and set start/end lsn of the WAL record in
+		 * FdwXactState entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record), record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *xlrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete FdwXactState entry and file if they exist */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(xlrec->xid, xlrec->umid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+
+	return;
+}
+
+/*
+ * Scan the shared memory entries of FdwXactState and determine the range of valid
+ * XIDs present.  This is run during database startup, after we have completed
+ * reading WAL.	 ShmemVariableCache->nextXid has been set to one more than
+ * the highest XID for which evidence exists in WAL.
+ *
+ * On corrupted foreign transaction state files, fail immediately.  Keeping
+ * broken entries around and letting replay continue would harm the system,
+ * and a new backup should be rolled in.
+ *
+ * Our other responsibility is to update and return the oldest valid XID
+ * among the distributed transactions. This is needed to synchronize pg_subtrans
+ * startup properly.
+ */
+TransactionId
+PrescanFdwXacts(TransactionId oldestActiveXid)
+{
+	FullTransactionId nextXid = ShmemVariableCache->nextXid;
+	TransactionId origNextXid = XidFromFullTransactionId(nextXid);
+	TransactionId result = origNextXid;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState fdwxact = FdwXactCtl->xacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->data.xid, fdwxact->data.umid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		if (TransactionIdPrecedes(fdwxact->data.xid, result))
+			result = fdwxact->data.xid;
+
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+
+	return result;
+}
+
+/*
+ * Scan pg_fdwxact and fill FdwXactState depending on the on-disk data.
+ * This is called once at the beginning of recovery, saving any extra
+ * lookups in the future.  FdwXactState files that are newer than the
+ * minimum XID horizon are discarded on the way.
+ */
+void
+RestoreFdwXactData(void)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	cldir = AllocateDir(FDWXACTS_DIR);
+	while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN &&
+			strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN)
+		{
+			TransactionId xid;
+			Oid			umid;
+			char	   *buf;
+
+			sscanf(clde->d_name, "%08x_%08x", &xid, &umid);
+
+			/* Read fdwxact data from disk */
+			buf = ProcessFdwXactBuffer(xid, umid, InvalidXLogRecPtr,
+									   true);
+			if (buf == NULL)
+				continue;
+
+			/* Add this entry into the table of foreign transactions */
+			FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr);
+		}
+	}
+
+	LWLockRelease(FdwXactLock);
+	FreeDir(cldir);
+}
+
+/*
+ * Scan the shared memory entries of FdwXactState and validate them.
+ *
+ * This is run at the end of recovery, but before we allow backends to write
+ * WAL.
+ */
+void
+RecoverFdwXacts(void)
+{
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState fdwxact = FdwXactCtl->xacts[i];
+		char	   *buf;
+
+		buf = ProcessFdwXactBuffer(fdwxact->data.xid, fdwxact->data.umid,
+								   fdwxact->insert_start_lsn, fdwxact->ondisk);
+
+		if (buf == NULL)
+			continue;
+
+		ereport(LOG,
+				(errmsg("recovering foreign prepared transaction %s from shared memory",
+						fdwxact->data.identifier)));
+
+		/* recovered, so reset the flag for entries generated by redo */
+		fdwxact->inredo = false;
+		fdwxact->valid = true;
+		pfree(buf);
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Store pointer to the start/end of the WAL record along with the xid in
+ * a fdwxact entry in shared memory FdwXactData structure.
+ */
+static void
+FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
+{
+	FdwXactStateOnDiskData *fdwxact_data = (FdwXactStateOnDiskData *) buf;
+	FdwXactState fdwxact;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	/*
+	 * Add this entry into the table of foreign transactions. We do not know
+	 * the exact status of the transaction yet, so it is marked as prepared
+	 * below. The resolver will set it later based on the status of the local
+	 * transaction that prepared this foreign transaction.
+	 */
+	fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->xid,
+							 fdwxact_data->umid, fdwxact_data->serverid,
+							 fdwxact_data->owner, fdwxact_data->identifier);
+
+	elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u user mapping %u owner %u id %s",
+		 fdwxact_data->dbid, fdwxact_data->xid,
+		 fdwxact_data->umid, fdwxact_data->owner,
+		 fdwxact_data->identifier);
+
+	/*
+	 * Set status as PREPARED, since we do not know the xact status right now.
+	 * We will set it later based on the status of local transaction that
+	 * prepared this fdwxact entry.
+	 */
+	fdwxact->status = FDWXACT_STATUS_PREPARED;
+	fdwxact->insert_start_lsn = start_lsn;
+	fdwxact->insert_end_lsn = end_lsn;
+	fdwxact->inredo = true;		/* added in redo */
+	fdwxact->valid = false;
+	fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn);
+}
+
+/*
+ * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove the
+ * FdwXactState file if the foreign transaction was saved by an earlier
+ * checkpoint. We might not find the FdwXactState entry when crash recovery
+ * starts from a point after the entry was added but before it was removed.
+ */
+static void
+FdwXactRedoRemove(TransactionId xid, Oid umid, bool givewarning)
+{
+	FdwXactState fdwxact;
+	int			i;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+	Assert(RecoveryInProgress());
+
+	for (i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		fdwxact = FdwXactCtl->xacts[i];
+
+		if (fdwxact->data.xid == xid && fdwxact->data.umid == umid)
+			break;
+	}
+
+	if (i >= FdwXactCtl->num_xacts)
+		return;
+
+	elog(DEBUG2, "removing fdwxact entry from shared memory for foreign transaction %s",
+		 fdwxact->data.identifier);
+
+	/* Clean up entry and any files we may have left */
+	remove_fdwxact(fdwxact);
+}
+
+/*
+ * Reads foreign transaction data from xlog. During checkpoint this data will
+ * be moved to fdwxact files and ReadFdwXactStateFile should be used instead.
+ *
+ * Note clearly that this function accesses WAL during normal operation, similarly
+ * to the way WALSender or Logical Decoding would do. It does not run during
+ * crash recovery or standby processing.
+ */
+static void
+XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len)
+{
+	XLogRecord *record;
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+	TimeLineID	save_currtli = ThisTimeLineID;
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &read_local_xlog_page,
+											   .segment_open = &wal_segment_open,
+											   .segment_close = &wal_segment_close),
+									NULL);
+
+	if (!xlogreader)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating an XLog reading processor.")));
+
+	XLogBeginRead(xlogreader, lsn);
+	record = XLogReadRecord(xlogreader, &errormsg);
+
+	/*
+	 * Restore immediately the timeline where it was previously, as
+	 * read_local_xlog_page() could have changed it if the record was read
+	 * while recovery was finishing or if the timeline has jumped in-between.
+	 */
+	ThisTimeLineID = save_currtli;
+
+	if (record == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read foreign transaction state from xlog at %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID ||
+		(XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("expected foreign transaction state data is not present in xlog at %X/%X",
+						LSN_FORMAT_ARGS(lsn))));
+
+	if (len != NULL)
+		*len = XLogRecGetDataLen(xlogreader);
+
+	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
+	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
+
+	XLogReaderFree(xlogreader);
+}
+
+/*
+ * Given a transaction id and a user mapping id, read the foreign transaction
+ * state either from disk or directly from the WAL record pointed to by the
+ * provided "insert_start_lsn".
+ */
+static char *
+ProcessFdwXactBuffer(TransactionId xid, Oid umid, XLogRecPtr insert_start_lsn,
+					 bool fromdisk)
+{
+	TransactionId origNextXid =
+	XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	char	   *buf;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (!fromdisk)
+		Assert(!XLogRecPtrIsInvalid(insert_start_lsn));
+
+	/* Reject XID if too new */
+	if (TransactionIdFollowsOrEquals(xid, origNextXid))
+	{
+		if (fromdisk)
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state file for xid %u and user mapping %u",
+							xid, umid)));
+			RemoveFdwXactStateFile(xid, umid, true);
+		}
+		else
+		{
+			ereport(WARNING,
+					(errmsg("removing future fdwxact state from memory for xid %u and user mapping %u",
+							xid, umid)));
+			FdwXactRedoRemove(xid, umid, true);
+		}
+		return NULL;
+	}
+
+	if (fromdisk)
+	{
+		/* Read and validate file */
+		buf = ReadFdwXactStateFile(xid, umid);
+	}
+	else
+	{
+		/* Read xlog data */
+		XlogReadFdwXactData(insert_start_lsn, &buf, NULL);
+	}
+
+	return buf;
+}
+
+/*
+ * Read and validate the foreign transaction state file.
+ *
+ * If it looks OK (has a valid CRC and matches the given xid and umid), return
+ * the palloc'd contents of the file.  Raise an error on corrupted data;
+ * this state can be reached when doing recovery.
+ */
+static char *
+ReadFdwXactStateFile(TransactionId xid, Oid umid)
+{
+	char		path[MAXPGPATH];
+	int			fd;
+	FdwXactStateOnDiskData *fdwxact_file_data;
+	struct stat stat;
+	uint32		crc_offset;
+	pg_crc32c	calc_crc;
+	pg_crc32c	file_crc;
+	char	   *buf;
+	int			r;
+
+	FdwXactStateFilePath(path, xid, umid);
+
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open FDW transaction state file \"%s\": %m",
+						path)));
+
+	/*
+	 * Check file length.  We can determine a lower bound pretty easily. We
+	 * set an upper bound to avoid palloc() failure on a corrupt file, though
+	 * we can't guarantee that we won't get an out of memory error anyway,
+	 * even on a valid file.
+	 */
+	if (fstat(fd, &stat))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not stat FDW transaction state file \"%s\": %m",
+						path)));
+
+	if (stat.st_size < (offsetof(FdwXactStateOnDiskData, identifier) +
+						sizeof(pg_crc32c)) ||
+		stat.st_size > MaxAllocSize)
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect size of FDW transaction state file \"%s\"",
+						path)));
+
+	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	if (crc_offset != MAXALIGN(crc_offset))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+						path)));
+
+	/*
+	 * Ok, slurp in the file.
+	 */
+	buf = (char *) palloc(stat.st_size);
+	fdwxact_file_data = (FdwXactStateOnDiskData *) buf;
+
+	/* Slurp the file */
+	pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ);
+	r = read(fd, buf, stat.st_size);
+	if (r != stat.st_size)
+	{
+		if (r < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		else
+			ereport(ERROR,
+					(errmsg("could not read file \"%s\": read %d of %zu",
+							path, r, (Size) stat.st_size)));
+	}
+	pgstat_report_wait_end();
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
+
+	/*
+	 * Check the CRC.
+	 */
+	INIT_CRC32C(calc_crc);
+	COMP_CRC32C(calc_crc, buf, crc_offset);
+	FIN_CRC32C(calc_crc);
+
+	file_crc = *((pg_crc32c *) (buf + crc_offset));
+
+	if (!EQ_CRC32C(calc_crc, file_crc))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("calculated CRC checksum does not match value stored in file \"%s\"",
+						path)));
+
+	/* Check that the contents match the expected xid and umid */
+	fdwxact_file_data = (FdwXactStateOnDiskData *) buf;
+	if (fdwxact_file_data->xid != xid ||
+		fdwxact_file_data->umid != umid)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("invalid foreign transaction state file \"%s\"",
+						path)));
+
+	return buf;
+}
+
+/*
+ * Remove the foreign transaction file for given entry.
+ *
+ * If giveWarning is false, do not complain about file-not-present;
+ * this is an expected case during WAL replay.
+ */
+static void
+RemoveFdwXactStateFile(TransactionId xid, Oid umid, bool giveWarning)
+{
+	char		path[MAXPGPATH];
+
+	FdwXactStateFilePath(path, xid, umid);
+	if (unlink(path) < 0 && (errno != ENOENT || giveWarning))
+		ereport(WARNING,
+				(errcode_for_file_access(),
+				 errmsg("could not remove foreign transaction state file \"%s\": %m",
+						path)));
+}
+
+/*
+ * Return the list of FdwXactState entries that match at least one of the given
+ * criteria that is valid.  The caller must hold FdwXactLock.
+ */
+static List *
+find_fdwxacts(TransactionId xid, Oid umid, Oid dbid)
+{
+	List	   *res = NIL;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState fdwxact = FdwXactCtl->xacts[i];
+		bool		match = false;
+
+		if (!fdwxact->valid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid == fdwxact->data.xid)
+			match = true;
+
+		/* umid */
+		if (OidIsValid(umid) && umid == fdwxact->data.umid)
+			match = true;
+
+		/* dbid */
+		if (OidIsValid(dbid) && dbid == fdwxact->data.dbid)
+			match = true;
+
+		if (match)
+			res = lappend(res, fdwxact);
+	}
+
+	return res;
+}
+
+/*
+ * Get FdwXact entry and do some sanity checks. If check_twophase_xact is true, we
+ * also check if the given xid is prepared.  The caller must hold FdwXactLock.
+ */
+static FdwXactState
+get_fdwxact_with_check(TransactionId xid, Oid umid, bool check_twophase_xact)
+{
+	FdwXactState fdwxact = NULL;
+	Oid			myuserid;
+
+	Assert(LWLockHeldByMe(FdwXactLock));
+
+	/* Look for FdwXactState entry that matches the given xid and umid */
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState fx = FdwXactCtl->xacts[i];
+
+		if (fx->valid && fx->data.xid == xid && fx->data.umid == umid)
+		{
+			fdwxact = fx;
+			break;
+		}
+	}
+
+	/* not found */
+	if (!fdwxact)
+		return NULL;
+
+	/* check if belonging to another database */
+	if (fdwxact->data.dbid != MyDatabaseId)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction belongs to another database"),
+				 errhint("Connect to the database where the transaction was created to finish it.")));
+
+	/* permission check */
+	myuserid = GetUserId();
+	if (myuserid != fdwxact->data.owner && !superuser_arg(myuserid))
+		ereport(ERROR,
+				(errmsg("permission denied to resolve prepared foreign transaction"),
+				 errhint("Must be superuser or the user that prepared the transaction.")));
+
+	/* check if the entry is being processed by someone */
+	if (fdwxact->locking_backend != InvalidBackendId)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("foreign transaction with transaction identifier \"%s\" is busy",
+						fdwxact->data.identifier)));
+
+	if (check_twophase_xact && TwoPhaseExists(fdwxact->data.xid))
+	{
+		/*
+		 * the entry's local transaction is prepared. Since we cannot know the
+		 * fate of the local transaction, we cannot resolve this foreign
+		 * transaction.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve foreign transaction with identifier \"%s\" whose local transaction is in-progress",
+						fdwxact->data.identifier),
+				 errhint("Do COMMIT PREPARED or ROLLBACK PREPARED.")));
+	}
+
+	return fdwxact;
+}
+
+/* Error cleanup callback for pg_resolve_foreign_xact/pg_remove_foreign_xact */
+static void
+pg_foreign_xact_callback(int code, Datum arg)
+{
+	FdwXactState fdwxact = (FdwXactState) DatumGetPointer(arg);
+
+	if (fdwxact->valid)
+	{
+		Assert(fdwxact->locking_backend == MyBackendId);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->locking_backend = InvalidBackendId;
+		LWLockRelease(FdwXactLock);
+	}
+}
+
+/* Built in functions */
+
+/*
+ * Structure to hold and iterate over the foreign transactions to be displayed
+ * by the built-in functions.
+ */
+typedef struct
+{
+	FdwXactState fdwxacts;
+	int			num_xacts;
+	int			cur_xact;
+}			WorkingStatus;
+
+Datum
+pg_foreign_xacts(PG_FUNCTION_ARGS)
+{
+#define PG_PREPARED_FDWXACTS_COLS	7
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState fdwxact = FdwXactCtl->xacts[i];
+		FdwXactStatus status;
+		char	   *xact_status;
+		Datum		values[PG_PREPARED_FDWXACTS_COLS];
+		bool		nulls[PG_PREPARED_FDWXACTS_COLS];
+
+		if (!fdwxact->valid)
+			continue;
+
+		memset(nulls, 0, sizeof(nulls));
+
+		SpinLockAcquire(&fdwxact->mutex);
+		status = fdwxact->status;
+		SpinLockRelease(&fdwxact->mutex);
+
+		values[0] = TransactionIdGetDatum(fdwxact->data.xid);
+		values[1] = ObjectIdGetDatum(fdwxact->data.umid);
+		values[2] = ObjectIdGetDatum(fdwxact->data.owner);
+		values[3] = ObjectIdGetDatum(fdwxact->data.dbid);
+
+		switch (status)
+		{
+			case FDWXACT_STATUS_PREPARING:
+				xact_status = "preparing";
+				break;
+			case FDWXACT_STATUS_PREPARED:
+				xact_status = "prepared";
+				break;
+			case FDWXACT_STATUS_COMMITTING:
+				xact_status = "committing";
+				break;
+			case FDWXACT_STATUS_ABORTING:
+				xact_status = "aborting";
+				break;
+			default:
+				xact_status = "unknown";
+				break;
+		}
+
+		values[4] = CStringGetTextDatum(xact_status);
+		values[5] = CStringGetTextDatum(fdwxact->data.identifier);
+
+		if (fdwxact->locking_backend != InvalidBackendId)
+		{
+			PGPROC	   *locker = BackendIdGetProc(fdwxact->locking_backend);
+
+			values[6] = Int32GetDatum(locker->pid);
+		}
+		else
+			nulls[6] = true;
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Built-in SQL function to resolve a prepared foreign transaction.
+ */
+Datum
+pg_resolve_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			umid = PG_GETARG_OID(1);
+	FdwXactState fdwxact;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
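+	fdwxact = get_fdwxact_with_check(xid, umid, true);
+	if (fdwxact == NULL)		/* no matching entry; avoid a NULL dereference */
+		ereport(ERROR, (errmsg("foreign transaction does not exist")));
+	fdwxact->locking_backend = MyBackendId;	/* lock it */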
+
+	LWLockRelease(FdwXactLock);
+
+	/*
+	 * Resolve the foreign transaction.  We ensure that the FdwXact entry is
+	 * unlocked on an error or an interruption.
+	 *
+	 * XXX we assume that an interruption doesn't happen between locking
+	 * FdwXact entry and registering the callback, especially in
+	 * LWLockRelease().
+	 */
+	PG_ENSURE_ERROR_CLEANUP(pg_foreign_xact_callback,
+							(Datum) PointerGetDatum(fdwxact));
+	{
+		ResolveOneFdwXact(fdwxact);
+	}
+	PG_END_ENSURE_ERROR_CLEANUP(pg_foreign_xact_callback,
+								(Datum) PointerGetDatum(fdwxact));
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Built-in function to remove a prepared foreign transaction entry without
+ * resolution. The function gives a way to forget about such a prepared
+ * transaction when the foreign server where it was prepared is no longer
+ * available, or when the user who prepared it needs to be dropped.
+ */
+Datum
+pg_remove_foreign_xact(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));
+	Oid			umid = PG_GETARG_OID(1);
+	FdwXactState fdwxact;
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
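+	fdwxact = get_fdwxact_with_check(xid, umid, false);
+	if (fdwxact == NULL)		/* no matching entry; avoid a NULL dereference */
+		ereport(ERROR, (errmsg("foreign transaction does not exist")));
+	remove_fdwxact(fdwxact);	/* clean up entry and any files we may have left */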
+
+	LWLockRelease(FdwXactLock);
+
+	PG_RETURN_BOOL(true);
 }
diff --git a/src/backend/access/transam/fdwxact_launcher.c b/src/backend/access/transam/fdwxact_launcher.c
new file mode 100644
index 0000000000..59f061773c
--- /dev/null
+++ b/src/backend/access/transam/fdwxact_launcher.c
@@ -0,0 +1,587 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.c
+ *
+ * The foreign transaction resolver launcher process starts foreign
+ * transaction resolver processes. The launcher schedules a resolver
+ * process to be started when requested by a backend process.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/transam/fdwxact_launcher.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "pgstat.h"
+#include "funcapi.h"
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
+#include "access/resolver_internal.h"
+#include "access/twophase.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/interrupt.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+FdwXactResolver *MyFdwXactResolver = NULL;
+
+static volatile sig_atomic_t got_SIGUSR2 = false;
+
+static void FdwXactLauncherOnExit(int code, Datum arg);
+static void LaunchFdwXactResolver(Oid dbid, Oid serverid);
+static bool RelaunchFdwXactResolvers(void);
+static FdwXactResolver *FdwXactResolverFind(Oid dbid, Oid serverid);
+
+/* Signal handler */
+static void FdwXactLaunchHandler(SIGNAL_ARGS);
+
+
+/*
+ * Wake up the launcher process to request launching new resolvers
+ * immediately.
+ */
+void
+RequestToLaunchFdwXactResolver(void)
+{
+	if (FdwXactResolverCtl->launcher_pid != InvalidPid)
+		kill(FdwXactResolverCtl->launcher_pid, SIGUSR2);
+}
+
+/* Report shared memory space needed by FdwXactLauncherShmemInit */
+Size
+FdwXactLauncherShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, SizeOfFdwXactResolverCtlData);
+	size = add_size(size, mul_size(max_foreign_xact_resolvers,
+								   sizeof(FdwXactResolver)));
+
+	return size;
+}
+
+/*
+ * Allocate and initialize foreign transaction resolver shared
+ * memory.
+ */
+void
+FdwXactLauncherShmemInit(void)
+{
+	bool		found;
+
+	FdwXactResolverCtl = ShmemInitStruct("Foreign Transaction Launcher Data",
+										 FdwXactLauncherShmemSize(),
+										 &found);
+
+	if (!IsUnderPostmaster)
+	{
+		int			slot;
+
+		/* First time through, so initialize */
+		MemSet(FdwXactResolverCtl, 0, FdwXactLauncherShmemSize());
+		FdwXactResolverCtl->launcher_pid = InvalidPid;
+
+		for (slot = 0; slot < max_foreign_xact_resolvers; slot++)
+		{
+			FdwXactResolver *resolver = &FdwXactResolverCtl->resolvers[slot];
+
+			memset(resolver, 0, sizeof(FdwXactResolver));
+			SpinLockInit(&(resolver->mutex));
+		}
+	}
+}
+
+/*
+ * Cleanup function for fdwxact launcher
+ *
+ * Called on fdwxact launcher exit.
+ */
+static void
+FdwXactLauncherOnExit(int code, Datum arg)
+{
+	FdwXactResolverCtl->launcher_pid = InvalidPid;
+}
+
+/* SIGUSR2: set flag to launch new resolver process immediately */
+static void
+FdwXactLaunchHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_SIGUSR2 = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Main loop for the fdwxact launcher process.
+ */
+void
+FdwXactLauncherMain(Datum main_arg)
+{
+	TimestampTz last_start_time = 0;
+
+	ereport(DEBUG1,
+			(errmsg("fdwxact resolver launcher started")));
+
+	before_shmem_exit(FdwXactLauncherOnExit, (Datum) 0);
+
+	Assert(FdwXactResolverCtl->launcher_pid == InvalidPid);
+	FdwXactResolverCtl->launcher_pid = MyProcPid;
+	FdwXactResolverCtl->launcher_latch = &MyProc->procLatch;
+
+	pqsignal(SIGHUP, SignalHandlerForConfigReload);
+	pqsignal(SIGUSR2, FdwXactLaunchHandler);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		int			rc;
+
+		CHECK_FOR_INTERRUPTS();
+		ResetLatch(MyLatch);
+
+		now = GetCurrentTimestamp();
+
+		/*
+		 * Limit start retries to once per
+		 * foreign_xact_resolution_retry_interval, but always attempt to start
+		 * when requested.
+		 */
+		if (got_SIGUSR2 ||
+			TimestampDifferenceExceeds(last_start_time, now,
+									   foreign_xact_resolution_retry_interval))
+		{
+			MemoryContext oldctx;
+			MemoryContext subctx;
+			bool		launched;
+
+			if (got_SIGUSR2)
+				got_SIGUSR2 = false;
+
+			subctx = AllocSetContextCreate(TopMemoryContext,
+										   "Foreign Transaction Launcher",
+										   ALLOCSET_DEFAULT_SIZES);
+			oldctx = MemoryContextSwitchTo(subctx);
+
+			launched = RelaunchFdwXactResolvers();
+			if (launched)
+			{
+				last_start_time = now;
+				wait_time = foreign_xact_resolution_retry_interval;
+			}
+
+			/* Switch back to original memory context. */
+			MemoryContextSwitchTo(oldctx);
+
+			/* Clean the temporary memory. */
+			MemoryContextDelete(subctx);
+		}
+		else
+		{
+			/*
+			 * The wait in the previous cycle was interrupted less than
+			 * foreign_xact_resolution_retry_interval after the last resolver
+			 * started. This usually means the resolver crashed, so we should
+			 * retry after foreign_xact_resolution_retry_interval again.
+			 */
+			wait_time = foreign_xact_resolution_retry_interval;
+		}
+
+		/* Wait for more work */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   wait_time,
+					   WAIT_EVENT_FDWXACT_LAUNCHER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		if (ConfigReloadPending)
+		{
+			ConfigReloadPending = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* Not reachable */
+}
+
+/*
+ * Request the launcher to launch or wake up foreign transaction resolvers for
+ * the given servers.
+ */
+void
+LaunchOrWakeupFdwXactResolver(List *serveroids_orig)
+{
+	List	   *wakeup_resolvers = NIL;
+	List	   *serveroids = list_copy(serveroids_orig);
+
+	/* Collect running resolvers to wake up */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0;
+		 i < max_foreign_xact_resolvers && list_length(serveroids) > 0;
+		 i++)
+	{
+		FdwXactResolver *resolver = &FdwXactResolverCtl->resolvers[i];
+
+		if (resolver->in_use &&
+			resolver->dbid == MyDatabaseId &&
+			list_member_oid(serveroids, resolver->serverid))
+		{
+			wakeup_resolvers = lappend(wakeup_resolvers, resolver);
+			serveroids = list_delete_oid(serveroids, resolver->serverid);
+		}
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	if (wakeup_resolvers != NIL)
+	{
+		ListCell   *lc;
+
+		foreach(lc, wakeup_resolvers)
+		{
+			FdwXactResolver *resolver = (FdwXactResolver *) lfirst(lc);
+
+			/*
+			 * Wake up the resolver. It's possible that the resolver is
+			 * starting up and hasn't attached to its slot yet. Since the
+			 * resolver will soon find the FdwXactState entry we inserted, we
+			 * don't need to do anything in that case.
+			 */
+			if (resolver->latch)
+				SetLatch(resolver->latch);
+		}
+
+		list_free(wakeup_resolvers);
+	}
+
+	/* Request to launch new resolvers if any */
+	if (list_length(serveroids) > 0)
+		RequestToLaunchFdwXactResolver();
+}
+
+/*
+ * Launch a foreign transaction resolver process that will connect to given
+ * 'dbid' and 'serverid'.
+ */
+static void
+LaunchFdwXactResolver(Oid dbid, Oid serverid)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	FdwXactResolver *resolver;
+	int			unused_slot = -1;
+	int			i;
+
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	/* Find unused resolver slot */
+	for (i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		resolver = &FdwXactResolverCtl->resolvers[i];
+
+		if (!resolver->in_use)
+		{
+			unused_slot = i;
+			break;
+		}
+	}
+
+	/* No unused slot found */
+	if (i >= max_foreign_xact_resolvers)
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of foreign transaction resolver slots"),
+				 errhint("You might need to increase max_foreign_transaction_resolvers.")));
+
+	resolver = &FdwXactResolverCtl->resolvers[unused_slot];
+	resolver->in_use = true;
+	resolver->dbid = dbid;
+	resolver->serverid = serverid;
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Register the new dynamic worker */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction resolver for server %u on database %u",
+			 resolver->serverid, resolver->dbid);
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = Int32GetDatum(unused_slot);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		/* Failed to launch, cleanup the worker slot */
+		SpinLockAcquire(&(resolver->mutex));
+		resolver->in_use = false;
+		SpinLockRelease(&(resolver->mutex));
+
+		ereport(WARNING,
+				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+				 errmsg("out of background worker slots"),
+				 errhint("You might need to increase max_worker_processes.")));
+	}
+
+	/*
+	 * We don't need to wait until it attaches here because we're going to
+	 * wait until all foreign transactions are resolved.
+	 */
+}
+
+/*
+ * Check the pending foreign transaction entries and (re)launch the resolvers
+ * for them if necessary.  Return true if launched at least one resolver.
+ */
+static bool
+RelaunchFdwXactResolvers(void)
+{
+	HTAB	   *launch_resolvers;
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS status;
+	struct fdwxact_resolver_entry
+	{
+		Oid			dbid;
+		Oid			serverid;
+	};
+	struct fdwxact_resolver_entry *entry;
+
+	/* Create a hash map for resolvers that need to be launched */
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(struct fdwxact_resolver_entry);
+	ctl.entrysize = sizeof(struct fdwxact_resolver_entry);
+	launch_resolvers = hash_create("fdwxact resolvers to launch",
+								   32, &ctl, HASH_ELEM | HASH_BLOBS);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState fdwxact = FdwXactCtl->xacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		/*
+		 * We need to launch a resolver process if the foreign transaction is
+		 * not held by anyone and is not part of a local prepared
+		 * transaction.
+		 */
+		if (fdwxact->locking_backend == InvalidBackendId &&
+			!TwoPhaseExists(fdwxact->data.xid))
+		{
+			struct fdwxact_resolver_entry ent = {
+				.dbid = fdwxact->data.dbid,
+				.serverid = fdwxact->data.serverid
+			};
+
+			hash_search(launch_resolvers, (void *) &ent, HASH_ENTER, NULL);
+		}
+	}
+	LWLockRelease(FdwXactLock);
+
+	/* There is no foreign transaction to resolve, so no resolver is needed */
+	if (hash_get_num_entries(launch_resolvers) == 0)
+	{
+		hash_destroy(launch_resolvers);
+		return false;
+	}
+
+	/* Check if the resolver is already running */
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactResolverCtl->resolvers[i];
+		struct fdwxact_resolver_entry ent;
+
+		if (!resolver->in_use)
+			continue;
+
+		ent.dbid = resolver->dbid;
+		ent.serverid = resolver->serverid;
+		hash_search(launch_resolvers, &ent, HASH_REMOVE, NULL);
+	}
+	LWLockRelease(FdwXactResolverLock);
+
+	/* Return if resolvers for all foreign transactions are already running */
+	if (hash_get_num_entries(launch_resolvers) == 0)
+	{
+		hash_destroy(launch_resolvers);
+		return false;
+	}
+
+	/* Launch new resolvers */
+	hash_seq_init(&status, launch_resolvers);
+	while ((entry = (struct fdwxact_resolver_entry *) hash_seq_search(&status)) != NULL)
+		LaunchFdwXactResolver(entry->dbid, entry->serverid);
+
+	hash_destroy(launch_resolvers);
+	return true;
+}
+
+/* Register a background worker running the foreign transaction launcher */
+void
+FdwXactLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	if (max_foreign_xact_resolvers == 0)
+		return;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN,
+			 "foreign transaction launcher");
+	bgw.bgw_restart_time = 5;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+}
+
+bool
+IsFdwXactLauncher(void)
+{
+	return FdwXactResolverCtl->launcher_pid == MyProcPid;
+}
+
+/*
+ * Return the foreign transaction resolver running for dbid and serverid
+ */
+FdwXactResolver *
+FdwXactResolverFind(Oid dbid, Oid serverid)
+{
+	Assert(LWLockHeldByMe(FdwXactResolverLock));
+	for (int i = 0; i < max_foreign_xact_resolvers; i++)
+	{
+		FdwXactResolver *resolver = &FdwXactResolverCtl->resolvers[i];
+
+		if (resolver->in_use && resolver->dbid == dbid &&
+			resolver->serverid == serverid)
+			return resolver;
+	}
+
+	return NULL;
+}
+
+/*
+ * Stop the given foreign transaction resolver, and wait until it detaches
+ * from its slot.
+ */
+void
+StopFdwXactResolver(Oid dbid, Oid serverid)
+{
+	FdwXactResolver *resolver;
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+
+	resolver = FdwXactResolverFind(dbid, serverid);
+
+	/* No worker, nothing to do */
+	if (!resolver)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		return;
+	}
+
+	/* Found the resolver, terminate it ... */
+	kill(resolver->pid, SIGTERM);
+
+	/* ... and wait for it to die */
+	for (;;)
+	{
+		int			rc;
+
+		/* is it gone? */
+		if (!resolver->in_use)
+			break;
+
+		LWLockRelease(FdwXactResolverLock);
+
+		/* Wait a bit --- we don't expect to have to wait long. */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
+
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	}
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/*
+ * Stop the fdwxact resolver running for the given foreign server.
+ */
+Datum
+pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS)
+{
+	char	   *servername = text_to_cstring(PG_GETARG_TEXT_P(0));
+	ForeignServer *server;
+	FdwXactResolver *resolver;
+
+	/* Must be superuser */
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied to stop foreign transaction resolver")));
+
+	server = GetForeignServerByName(servername, false);
+
+	LWLockAcquire(FdwXactResolverLock, LW_SHARED);
+	resolver = FdwXactResolverFind(MyDatabaseId, server->serverid);
+	LWLockRelease(FdwXactResolverLock);
+
+	if (!resolver)
+		ereport(ERROR,
+				(errmsg("there is no running foreign transaction resolver process on server %s",
+						servername)));
+
+	StopFdwXactResolver(MyDatabaseId, server->serverid);
+	pfree(server);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/access/transam/fdwxact_resolver.c b/src/backend/access/transam/fdwxact_resolver.c
new file mode 100644
index 0000000000..587c4759ca
--- /dev/null
+++ b/src/backend/access/transam/fdwxact_resolver.c
@@ -0,0 +1,339 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.c
+ *
+ * The foreign transaction resolver background worker resolves in-doubt
+ * foreign transactions, i.e. foreign transactions that participate in a
+ * distributed transaction but aren't being processed by anyone.  A resolver
+ * process is launched per foreign server by the foreign transaction launcher.
+ *
+ * Normal termination is by SIGTERM, which instructs the resolver process
+ * to exit(0) at the next convenient moment.  Emergency termination is by
+ * SIGQUIT, like any backend.  The resolver process also terminates on
+ * timeout, but only if there are no pending foreign transactions on the
+ * database waiting to be resolved.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/transam/fdwxact_resolver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+
+#include "access/fdwxact.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
+#include "access/resolver_internal.h"
+#include "access/transam.h"
+#include "access/twophase.h"
+#include "access/xact.h"
+#include "commands/dbcommands.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/interrupt.h"
+#include "storage/ipc.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/timeout.h"
+#include "utils/timestamp.h"
+
+/* max sleep time between cycles (3min) */
+#define DEFAULT_NAPTIME_PER_CYCLE 180000L
+
+/* GUC parameters */
+int			foreign_xact_resolution_retry_interval;
+int			foreign_xact_resolver_timeout = 60 * 1000;
+
+FdwXactResolverCtlData *FdwXactResolverCtl;
+
+static void FdwXactResolverLoop(void);
+static long FdwXactResolverComputeSleepTime(TimestampTz now);
+static void FdwXactResolverCheckTimeout(TimestampTz now);
+
+static void FdwXactResolverOnExit(int code, Datum arg);
+static void FdwXactResolverDetach(void);
+static void FdwXactResolverAttach(int slot);
+static void FdwXactResolverProcessInDoubtXacts(void);
+
+/* The list of FdwXact entries currently held by this resolver. */
+static List *heldFdwXactEntries = NIL;
+
+static TimestampTz last_resolution_time = -1;
+
+/*
+ * Detach the resolver and clean up the resolver info.
+ */
+static void
+FdwXactResolverDetach(void)
+{
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	MyFdwXactResolver->pid = InvalidPid;
+	MyFdwXactResolver->in_use = false;
+	MyFdwXactResolver->dbid = InvalidOid;
+	MyFdwXactResolver->serverid = InvalidOid;
+
+	LWLockRelease(FdwXactResolverLock);
+
+	/*
+	 * Force sending of any remaining WAL statistics to the stats collector at
+	 * process exit.
+	 *
+	 * Since pgstat_send_wal is invoked with 'force' set to false in the main
+	 * loop to avoid overloading the stats collector, there may be unsent
+	 * stats counters for this resolver.
+	 */
+	pgstat_send_wal(true);
+}
+
+/*
+ * Clean up foreign transaction resolver info and release the held
+ * FdwXactState entries.
+ */
+static void
+FdwXactResolverOnExit(int code, Datum arg)
+{
+	ListCell   *lc;
+
+	FdwXactResolverDetach();
+
+	/* Release the held foreign transaction entries */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	foreach(lc, heldFdwXactEntries)
+	{
+		FdwXactState fdwxact = (FdwXactState) lfirst(lc);
+
+		if (fdwxact->valid && fdwxact->locking_backend == MyBackendId)
+			fdwxact->locking_backend = InvalidBackendId;
+	}
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Attach to a slot.
+ */
+static void
+FdwXactResolverAttach(int slot)
+{
+	/* Block concurrent access */
+	LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE);
+
+	Assert(slot >= 0 && slot < max_foreign_xact_resolvers);
+	MyFdwXactResolver = &FdwXactResolverCtl->resolvers[slot];
+
+	if (!MyFdwXactResolver->in_use)
+	{
+		LWLockRelease(FdwXactResolverLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("foreign transaction resolver slot %d is empty, cannot attach",
+						slot)));
+	}
+
+	Assert(OidIsValid(MyFdwXactResolver->dbid));
+	Assert(OidIsValid(MyFdwXactResolver->serverid));
+
+	MyFdwXactResolver->pid = MyProcPid;
+	MyFdwXactResolver->latch = &MyProc->procLatch;
+
+	before_shmem_exit(FdwXactResolverOnExit, (Datum) 0);
+
+	LWLockRelease(FdwXactResolverLock);
+}
+
+/* Foreign transaction resolver entry point */
+void
+FdwXactResolverMain(Datum main_arg)
+{
+	int			slot = DatumGetInt32(main_arg);
+	char	   *datname;
+	ForeignServer *server;
+
+	/* Attach to a slot */
+	FdwXactResolverAttach(slot);
+
+	/* Establish signal handlers */
+	pqsignal(SIGHUP, SignalHandlerForConfigReload);
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to our database */
+	BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0);
+
+	StartTransactionCommand();
+	datname = get_database_name(MyFdwXactResolver->dbid);
+	server = GetForeignServer(MyFdwXactResolver->serverid);
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for server \"%s\" on database \"%s\" has started",
+					server->servername, datname)));
+	pfree(datname);
+	pfree(server);
+	CommitTransactionCommand();
+
+	/* Initialize the last resolution time to a sane value */
+	last_resolution_time = GetCurrentTimestamp();
+
+	/* Run the main loop */
+	FdwXactResolverLoop();
+
+	proc_exit(0);
+}
+
+/*
+ * Fdwxact resolver main loop
+ */
+static void
+FdwXactResolverLoop(void)
+{
+	/* Enter main loop */
+	for (;;)
+	{
+		TimestampTz now;
+		int			rc;
+		long		sleep_time = DEFAULT_NAPTIME_PER_CYCLE;
+
+		ResetLatch(MyLatch);
+
+		CHECK_FOR_INTERRUPTS();
+
+		if (ConfigReloadPending)
+		{
+			ConfigReloadPending = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Resolve in-doubt transactions if any  */
+		FdwXactResolverProcessInDoubtXacts();
+
+		now = GetCurrentTimestamp();
+		FdwXactResolverCheckTimeout(now);
+		sleep_time = FdwXactResolverComputeSleepTime(now);
+
+		/* Send WAL statistics to the stats collector */
+		pgstat_send_wal(false);
+
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   sleep_time,
+					   WAIT_EVENT_FDWXACT_RESOLVER_MAIN);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+}
+
+/*
+ * Check whether any foreign transaction has been resolved within
+ * foreign_xact_resolver_timeout, and shut down if not.
+ */
+static void
+FdwXactResolverCheckTimeout(TimestampTz now)
+{
+	TimestampTz timeout;
+	ForeignServer *server;
+
+	if (foreign_xact_resolver_timeout == 0)
+		return;
+
+	timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+										  foreign_xact_resolver_timeout);
+
+	if (now < timeout)
+		return;
+
+	/* Reached timeout, exit */
+	StartTransactionCommand();
+	server = GetForeignServer(MyFdwXactResolver->serverid);
+	ereport(LOG,
+			(errmsg("foreign transaction resolver for server \"%s\" on database \"%s\" will stop because of the timeout",
+					server->servername,
+					get_database_name(MyDatabaseId))));
+	CommitTransactionCommand();
+	FdwXactResolverDetach();
+	proc_exit(0);
+}
+
+/*
+ * Compute how long we should sleep until the next cycle.  We can sleep until
+ * the timeout is reached.
+ */
+static long
+FdwXactResolverComputeSleepTime(TimestampTz now)
+{
+	long		sleeptime = DEFAULT_NAPTIME_PER_CYCLE;
+
+	if (foreign_xact_resolver_timeout > 0)
+	{
+		TimestampTz timeout;
+
+		/* Compute relative time until wakeup. */
+		timeout = TimestampTzPlusMilliseconds(last_resolution_time,
+											  foreign_xact_resolver_timeout);
+		sleeptime = TimestampDifferenceMilliseconds(now, timeout);
+	}
+
+	return sleeptime;
+}
+
+bool
+IsFdwXactResolver(void)
+{
+	return MyFdwXactResolver != NULL;
+}
+
+/*
+ * Process in-doubt foreign transactions.
+ */
+static void
+FdwXactResolverProcessInDoubtXacts(void)
+{
+	ListCell   *lc;
+
+	Assert(heldFdwXactEntries == NIL);
+
+	/* Hold all in-doubt foreign transactions */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	for (int i = 0; i < FdwXactCtl->num_xacts; i++)
+	{
+		FdwXactState fdwxact = FdwXactCtl->xacts[i];
+
+		if (fdwxact->valid &&
+			fdwxact->locking_backend == InvalidBackendId &&
+			fdwxact->data.dbid == MyFdwXactResolver->dbid &&
+			fdwxact->data.serverid == MyFdwXactResolver->serverid &&
+			!TwoPhaseExists(fdwxact->data.xid))
+		{
+			fdwxact->locking_backend = MyBackendId;
+			heldFdwXactEntries = lappend(heldFdwXactEntries, fdwxact);
+		}
+	}
+	LWLockRelease(FdwXactLock);
+
+	foreach(lc, heldFdwXactEntries)
+	{
+		FdwXactState fdwxact = (FdwXactState) lfirst(lc);
+
+		/*
+		 * Resolve one foreign transaction.  ResolveOneFdwXact() releases and
+		 * removes the FdwXactState entry after resolution.
+		 */
+		StartTransactionCommand();
+		ResolveOneFdwXact(fdwxact);
+		CommitTransactionCommand();
+	}
+
+	if (list_length(heldFdwXactEntries) > 0)
+		last_resolution_time = GetCurrentTimestamp();
+
+	list_free(heldFdwXactEntries);
+	heldFdwXactEntries = NIL;
+}
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b52..0a3f4b383f 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -10,6 +10,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index f67d813c56..29980d56ac 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -77,6 +77,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -846,6 +847,34 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	return result;
 }
 
+/*
+ * TwoPhaseExists
+ *		Return true if there is a prepared transaction specified by XID
+ */
+bool
+TwoPhaseExists(TransactionId xid)
+{
+	int		i;
+	bool	found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+		if (gxact->xid == xid)
+		{
+			found = true;
+			break;
+		}
+	}
+
+	LWLockRelease(TwoPhaseStateLock);
+
+	return found;
+}
+
 /*
  * TwoPhaseGetDummyBackendId
  *		Get the dummy backend ID for prepared transaction specified by XID
@@ -1556,6 +1585,12 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	/* Count the prepared xact as committed or aborted */
 	AtEOXact_PgStat(isCommit, false);
 
+	/*
+	 * If the prepared transaction was part of a distributed transaction,
+	 * notify a resolver process to handle it.
+	 */
+	FdwXactLaunchResolversForXid(xid);
+
 	/*
 	 * And now we can clean up any files we may have left.
 	 */
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 1e00a3a98e..d007c97c75 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2131,7 +2131,7 @@ CommitTransaction(void)
 		AtEOXact_Parallel(true);
 
 	/* Call foreign transaction callbacks at pre-commit phase, if any */
-	AtEOXact_FdwXact(true, is_parallel_worker);
+	PreCommit_FdwXact(is_parallel_worker);
 
 	/* Shut down the deferred-trigger manager */
 	AfterTriggerEndXact(true);
@@ -2286,6 +2286,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_FdwXact(true, is_parallel_worker);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2559,6 +2560,7 @@ PrepareTransaction(void)
 	PostPrepare_Twophase();
 
+	AtEOXact_FdwXact(true, false);
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
 	AtEOXact_Enum();
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9cbca6392d..837e0f198a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/heaptoast.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
@@ -4670,6 +4671,7 @@ InitControlFile(uint64 sysidentifier)
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
 	ControlFile->wal_level = wal_level;
 	ControlFile->wal_log_hints = wal_log_hints;
@@ -6460,6 +6462,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_prepared_transactions",
 									 max_prepared_xacts,
 									 ControlFile->max_prepared_xacts);
+		RecoveryRequiresIntParameter("max_prepared_foreign_transactions",
+									 max_prepared_foreign_xacts,
+									 ControlFile->max_prepared_foreign_xacts);
 		RecoveryRequiresIntParameter("max_locks_per_transaction",
 									 max_locks_per_xact,
 									 ControlFile->max_locks_per_xact);
@@ -7009,14 +7014,15 @@ StartupXLOG(void)
 	restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI);
 
 	/*
-	 * Before running in recovery, scan pg_twophase and fill in its status to
-	 * be able to work on entries generated by redo.  Doing a scan before
-	 * taking any recovery action has the merit to discard any 2PC files that
-	 * are newer than the first record to replay, saving from any conflicts at
-	 * replay.  This avoids as well any subsequent scans when doing recovery
-	 * of the on-disk two-phase data.
+	 * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then
+	 * fill in their status to be able to work on entries generated by redo.
+	 * Doing a scan before taking any recovery action has the merit to discard
+	 * any state files that are newer than the first record to replay, saving
+	 * from any conflicts at replay.  This avoids as well any subsequent scans
+	 * when doing recovery of the on-disk two-phase or fdwxact data.
 	 */
 	restoreTwoPhaseData();
+	RestoreFdwXactData();
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
@@ -7218,7 +7224,10 @@ StartupXLOG(void)
 			InitRecoveryTransactionEnvironment();
 
 			if (wasShutdown)
+			{
 				oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+				oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
+			}
 			else
 				oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
@@ -7730,11 +7739,13 @@ StartupXLOG(void)
 	}
 
 	/*
-	 * Pre-scan prepared transactions to find out the range of XIDs present.
-	 * This information is not quite needed yet, but it is positioned here so
-	 * as potential problems are detected before any on-disk change is done.
+	 * Pre-scan prepared transactions and foreign prepared transactions to find
+	 * out the range of XIDs present.  This information is not quite needed yet,
+	 * but it is positioned here so as potential problems are detected before any
+	 * on-disk change is done.
 	 */
 	oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
+	oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 	/*
 	 * Allow ordinary WAL segment creation before any exitArchiveRecovery(),
@@ -8066,8 +8077,12 @@ StartupXLOG(void)
 	TrimCLOG();
 	TrimMultiXact();
 
-	/* Reload shared-memory state for prepared transactions */
+	/*
+	 * Reload shared-memory state for prepared transactions and foreign
+	 * prepared transactions.
+	 */
 	RecoverPreparedTransactions();
+	RecoverFdwXacts();
 
 	/*
 	 * Shutdown the recovery environment. This must occur after
@@ -9419,6 +9434,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 
 	/* We deliberately delay 2PC checkpointing as long as possible */
 	CheckPointTwoPhase(checkPointRedo);
+	CheckPointFdwXacts(checkPointRedo);
 }
 
 /*
@@ -9955,6 +9971,7 @@ XLogReportParameters(void)
 		max_worker_processes != ControlFile->max_worker_processes ||
 		max_wal_senders != ControlFile->max_wal_senders ||
 		max_prepared_xacts != ControlFile->max_prepared_xacts ||
+		max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts ||
 		max_locks_per_xact != ControlFile->max_locks_per_xact ||
 		track_commit_timestamp != ControlFile->track_commit_timestamp)
 	{
@@ -9974,6 +9991,7 @@ XLogReportParameters(void)
 			xlrec.max_worker_processes = max_worker_processes;
 			xlrec.max_wal_senders = max_wal_senders;
 			xlrec.max_prepared_xacts = max_prepared_xacts;
+			xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 			xlrec.max_locks_per_xact = max_locks_per_xact;
 			xlrec.wal_level = wal_level;
 			xlrec.wal_log_hints = wal_log_hints;
@@ -9992,6 +10010,7 @@ XLogReportParameters(void)
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
 		ControlFile->max_prepared_xacts = max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = max_locks_per_xact;
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
@@ -10198,6 +10217,7 @@ xlog_redo(XLogReaderState *record)
 			RunningTransactionsData running;
 
 			oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
+			oldestActiveXID = PrescanFdwXacts(oldestActiveXID);
 
 			/*
 			 * Construct a RunningTransactions snapshot representing a shut
@@ -10401,6 +10421,7 @@ xlog_redo(XLogReaderState *record)
 		ControlFile->max_worker_processes = xlrec.max_worker_processes;
 		ControlFile->max_wal_senders = xlrec.max_wal_senders;
 		ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
+		ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts;
 		ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
 		ControlFile->wal_level = xlrec.wal_level;
 		ControlFile->wal_log_hints = xlrec.wal_log_hints;
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 0c37fc1d53..b951c03d20 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1470,6 +1470,10 @@ doDeletion(const ObjectAddress *object, int flags)
 			RemovePublicationRelById(object->objectId);
 			break;
 
+		case OCLASS_USER_MAPPING:
+			RemoveUserMappingById(object->objectId);
+			break;
+
 		case OCLASS_CAST:
 		case OCLASS_COLLATION:
 		case OCLASS_CONVERSION:
@@ -1485,7 +1489,6 @@ doDeletion(const ObjectAddress *object, int flags)
 		case OCLASS_TSTEMPLATE:
 		case OCLASS_FDW:
 		case OCLASS_FOREIGN_SERVER:
-		case OCLASS_USER_MAPPING:
 		case OCLASS_DEFACL:
 		case OCLASS_EVENT_TRIGGER:
 		case OCLASS_PUBLICATION:
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 999d984068..f36f05f5cc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -402,6 +402,9 @@ CREATE VIEW pg_prepared_xacts AS
 CREATE VIEW pg_prepared_statements AS
     SELECT * FROM pg_prepared_statement() AS P;
 
+CREATE VIEW pg_foreign_xacts AS
+    SELECT * FROM pg_foreign_xacts() AS F;
+
 CREATE VIEW pg_seclabels AS
 SELECT
     l.objoid, l.classoid, l.objsubid,
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 2b159b60eb..25ad64639f 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -23,6 +23,7 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
+#include "access/fdwxact.h"
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
@@ -821,6 +822,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	int			nslots,
 				nslots_active;
 	int			nsubscriptions;
+	int			nfdwxacts;
 
 	/*
 	 * Look up the target database's OID, and get exclusive lock on it. We
@@ -910,6 +912,18 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 								  "There are %d subscriptions.",
 								  nsubscriptions, nsubscriptions)));
 
+	/*
+	 * Also check if there is a foreign transaction associated with the target
+	 * database.
+	 */
+	if ((nfdwxacts = CountFdwXactsForDB(db_id)) > 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_IN_USE),
+				 errmsg("database \"%s\" is being used by foreign transaction resolution",
+						dbname),
+				 errdetail_plural("There is %d foreign transaction.",
+								  "There are %d foreign transactions.",
+								  nfdwxacts, nfdwxacts)));
 
 	/*
 	 * Attempt to terminate all existing connections to the target database if
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index bc36311d38..533d5d59a1 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/table.h"
@@ -1060,6 +1062,49 @@ AlterForeignServer(AlterForeignServerStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop foreign server
+ */
+Oid
+RemoveForeignServer(DropForeignServerStmt *stmt)
+{
+	Oid serverid;
+	ObjectAddress object;
+
+	serverid = GetSysCacheOid1(FOREIGNSERVERNAME, Anum_pg_foreign_server_oid,
+							   CStringGetDatum(stmt->servername));
+
+	if (!OidIsValid(serverid))
+	{
+		if (!stmt->missing_ok)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+					 errmsg("server \"%s\" does not exist", stmt->servername)));
+
+		ereport(NOTICE,
+				errmsg("server \"%s\" does not exist, skipping",
+					   stmt->servername));
+		return InvalidOid;
+	}
+
+	/*
+	 * Only the owner or a superuser can DROP a SERVER.
+	 */
+	if (!pg_foreign_server_ownercheck(serverid, GetUserId()))
+		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_FOREIGN_SERVER,
+					   stmt->servername);
+
+	/* Do the deletion */
+	object.classId = ForeignServerRelationId;
+	object.objectId = serverid;
+	object.objectSubId = 0;
+	performDeletion(&object, stmt->behavior, 0);
+
+	/* Stop the foreign transaction resolver immediately */
+	StopFdwXactResolver(MyDatabaseId, serverid);
+
+	return serverid;
+}
 
 /*
  * Common routine to check permission for user-mapping-related DDL
@@ -1307,6 +1352,37 @@ AlterUserMapping(AlterUserMappingStmt *stmt)
 	return address;
 }
 
+/*
+ * Drop the given user mapping
+ */
+void
+RemoveUserMappingById(Oid umid)
+{
+	HeapTuple	tp;
+	Relation	rel;
+
+	rel = table_open(UserMappingRelationId, RowExclusiveLock);
+
+	tp = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(umid));
+
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for user mapping %u", umid);
+
+	/*
+	 * We cannot drop the user mapping if there is a foreign prepared
+	 * transaction with this user mapping.
+	 */
+	if (CountFdwXactsForUserMapping(umid) > 0)
+		ereport(ERROR,
+				(errmsg("user mapping %u has unresolved prepared transaction",
+						umid)));
+
+	CatalogTupleDelete(rel, &tp->t_self);
+
+	ReleaseSysCache(tp);
+
+	table_close(rel, RowExclusiveLock);
+}
 
 /*
  * Drop user mapping
@@ -1374,6 +1450,7 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 
 	user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername);
 
+
 	/*
 	 * Do the deletion
 	 */
@@ -1386,7 +1463,6 @@ RemoveUserMapping(DropUserMappingStmt *stmt)
 	return umId;
 }
 
-
 /*
  * Create a foreign table
  * call after DefineRelation().
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index f8eb4fa215..6ce76b2aec 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -332,6 +332,12 @@ GetFdwRoutine(Oid fdwhandler)
 	Assert((routine->CommitForeignTransaction && routine->RollbackForeignTransaction) ||
 		   (!routine->CommitForeignTransaction && !routine->RollbackForeignTransaction));
 
+	/* An FDW with a prepare API must also have commit and rollback APIs */
+	Assert((routine->PrepareForeignTransaction &&
+			routine->CommitForeignTransaction &&
+			routine->RollbackForeignTransaction) ||
+		   !routine->PrepareForeignTransaction);
+
 	return routine;
 }
 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index bd87f23784..aad42485f5 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4355,6 +4355,18 @@ _copyAlterForeignServerStmt(const AlterForeignServerStmt *from)
 	return newnode;
 }
 
+static DropForeignServerStmt *
+_copyDropForeignServerStmt(const DropForeignServerStmt *from)
+{
+	DropForeignServerStmt *newnode = makeNode(DropForeignServerStmt);
+
+	COPY_STRING_FIELD(servername);
+	COPY_SCALAR_FIELD(missing_ok);
+	COPY_SCALAR_FIELD(behavior);
+
+	return newnode;
+}
+
 static CreateUserMappingStmt *
 _copyCreateUserMappingStmt(const CreateUserMappingStmt *from)
 {
@@ -5604,6 +5616,9 @@ copyObjectImpl(const void *from)
 		case T_AlterForeignServerStmt:
 			retval = _copyAlterForeignServerStmt(from);
 			break;
+		case T_DropForeignServerStmt:
+			retval = _copyDropForeignServerStmt(from);
+			break;
 		case T_CreateUserMappingStmt:
 			retval = _copyCreateUserMappingStmt(from);
 			break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index dba3e6b31e..e0b24d1147 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1977,6 +1977,17 @@ _equalAlterForeignServerStmt(const AlterForeignServerStmt *a, const AlterForeign
 	return true;
 }
 
+static bool
+_equalDropForeignServerStmt(const DropForeignServerStmt *a,
+							const DropForeignServerStmt *b)
+{
+	COMPARE_STRING_FIELD(servername);
+	COMPARE_SCALAR_FIELD(missing_ok);
+	COMPARE_SCALAR_FIELD(behavior);
+
+	return true;
+}
+
 static bool
 _equalCreateUserMappingStmt(const CreateUserMappingStmt *a, const CreateUserMappingStmt *b)
 {
@@ -3595,6 +3606,9 @@ equal(const void *a, const void *b)
 		case T_AlterForeignServerStmt:
 			retval = _equalAlterForeignServerStmt(a, b);
 			break;
+		case T_DropForeignServerStmt:
+			retval = _equalDropForeignServerStmt(a, b);
+			break;
 		case T_CreateUserMappingStmt:
 			retval = _equalCreateUserMappingStmt(a, b);
 			break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index eb24195438..85feb0a600 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -281,7 +281,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		CreateAssertionStmt CreateTransformStmt CreateTrigStmt CreateEventTrigStmt
 		CreateUserStmt CreateUserMappingStmt CreateRoleStmt CreatePolicyStmt
 		CreatedbStmt DeclareCursorStmt DefineStmt DeleteStmt DiscardStmt DoStmt
-		DropOpClassStmt DropOpFamilyStmt DropStmt
+		DropOpClassStmt DropOpFamilyStmt DropForeignServerStmt DropStmt
 		DropCastStmt DropRoleStmt
 		DropdbStmt DropTableSpaceStmt
 		DropTransformStmt
@@ -980,6 +980,7 @@ stmt:
 			| DiscardStmt
 			| DoStmt
 			| DropCastStmt
+			| DropForeignServerStmt
 			| DropOpClassStmt
 			| DropOpFamilyStmt
 			| DropOwnedStmt
@@ -6339,6 +6340,7 @@ object_type_name:
 			drop_type_name							{ $$ = $1; }
 			| DATABASE								{ $$ = OBJECT_DATABASE; }
 			| ROLE									{ $$ = OBJECT_ROLE; }
+			| SERVER								{ $$ = OBJECT_FOREIGN_SERVER; }
 			| SUBSCRIPTION							{ $$ = OBJECT_SUBSCRIPTION; }
 			| TABLESPACE							{ $$ = OBJECT_TABLESPACE; }
 		;
@@ -6351,7 +6353,6 @@ drop_type_name:
 			| opt_procedural LANGUAGE				{ $$ = OBJECT_LANGUAGE; }
 			| PUBLICATION							{ $$ = OBJECT_PUBLICATION; }
 			| SCHEMA								{ $$ = OBJECT_SCHEMA; }
-			| SERVER								{ $$ = OBJECT_FOREIGN_SERVER; }
 		;
 
 /* object types attached to a table */
@@ -9798,6 +9799,30 @@ DropSubscriptionStmt: DROP SUBSCRIPTION name opt_drop_behavior
 				}
 		;
 
+/*****************************************************************************
+ *
+ * DROP SERVER [ IF EXISTS ] name [ CASCADE | RESTRICT ]
+ *
+ *****************************************************************************/
+
+DropForeignServerStmt: DROP SERVER name opt_drop_behavior
+				{
+					DropForeignServerStmt *n = makeNode(DropForeignServerStmt);
+					n->servername = $3;
+					n->missing_ok = false;
+					n->behavior = $4;
+					$$ = (Node *) n;
+				}
+				|  DROP SERVER IF_P EXISTS name opt_drop_behavior
+				{
+					DropForeignServerStmt *n = makeNode(DropForeignServerStmt);
+					n->servername = $5;
+					n->missing_ok = true;
+					n->behavior = $6;
+					$$ = (Node *) n;
+				}
+		;
+
 /*****************************************************************************
  *
  *		QUERY:	Define Rewrite Rule
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index c40410d73e..89d0219cad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -13,6 +13,8 @@
 #include "postgres.h"
 
 #include "access/parallel.h"
+#include "access/fdwxact_resolver.h"
+#include "access/fdwxact_launcher.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -128,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"FdwXactResolverMain", FdwXactResolverMain
+	},
+	{
+		"FdwXactLauncherMain", FdwXactLauncherMain
 	}
 };
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 5a050898fe..d5afe2382c 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,7 @@
 #include <pthread.h>
 #endif
 
+#include "access/fdwxact_launcher.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "bootstrap/bootstrap.h"
@@ -925,6 +926,9 @@ PostmasterMain(int argc, char *argv[])
 	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
 		ereport(ERROR,
 				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
+	if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers <= 0)
+		ereport(ERROR,
+				(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0")));
 
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
@@ -990,12 +994,13 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher and the foreign transaction launcher.
+	 * Since each registers a background worker, they need to be called
+	 * before InitializeMaxBackends(), and it's probably a good idea to do so
+	 * before any modules have had a chance to take background worker slots.
 	 */
 	ApplyLauncherRegister();
+	FdwXactLauncherRegister();
 
 	/*
 	 * process any libraries that should be preloaded at postmaster start
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 453efc51e1..eac334c5f6 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -179,6 +179,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
 		case RM_COMMIT_TS_ID:
 		case RM_REPLORIGIN_ID:
 		case RM_GENERIC_ID:
+		case RM_FDWXACT_ID:
 			/* just deal with xid, and done */
 			ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
 									buf.origptr);
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index bdbc9ef844..c4f4ca8c65 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -143,13 +143,16 @@ static bool SyncRepQueueIsOrderedByLSN(int mode);
  * represents a commit record.  If it doesn't, then we wait only for the WAL
  * to be flushed if synchronous_commit is set to the higher level of
  * remote_apply, because only commit records provide apply feedback.
+ *
+ * Returns true if the wait was canceled by an interrupt.
  */
-void
+bool
 SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 {
 	char	   *new_status = NULL;
 	const char *old_status;
 	int			mode;
+	bool		canceled = false;
 
 	/*
 	 * This should be called while holding interrupts during a transaction
@@ -173,7 +176,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	 */
 	if (!SyncRepRequested() ||
 		!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
-		return;
+		return false;
 
 	/* Cap the level for anything other than commit to remote flush only. */
 	if (commit)
@@ -199,7 +202,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		lsn <= WalSndCtl->lsn[mode])
 	{
 		LWLockRelease(SyncRepLock);
-		return;
+		return false;
 	}
 
 	/*
@@ -269,6 +272,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 					 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
 			whereToSendOutput = DestNone;
 			SyncRepCancelWait();
+			canceled = true;
 			break;
 		}
 
@@ -285,6 +289,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 					(errmsg("canceling wait for synchronous replication due to user request"),
 					 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
 			SyncRepCancelWait();
+			canceled = true;
 			break;
 		}
 
@@ -304,6 +309,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 			ProcDiePending = true;
 			whereToSendOutput = DestNone;
 			SyncRepCancelWait();
+			canceled = true;
 			break;
 		}
 	}
@@ -327,6 +333,8 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		set_ps_display(new_status);
 		pfree(new_status);
 	}
+
+	return canceled;
 }
 
 /*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..47179c37a4 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -16,6 +16,8 @@
 
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
+#include "access/fdwxact_launcher.h"
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/nbtree.h"
@@ -150,6 +152,8 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, FdwXactShmemSize());
+		size = add_size(size, FdwXactLauncherShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -269,6 +273,8 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	FdwXactShmemInit();
+	FdwXactLauncherShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 793df973b4..417355f795 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -96,6 +96,8 @@ typedef struct ProcArrayStruct
 	TransactionId replication_slot_xmin;
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
+	/* local transaction id of oldest unresolved distributed transaction */
+	TransactionId fdwxact_unresolved_xmin;
 
 	/* indexes into allProcs[], has PROCARRAY_MAXPROCS entries */
 	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
@@ -187,11 +189,13 @@ typedef struct ComputeXidHorizonsResult
 	FullTransactionId latest_completed;
 
 	/*
-	 * The same for procArray->replication_slot_xmin and.
-	 * procArray->replication_slot_catalog_xmin.
+	 * The same for procArray->replication_slot_xmin,
+	 * procArray->replication_slot_catalog_xmin, and
+	 * procArray->fdwxact_unresolved_xmin.
 	 */
 	TransactionId slot_xmin;
 	TransactionId slot_catalog_xmin;
+	TransactionId fdwxact_unresolved_xmin;
 
 	/*
 	 * Oldest xid that any backend might still consider running. This needs to
@@ -210,8 +214,9 @@ typedef struct ComputeXidHorizonsResult
 	 * Oldest xid for which deleted tuples need to be retained in shared
 	 * tables.
 	 *
-	 * This includes the effects of replication slots. If that's not desired,
-	 * look at shared_oldest_nonremovable_raw;
+	 * This includes the effects of replication slots as well as unresolved
+	 * foreign transactions. If that's not desired, look at
+	 * shared_oldest_nonremovable_raw;
 	 */
 	TransactionId shared_oldest_nonremovable;
 
@@ -418,6 +423,7 @@ CreateSharedProcArray(void)
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
+		procArray->fdwxact_unresolved_xmin = InvalidTransactionId;
 		ShmemVariableCache->xactCompletionCount = 1;
 	}
 
@@ -1741,6 +1747,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	 */
 	h->slot_xmin = procArray->replication_slot_xmin;
 	h->slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
+	h->fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin;
 
 	for (int index = 0; index < arrayP->numProcs; index++)
 	{
@@ -1888,6 +1895,15 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	h->data_oldest_nonremovable =
 		TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);
 
+	/*
+	 * Check whether there are unresolved distributed transactions requiring
+	 * an older xmin.
+	 */
+	h->shared_oldest_nonremovable =
+		TransactionIdOlder(h->shared_oldest_nonremovable, h->fdwxact_unresolved_xmin);
+	h->data_oldest_nonremovable =
+		TransactionIdOlder(h->data_oldest_nonremovable, h->fdwxact_unresolved_xmin);
+
 	/*
 	 * The only difference between catalog / data horizons is that the slot's
 	 * catalog xmin is applied to the catalog one (so catalogs can be accessed
@@ -1947,6 +1963,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	Assert(!TransactionIdIsValid(h->slot_catalog_xmin) ||
 		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
 										 h->slot_catalog_xmin));
+	Assert(!TransactionIdIsValid(h->fdwxact_unresolved_xmin) ||
+		   TransactionIdPrecedesOrEquals(h->oldest_considered_running,
+										 h->fdwxact_unresolved_xmin));
 
 	/* update approximate horizons with the computed horizons */
 	GlobalVisUpdateApply(h);
@@ -3859,6 +3878,21 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * ProcArraySetFdwXactUnresolvedXmin
+ *
+ * Install limits on future computations of the xmin horizon so that vacuum
+ * does not truncate clog entries for transactions that are still needed to
+ * resolve distributed transactions.
+ */
+void
+ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	procArray->fdwxact_unresolved_xmin = xmin;
+	LWLockRelease(ProcArrayLock);
+}
+
 /*
  * XidCacheRemoveRunningXids
  *
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 6c7cf6c295..a297c746cd 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,5 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+FdwXactLock							48
+FdwXactResolverLock					49
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8cea10c901..929061a6bb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -36,6 +36,8 @@
 #include "rusagestub.h"
 #endif
 
+#include "access/fdwxact_launcher.h"
+#include "access/fdwxact_resolver.h"
 #include "access/parallel.h"
 #include "access/printtup.h"
 #include "access/xact.h"
@@ -3164,6 +3166,20 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
+		else if (IsFdwXactResolver())
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("terminating foreign transaction resolver due to administrator command")));
+		else if (IsFdwXactLauncher())
+		{
+			ereport(DEBUG1,
+					(errmsg_internal("foreign transaction launcher shutting down")));
+			/*
+			 * The foreign transaction launcher can be stopped at any time.
+			 * Use exit status 1 so the background worker is restarted.
+			 */
+			proc_exit(1);
+		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 7a2da9dab4..95cf27f7cf 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -197,6 +197,7 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
 		case T_CreateUserMappingStmt:
 		case T_CreatedbStmt:
 		case T_DefineStmt:
+		case T_DropForeignServerStmt:
 		case T_DropOwnedStmt:
 		case T_DropRoleStmt:
 		case T_DropStmt:
@@ -1567,6 +1568,10 @@ ProcessUtilitySlow(ParseState *pstate,
 				address = AlterForeignServer((AlterForeignServerStmt *) parsetree);
 				break;
 
+			case T_DropForeignServerStmt:
+				RemoveForeignServer((DropForeignServerStmt *) parsetree);
+				break;
+
 			case T_CreateUserMappingStmt:
 				address = CreateUserMapping((CreateUserMappingStmt *) parsetree);
 				break;
@@ -2473,6 +2478,10 @@ CreateCommandTag(Node *parsetree)
 			tag = CMDTAG_ALTER_SERVER;
 			break;
 
+		case T_DropForeignServerStmt:
+			tag = CMDTAG_DROP_SERVER;
+			break;
+
 		case T_CreateUserMappingStmt:
 			tag = CMDTAG_CREATE_USER_MAPPING;
 			break;
@@ -2577,9 +2586,6 @@ CreateCommandTag(Node *parsetree)
 				case OBJECT_FDW:
 					tag = CMDTAG_DROP_FOREIGN_DATA_WRAPPER;
 					break;
-				case OBJECT_FOREIGN_SERVER:
-					tag = CMDTAG_DROP_SERVER;
-					break;
 				case OBJECT_OPCLASS:
 					tag = CMDTAG_DROP_OPERATOR_CLASS;
 					break;
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 6baf67740c..429a50c591 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -726,6 +726,21 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_LOGICAL_SUBXACT_WRITE:
 			event_name = "LogicalSubxactWrite";
 			break;
+		case WAIT_EVENT_FDWXACT_FILE_READ:
+			event_name = "FdwXactFileRead";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_SYNC:
+			event_name = "FdwXactFileSync";
+			break;
+		case WAIT_EVENT_FDWXACT_FILE_WRITE:
+			event_name = "FdwXactFileWrite";
+			break;
+		case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN:
+			event_name = "FdwXactLauncherMain";
+			break;
+		case WAIT_EVENT_FDWXACT_RESOLVER_MAIN:
+			event_name = "FdwXactResolverMain";
+			break;
 
 			/* no default case, so that compiler will warn */
 	}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 480e8cd199..65815ec047 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -33,6 +33,7 @@
 #include <unistd.h>
 
 #include "access/commit_ts.h"
+#include "access/fdwxact.h"
 #include "access/gin.h"
 #include "access/rmgr.h"
 #include "access/tableam.h"
@@ -802,6 +803,10 @@ const char *const config_group_names[] =
 	gettext_noop("Client Connection Defaults / Other Defaults"),
 	/* LOCK_MANAGEMENT */
 	gettext_noop("Lock Management"),
+	/* FOREIGN_TRANSACTION */
+	gettext_noop("Foreign Transaction"),
+	/* FOREIGN_TRANSACTION_RESOLVER */
+	gettext_noop("Foreign Transaction / Resolver"),
 	/* COMPAT_OPTIONS_PREVIOUS */
 	gettext_noop("Version and Platform Compatibility / Previous PostgreSQL Versions"),
 	/* COMPAT_OPTIONS_CLIENT */
@@ -2538,6 +2543,49 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."),
+			NULL
+		},
+		&max_prepared_foreign_xacts,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolver_timeout", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+			gettext_noop("Sets the maximum time to wait for foreign transaction resolution."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&foreign_xact_resolver_timeout,
+		60 * 1000, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Maximum number of foreign transaction resolution processes."),
+			NULL
+		},
+		&max_foreign_xact_resolvers,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FOREIGN_TRANSACTION_RESOLVER,
+		 gettext_noop("Sets the time to wait before retrying to resolve a foreign transaction "
+					  "after a failed attempt."),
+		 NULL,
+		 GUC_UNIT_MS
+		},
+		&foreign_xact_resolution_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 #ifdef LOCK_DEBUG
 	{
 		{"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b696abfe54..9cc35c7109 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -132,6 +132,8 @@
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)
+#max_prepared_foreign_transactions = 0	# zero disables the feature
+					# (change requires restart)
 # Caution: it is not advisable to set max_prepared_transactions nonzero unless
 # you actively intend to use prepared transactions.
 #work_mem = 4MB				# min 64kB
@@ -744,6 +746,18 @@
 #max_pred_locks_per_page = 2            # min 0
 
 
+#------------------------------------------------------------------------------
+# FOREIGN TRANSACTION
+#------------------------------------------------------------------------------
+
+#max_foreign_transaction_resolvers = 0		# max number of resolver processes
+						# (change requires restart)
+#foreign_transaction_resolver_timeout = 60s	# in milliseconds; 0 disables
+#foreign_transaction_resolution_retry_interval = 5s	# time to wait before
+							# retrying to resolve
+							# foreign transactions
+							# after a failed attempt
+
 #------------------------------------------------------------------------------
 # VERSION AND PLATFORM COMPATIBILITY
 #------------------------------------------------------------------------------
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 152d21e88b..735e4084b3 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -207,6 +207,7 @@ static const char *const subdirs[] = {
 	"pg_wal/archive_status",
 	"pg_commit_ts",
 	"pg_dynshmem",
+	"pg_fdwxact",
 	"pg_notify",
 	"pg_serial",
 	"pg_snapshots",
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index f911f98d94..49d47c2ee7 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -296,6 +296,8 @@ main(int argc, char *argv[])
 		   ControlFile->max_wal_senders);
 	printf(_("max_prepared_xacts setting:           %d\n"),
 		   ControlFile->max_prepared_xacts);
+	printf(_("max_prepared_foreign_transactions setting:   %d\n"),
+		   ControlFile->max_prepared_foreign_xacts);
 	printf(_("max_locks_per_xact setting:           %d\n"),
 		   ControlFile->max_locks_per_xact);
 	printf(_("track_commit_timestamp setting:       %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 805dafef07..dd70a0f8a2 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -710,6 +710,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
@@ -914,6 +915,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
+	ControlFile.max_prepared_foreign_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
 	/* The control file gets flushed here. */
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..b616cea347 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -11,6 +11,7 @@
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
+#include "access/fdwxact_xlog.h"
 #include "access/generic_xlog.h"
 #include "access/ginxlog.h"
 #include "access/gistxlog.h"
diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h
index 1d4a285c75..85854864b9 100644
--- a/src/include/access/fdwxact.h
+++ b/src/include/access/fdwxact.h
@@ -1,7 +1,7 @@
 /*
  * fdwxact.h
  *
- * PostgreSQL global transaction manager
+ * PostgreSQL foreign transaction manager definitions
  *
  * Portions Copyright (c) 2021, PostgreSQL Global Development Group
  *
@@ -11,13 +11,80 @@
 #define FDWXACT_H
 
 #include "access/xact.h"
+#include "access/fdwxact_xlog.h"
 #include "foreign/foreign.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/s_lock.h"
 
 /* Flag passed to FDW transaction management APIs */
 #define FDWXACT_FLAG_ONEPHASE		0x01	/* transaction can commit/rollback
 											 * without preparation */
 #define FDWXACT_FLAG_PARALLEL_WORKER	0x02	/* is parallel worker? */
 
+/* Enum to track the status of foreign transaction */
+typedef enum
+{
+	FDWXACT_STATUS_INVALID = 0,
+	FDWXACT_STATUS_PREPARING,	/* foreign transaction is being prepared */
+	FDWXACT_STATUS_PREPARED,	/* foreign transaction is prepared */
+	FDWXACT_STATUS_COMMITTING,	/* foreign prepared transaction is being committed */
+	FDWXACT_STATUS_ABORTING		/* foreign prepared transaction is being aborted */
+} FdwXactStatus;
+
+/*
+ * Shared memory state of a single foreign transaction.
+ */
+typedef struct FdwXactStateData *FdwXactState;
+typedef struct FdwXactStateData
+{
+	FdwXactState		fdwxact_free_next;	/* Next free FdwXactState entry */
+
+	/* Information relevant to the foreign transaction */
+	FdwXactStateOnDiskData data;
+
+	/* Foreign transaction status */
+	FdwXactStatus status;
+	slock_t		mutex;			/* protect the above field */
+
+	/*
+	 * Note that we need to keep track of two LSNs for each FdwXactState. We keep
+	 * track of the start LSN because this is the address we must use to read
+	 * state data back from WAL when committing a FdwXactState. We keep track of
+	 * the end LSN because that is the LSN we need to wait for prior to
+	 * commit.
+	 */
+	XLogRecPtr	insert_start_lsn;	/* XLOG offset where this entry's insertion
+									 * starts */
+	XLogRecPtr	insert_end_lsn; /* XLOG offset where this entry's insertion ends */
+
+	bool		valid;			/* has the entry been completed and written to
+								 * file? */
+	BackendId	locking_backend;	/* backend currently working on the fdw xact */
+	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		inredo;			/* true if entry was added via xlog_redo */
+} FdwXactStateData;
+
+/*
+ * Shared memory layout for maintaining foreign prepared transaction entries.
+ * Adding or removing an FdwXactState entry requires holding FdwXactLock in
+ * exclusive mode; iterating over the entries requires it in shared mode.
+ */
+typedef struct
+{
+	/* Head of linked list of free FdwXactStateData structs */
+	FdwXactState	free_fdwxacts;
+
+	/* Number of valid foreign transaction entries */
+	int	num_xacts;
+
+	/* Up to max_prepared_foreign_xacts entries in the array */
+	FdwXactState	xacts[FLEXIBLE_ARRAY_MEMBER];	/* Variable length array */
+} FdwXactCtlData;
+
+/* Pointer to the shared memory holding the foreign transactions data */
+FdwXactCtlData *FdwXactCtl;
+
 /* State data for foreign transaction resolution, passed to FDW callbacks */
 typedef struct FdwXactInfo
 {
@@ -25,10 +92,29 @@ typedef struct FdwXactInfo
 	UserMapping		*usermapping;
 
 	int	flags;			/* OR of FDWXACT_FLAG_xx flags */
+	char   *identifier;
 } FdwXactInfo;
 
+/* GUC parameters */
+extern int	max_prepared_foreign_xacts;
+extern int	max_foreign_xact_resolvers;
+extern int	foreign_xact_resolution_retry_interval;
+extern int	foreign_xact_resolver_timeout;
+
 /* Function declarations */
+extern void PreCommit_FdwXact(bool is_parallel_worker);
 extern void AtEOXact_FdwXact(bool isCommit, bool is_parallel_worker);
+extern Size FdwXactShmemSize(void);
+extern void FdwXactShmemInit(void);
 extern void AtPrepare_FdwXact(void);
+extern bool FdwXactIsForeignTwophaseCommitRequired(void);
+extern int CountFdwXactsForUserMapping(Oid umid);
+extern int CountFdwXactsForDB(Oid dbid);
+extern void FdwXactLaunchResolversForXid(TransactionId xid);
+extern void CheckPointFdwXacts(XLogRecPtr redo_horizon);
+extern void ResolveOneFdwXact(FdwXactState fdwxact);
+extern void RestoreFdwXactData(void);
+extern void RecoverFdwXacts(void);
+extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid);
 
 #endif /* FDWXACT_H */
diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h
new file mode 100644
index 0000000000..6381aa6e55
--- /dev/null
+++ b/src/include/access/fdwxact_launcher.h
@@ -0,0 +1,29 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_launcher.h
+ *	  PostgreSQL foreign transaction launcher definitions
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_launcher.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FDWXACT_LAUNCHER_H
+#define FDWXACT_LAUNCHER_H
+
+#include "access/fdwxact.h"
+#include "access/resolver_internal.h"
+
+extern void FdwXactLauncherRegister(void);
+extern void FdwXactLauncherMain(Datum main_arg);
+extern void RequestToLaunchFdwXactResolver(void);
+extern void LaunchOrWakeupFdwXactResolver(List *serveroids_orig);
+extern Size FdwXactLauncherShmemSize(void);
+extern void FdwXactLauncherShmemInit(void);
+extern bool IsFdwXactLauncher(void);
+extern void StopFdwXactResolver(Oid dbid, Oid serverid);
+
+
+#endif							/* FDWXACT_LAUNCHER_H */
diff --git a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h
new file mode 100644
index 0000000000..9301ada5bb
--- /dev/null
+++ b/src/include/access/fdwxact_resolver.h
@@ -0,0 +1,22 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_resolver.h
+ *	  PostgreSQL foreign transaction resolver definitions
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_resolver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_RESOLVER_H
+#define FDWXACT_RESOLVER_H
+
+#include "access/fdwxact.h"
+
+extern void FdwXactResolverMain(Datum main_arg);
+extern bool IsFdwXactResolver(void);
+
+extern int	foreign_xact_resolver_timeout;
+
+#endif							/* FDWXACT_RESOLVER_H */
diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h
new file mode 100644
index 0000000000..a1a10b71b2
--- /dev/null
+++ b/src/include/access/fdwxact_xlog.h
@@ -0,0 +1,49 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdwxact_xlog.h
+ *	  Foreign transaction XLOG definitions.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/fdwxact_xlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef FDWXACT_XLOG_H
+#define FDWXACT_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+/* Info types for logs related to FDW transactions */
+#define XLOG_FDWXACT_INSERT	0x00
+#define XLOG_FDWXACT_REMOVE	0x10
+
+/* Maximum length of the prepared transaction id, borrowed from twophase.c */
+#define FDWXACT_ID_MAX_LEN 200
+
+/*
+ * On-disk file structure, also used for WAL records
+ */
+typedef struct
+{
+	TransactionId xid;
+	Oid		dbid;
+	Oid		umid;
+	Oid		serverid;
+	Oid		owner;
+	char	identifier[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */
+} FdwXactStateOnDiskData;
+
+typedef struct xl_fdwxact_remove
+{
+	TransactionId xid;
+	Oid		umid;
+	bool	force;
+} xl_fdwxact_remove;
+
+extern void fdwxact_redo(XLogReaderState *record);
+extern void fdwxact_desc(StringInfo buf, XLogReaderState *record);
+extern const char *fdwxact_identify(uint8 info);
+
+#endif							/* FDWXACT_XLOG_H */
diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h
new file mode 100644
index 0000000000..89fc6a5653
--- /dev/null
+++ b/src/include/access/resolver_internal.h
@@ -0,0 +1,59 @@
+/*-------------------------------------------------------------------------
+ *
+ * resolver_internal.h
+ *	  Internal headers shared by fdwxact resolvers.
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/resolver_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RESOLVER_INTERNAL_H
+#define RESOLVER_INTERNAL_H
+
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "utils/timestamp.h"
+
+/*
+ * Each foreign transaction resolver has a FdwXactResolver struct in
+ * shared memory.  This struct is protected by FdwXactResolverLock.
+ */
+typedef struct FdwXactResolver
+{
+	pid_t		pid;			/* this resolver's PID, or 0 if not active */
+	Oid			dbid;
+	Oid			serverid;
+
+	/* Indicates whether this slot is in use or free */
+	bool		in_use;
+
+	/* Protect shared variables shown above */
+	slock_t		mutex;
+
+	/*
+	 * Pointer to the resolver's latch. Used by backends to wake up this
+	 * resolver when it has work to do. NULL if the resolver isn't active.
+	 */
+	Latch	   *latch;
+} FdwXactResolver;
+
+/* There is one FdwXactResolverCtlData struct for the whole database cluster */
+typedef struct FdwXactResolverCtlData
+{
+	/* Supervisor process and latch */
+	pid_t		launcher_pid;
+	Latch	   *launcher_latch;
+
+	FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER];
+} FdwXactResolverCtlData;
+#define SizeOfFdwXactResolverCtlData \
+	(offsetof(FdwXactResolverCtlData, resolvers) + sizeof(FdwXactResolver))
+
+extern FdwXactResolverCtlData *FdwXactResolverCtl;
+extern FdwXactResolver *MyFdwXactResolver;
+
+#endif							/* RESOLVER_INTERNAL_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index f582cf535f..5ab1f57212 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
 PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
 PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
 PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_FDWXACT_ID, "Fdw Transaction", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL)
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 91786da784..3d35f89ae0 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void);
 
 extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held);
 extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held);
+extern bool	TwoPhaseExists(TransactionId xid);
 
 extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
 										 TimestampTz prepared_at,
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 3b5eceff65..21fa835687 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -236,6 +236,7 @@ typedef struct xl_parameter_change
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	int			wal_level;
 	bool		wal_log_hints;
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..5673ec7299 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -179,6 +179,7 @@ typedef struct ControlFileData
 	int			max_worker_processes;
 	int			max_wal_senders;
 	int			max_prepared_xacts;
+	int			max_prepared_foreign_xacts;
 	int			max_locks_per_xact;
 	bool		track_commit_timestamp;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fde251fa4f..7569f4cdbd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6123,6 +6123,24 @@
   proargnames => '{type,object_names,object_args,classid,objid,objsubid}',
   prosrc => 'pg_get_object_address' },
 
+{ oid => '9706', descr => 'view foreign transactions',
+  proname => 'pg_foreign_xacts', prorows => '100', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{xid,oid,oid,oid,text,text,int4}',
+  proargmodes => '{o,o,o,o,o,o,o}',
+  proargnames => '{xid,umid,ownerid,dbid,state,identifier,locker_pid}',
+  prosrc => 'pg_foreign_xacts' },
+{ oid => '9707', descr => 'remove foreign transaction without resolution',
+  proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid',
+  proargnames => '{xid,umid}',
+  prosrc => 'pg_remove_foreign_xact' },
+{ oid => '9708', descr => 'resolve one foreign transaction',
+  proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'xid oid',
+  proargnames => '{xid,umid}',
+  prosrc => 'pg_resolve_foreign_xact' },
+
 { oid => '2079', descr => 'is table visible in search path?',
   proname => 'pg_table_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' },
@@ -6243,6 +6261,11 @@
   proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn',
   prosrc => 'pg_walfile_name' },
 
+{ oid => '9709',
+  descr => 'stop a running foreign transaction resolver process',
+  proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool',
+  proargtypes => 'text', prosrc => 'pg_stop_foreign_xact_resolver'},
+
 { oid => '3165', descr => 'difference in bytes, given two wal locations',
   proname => 'pg_wal_lsn_diff', prorettype => 'numeric',
   proargtypes => 'pg_lsn pg_lsn', prosrc => 'pg_wal_lsn_diff' },
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 42bf1c7519..a38a451afd 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -125,9 +125,11 @@ extern ObjectAddress CreateForeignDataWrapper(CreateFdwStmt *stmt);
 extern ObjectAddress AlterForeignDataWrapper(AlterFdwStmt *stmt);
 extern ObjectAddress CreateForeignServer(CreateForeignServerStmt *stmt);
 extern ObjectAddress AlterForeignServer(AlterForeignServerStmt *stmt);
+extern Oid RemoveForeignServer(DropForeignServerStmt *stmt);
 extern ObjectAddress CreateUserMapping(CreateUserMappingStmt *stmt);
 extern ObjectAddress AlterUserMapping(AlterUserMappingStmt *stmt);
 extern Oid	RemoveUserMapping(DropUserMappingStmt *stmt);
+extern void RemoveUserMappingById(Oid umid);
 extern void CreateForeignTable(CreateForeignTableStmt *stmt, Oid relid);
 extern void ImportForeignSchema(ImportForeignSchemaStmt *stmt);
 extern Datum transformGenericOptions(Oid catalogId,
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c3539a4d73..5338f4f2d9 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -192,6 +192,7 @@ typedef void (*ForeignAsyncConfigureWait_function) (AsyncRequest *areq);
 
 typedef void (*ForeignAsyncNotify_function) (AsyncRequest *areq);
 
+typedef void (*PrepareForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*CommitForeignTransaction_function) (FdwXactInfo *finfo);
 typedef void (*RollbackForeignTransaction_function) (FdwXactInfo *finfo);
 
@@ -287,6 +288,7 @@ typedef struct FdwRoutine
 	/* Support functions for transaction management */
 	CommitForeignTransaction_function CommitForeignTransaction;
 	RollbackForeignTransaction_function RollbackForeignTransaction;
+	PrepareForeignTransaction_function PrepareForeignTransaction;
 } FdwRoutine;
 
 
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index d9e417bcd7..3bea64a44b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -403,6 +403,7 @@ typedef enum NodeTag
 	T_AlterFdwStmt,
 	T_CreateForeignServerStmt,
 	T_AlterForeignServerStmt,
+	T_DropForeignServerStmt,
 	T_CreateUserMappingStmt,
 	T_AlterUserMappingStmt,
 	T_DropUserMappingStmt,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index def9651b34..4ecd453689 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2387,7 +2387,7 @@ typedef struct AlterFdwStmt
 } AlterFdwStmt;
 
 /* ----------------------
- *		Create/Alter FOREIGN SERVER Statements
+ *		Create/Alter/Drop FOREIGN SERVER Statements
  * ----------------------
  */
 
@@ -2411,6 +2411,14 @@ typedef struct AlterForeignServerStmt
 	bool		has_version;	/* version specified */
 } AlterForeignServerStmt;
 
+typedef struct DropForeignServerStmt
+{
+	NodeTag		type;
+	char	   *servername;		/* server name */
+	bool		missing_ok;		/* Skip error if missing? */
+	DropBehavior behavior;		/* RESTRICT or CASCADE behavior */
+} DropForeignServerStmt;
+
 /* ----------------------
  *		Create FOREIGN TABLE Statement
  * ----------------------
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 4266afde8b..476fe8c688 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -82,7 +82,7 @@ extern char *syncrep_parse_error_msg;
 extern char *SyncRepStandbyNames;
 
 /* called by user backend */
-extern void SyncRepWaitForLSN(XLogRecPtr lsn, bool commit);
+extern bool SyncRepWaitForLSN(XLogRecPtr lsn, bool commit);
 
 /* called at backend exit */
 extern void SyncRepCleanupAtProcExit(void);
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index b01fa52139..300a4cf5b6 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -93,5 +93,6 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin,
 
 extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
 											TransactionId *catalog_xmin);
+extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin);
 
 #endif							/* PROCARRAY_H */
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 6b40f1eeb8..35802eac86 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -89,6 +89,8 @@ enum config_group
 	CLIENT_CONN_PRELOAD,
 	CLIENT_CONN_OTHER,
 	LOCK_MANAGEMENT,
+	FOREIGN_TRANSACTION,
+	FOREIGN_TRANSACTION_RESOLVER,
 	COMPAT_OPTIONS_PREVIOUS,
 	COMPAT_OPTIONS_CLIENT,
 	ERROR_HANDLING_OPTIONS,
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 6c6ec2e711..915c174d1c 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -224,7 +224,12 @@ typedef enum
 	WAIT_EVENT_LOGICAL_CHANGES_READ,
 	WAIT_EVENT_LOGICAL_CHANGES_WRITE,
 	WAIT_EVENT_LOGICAL_SUBXACT_READ,
-	WAIT_EVENT_LOGICAL_SUBXACT_WRITE
+	WAIT_EVENT_LOGICAL_SUBXACT_WRITE,
+	WAIT_EVENT_FDWXACT_FILE_READ,
+	WAIT_EVENT_FDWXACT_FILE_SYNC,
+	WAIT_EVENT_FDWXACT_FILE_WRITE,
+	WAIT_EVENT_FDWXACT_LAUNCHER_MAIN,
+	WAIT_EVENT_FDWXACT_RESOLVER_MAIN
 } WaitEventIO;
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e5ab11275d..443a6715df 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1352,6 +1352,14 @@ pg_file_settings| SELECT a.sourcefile,
     a.applied,
     a.error
    FROM pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error);
+pg_foreign_xacts| SELECT f.xid,
+    f.umid,
+    f.ownerid,
+    f.dbid,
+    f.state,
+    f.identifier,
+    f.locker_pid
+   FROM pg_foreign_xacts() f(xid, umid, ownerid, dbid, state, identifier, locker_pid);
 pg_group| SELECT pg_authid.rolname AS groname,
     pg_authid.oid AS grosysid,
     ARRAY( SELECT pg_auth_members.member
-- 
2.24.3 (Apple Git-128)

#277Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Masahiko Sawada (#276)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/06/30 10:05, Masahiko Sawada wrote:

On Fri, Jun 25, 2021 at 9:53 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

Hi Jamison-san, sawada-san,

Thanks for testing!

FWIW, I tested using pgbench with the "--rate=" option to check whether the
server can execute transactions with stable throughput. As sawada-san said,
the latest patch resolves the second phase of 2PC asynchronously, so
it's difficult to keep the throughput stable without the "--rate=" option.

I was also unsure what to do when the error happened, because increasing
"max_prepared_foreign_transactions" doesn't help. Since overloading the server
can trigger the error, would it be better to mention that case in the HINT message?

BTW, if sawada-san has already developed a version that runs the resolver
processes in parallel, why not measure the performance improvement? Although
Robert-san, Tunakawa-san and others are discussing which architecture is best,
one discussion point is that there is a performance risk in adopting the
asynchronous approach. If we have promising solutions, I think we can move the
discussion forward.

Yeah, if we can asynchronously resolve the distributed transactions
without worrying about the max_prepared_foreign_transactions error, it
would be good. But we will need synchronous resolution at some point.
I think we at least need to discuss it at this point.

I've attached the new version patch that incorporates the comments
from Fujii-san and Ikeda-san I got so far. We launch a resolver
process per foreign server, committing prepared foreign transactions
on foreign servers in parallel. To get a better performance based on
the current architecture, we can have multiple resolver processes per
foreign server, but that does not seem easy to tune in practice. Perhaps
it is better if we simply have a pool of resolver processes and
assign one resolver process to the resolution of each distributed
transaction? That way, we need to launch as many resolver processes
as there are concurrent backends using 2PC.

Thanks for updating the patches.

I tested on my local laptop; the summary is as follows.

(1) The latest patch (v37) improves throughput by 1.5 times compared to v36.

I expected a 2.0 times improvement because in this workload each
transaction accesses two remote servers. I think the reason is that the disk
is the bottleneck and I couldn't prepare a separate disk for each PostgreSQL
server. If I could, I think the performance would improve by 2.0 times.

(2) With the latest patch (v37), throughput with foreign_twophase_commit =
required is about 36% of the throughput with foreign_twophase_commit = disabled.

Although the throughput is improved, the absolute performance is not good. It
may be the fate of 2PC. I think the reason is that the number of WAL writes
increases a lot and disk writes on my laptop are the bottleneck. I would like to
see results from testing in richer environments if someone can do so.

(3) The latest patch(v37) has no overhead if foreign_twophase_commit =
disabled. On the contrary, the performance improved by 3%. It may be within
the margin of error.

The test details are as follows.

# condition

* 1 coordinator and 3 foreign servers

* The 4 instances share one SSD.

* Each transaction updates two different foreign servers.

``` fxact_update.pgbench
\set id random(1, 1000000)

\set partnum 3
\set p1 random(1, :partnum)
\set p2 ((:p1 + 1) % :partnum) + 1

BEGIN;
UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
COMMIT;
```

* pgbench generates the load. I increased ${RATE} little by little until the
"maximum number of foreign transactions reached" error occurred.

```
pgbench -f fxact_update.pgbench -R ${RATE} -c 8 -j 8 -T 180
```

* parameters
max_prepared_transactions = 100
max_prepared_foreign_transactions = 200
max_foreign_transaction_resolvers = 4
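
As a quick check of the fxact_update.pgbench script above, the following C sketch mirrors the `\set p2` expression and confirms that p2 always names a different partition than p1 when partnum is 3. The function names are made up for illustration and are not part of the patch or of pgbench.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors "\set p2 ((:p1 + 1) % :partnum) + 1" from fxact_update.pgbench. */
int
second_partition(int p1, int partnum)
{
	assert(partnum > 0 && p1 >= 1 && p1 <= partnum);
	return ((p1 + 1) % partnum) + 1;
}

/* Returns true if every p1 in 1..partnum maps to a valid p2 that differs from p1. */
bool
always_two_distinct_servers(int partnum)
{
	for (int p1 = 1; p1 <= partnum; p1++)
	{
		int			p2 = second_partition(p1, partnum);

		if (p2 < 1 || p2 > partnum || p2 == p1)
			return false;
	}
	return true;
}
```

With partnum = 3, p1 = 1, 2, 3 gives p2 = 3, 1, 2, so every transaction touches two different foreign servers. The same expression would not work for partnum = 2, where it yields p2 equal to p1.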

# test source code patterns

1. 2pc patches(v36) based on 6d0eb385 (foreign_twophase_commit = required).
2. 2pc patches(v37) based on 2595e039 (foreign_twophase_commit = required).
3. 2pc patches(v37) based on 2595e039 (foreign_twophase_commit = disabled).
4. 2595e039 without 2pc patches(v37).

# results

1. tps = 241.8000
latency average = 10.413ms

2. tps = 359.017519 (1.5 times that of 1.; 0.36 times that of 3.)
latency average = 15.427ms

3. tps = 987.372220 (1.03 times that of 4.)
latency average = 8.102ms

4. tps = 955.984574
latency average = 8.368ms

The disk is the bottleneck in my environment because disk util is almost 100%
in every pattern. If disks for each instance can be prepared, I think we can
expect more performance improvements.
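
The ratios quoted in the results can be recomputed from the reported tps values. This is only a sketch for checking the arithmetic; the array and function names are assumptions made for this example.

```c
#include <assert.h>

/* tps values reported above for patterns 1..4 (index 0 is unused). */
static const double reported_tps[5] = {
	0.0, 241.8, 359.017519, 987.372220, 955.984574
};

/* Throughput of one pattern relative to another, as a plain ratio. */
double
tps_ratio(int pattern, int baseline)
{
	assert(pattern >= 1 && pattern <= 4);
	assert(baseline >= 1 && baseline <= 4);
	return reported_tps[pattern] / reported_tps[baseline];
}
```

tps_ratio(2, 1) is roughly 1.48 (v37 versus v36), tps_ratio(2, 3) is roughly 0.36 (required versus disabled), and tps_ratio(3, 4) is roughly 1.03 (patched but disabled versus unpatched).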

In my understanding, there are three improvement ideas. The first is to make
the resolver processes run in parallel. The second is to send "COMMIT/ABORT
PREPARED" to the remote servers in bulk. The third is to stop syncing the WAL in
remove_fdwxact() after resolution is done, which I raised in the mail sent
on June 3rd, 13:56. Since the third idea has not been discussed yet, I may
be misunderstanding something.

Yes, those optimizations are promising. On the other hand, they could
introduce complexity to the code and APIs. I'd like to keep the first
version simple. I think we need to discuss them at this stage but can
leave the implementation of both parallel execution and batch
execution as future improvements.

OK, I agree.

For the third idea, I think the implementation was wrong; it removes
the state file then flushes the WAL record. I think these should be
performed in the reverse order. Otherwise, FdwXactState entry could be
left on the standby if the server crashes between them. I might be
missing something though.

Oh, I see. I think you're right though what you wanted to say is that it
flushes the WAL records then removes the state file. If "COMMIT/ABORT
PREPARED" statements execute in bulk, it seems enough to sync the wal only
once, then remove all related state files.

BTW, I built the binary with -O2 and got the following warning.
It needs to be fixed.

```
fdwxact.c: In function 'PrepareAllFdwXacts':
fdwxact.c:897:13: warning: 'flush_lsn' may be used uninitialized in this function [-Wmaybe-uninitialized]
  897 |  canceled = SyncRepWaitForLSN(flush_lsn, false);
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
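
The body of PrepareAllFdwXacts is not shown in this mail, so the cause can only be guessed. A common source of this warning is a variable that is assigned only inside a loop and read afterwards. The sketch below shows that pattern with an explicit initial value; the two functions are hypothetical stand-ins, and the typedef and macro only mimic PostgreSQL's XLogRecPtr and InvalidXLogRecPtr so that the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* stand-in for PostgreSQL's type */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Hypothetical stand-in for a loop over prepared participants: each one
 * yields a WAL position, and the last one is the position to wait for.
 */
XLogRecPtr
last_flush_lsn(const XLogRecPtr *prepare_lsns, int nparticipants)
{
	/* Initialized so the value is defined even if the loop never runs. */
	XLogRecPtr	flush_lsn = InvalidXLogRecPtr;

	assert(nparticipants >= 0);
	for (int i = 0; i < nparticipants; i++)
		flush_lsn = prepare_lsns[i];

	return flush_lsn;
}

/* A caller can skip the synchronous replication wait if nothing was prepared. */
bool
needs_sync_wait(XLogRecPtr flush_lsn)
{
	return flush_lsn != InvalidXLogRecPtr;
}
```

Whether the actual fix should initialize the variable or restructure the code path is a decision for the patch author.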

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#278r.takahashi_2@fujitsu.com
r.takahashi_2@fujitsu.com
In reply to: Masahiro Ikeda (#277)
RE: Transactions involving multiple postgres foreign servers, take 2

Hi,

I'm interested in this patch and I also ran the same test with Ikeda-san's fxact_update.pgbench.
In my environment (a low-spec VM), the results are as follows.

* foreign_twophase_commit = disabled
363tps

* foreign_twophase_commit = required (It is necessary to set -R ${RATE} as Ikeda-san said)
13tps

I analyzed the bottleneck using pstack and strace.
I noticed that the open() call during the "COMMIT PREPARED" command is very slow.

In my environment the latency of the "COMMIT PREPARED" is 16ms.
(On the other hand, the latency of "COMMIT" and "PREPARE TRANSACTION" is 1ms)
In the "COMMIT PREPARED" command, open() for wal segment file takes 14ms.
Therefore, open() is the bottleneck of "COMMIT PREPARED".
Furthermore, I noticed that the backend process almost always opens the same WAL segment file.

In the current patch, the backend process on the foreign server that is associated with the connection from the resolver process always runs the "COMMIT PREPARED" command.
Therefore, the WAL segment file used by the current "COMMIT PREPARED" command is probably the same as the one used by the previous "COMMIT PREPARED" command.

In order to improve the performance of the resolver process, I think it would be useful to skip closing the WAL segment file during "COMMIT PREPARED" and reuse the file descriptor.
Is that possible?

Regards,
Ryohei Takahashi

#279k.jamison@fujitsu.com
k.jamison@fujitsu.com
In reply to: Masahiko Sawada (#276)
RE: Transactions involving multiple postgres foreign servers, take 2

On Wed, June 30, 2021 10:06 (GMT+9), Masahiko Sawada wrote:

I've attached the new version patch that incorporates the comments from
Fujii-san and Ikeda-san I got so far. We launch a resolver process per foreign
server, committing prepared foreign transactions on foreign servers in parallel.

Hi Sawada-san,
Thank you for the latest set of patches.
I've noticed from cfbot that the regression test failed, and I also could not compile it.

============== running regression test queries ==============
test test_fdwxact ... FAILED 21 ms
============== shutting down postmaster ==============
======================
1 of 1 tests failed.
======================

To get a better performance based on the current architecture, we can have
multiple resolver processes per foreign server, but that does not seem easy to tune
in practice. Perhaps it is better if we simply have a pool of resolver processes
and assign one resolver process to the resolution of each distributed
transaction? That way, we need to launch as many resolver processes as
there are concurrent backends using 2PC.

Yes, finding the right values for max_prepared_foreign_transactions and
max_prepared_transactions seems difficult. If we set the number of resolver
processes to the number of concurrent backends using 2PC, how do we determine
the value of max_foreign_transaction_resolvers? It might be good to collect some
statistics to judge the value; then we can compare the performance with the V37
version.

-
Also, this is a bit of a side topic. I know we've been discussing how to
improve/fix the resolver process bottlenecks, and Takahashi-san provided
details upthread on where V37 has problems. (I am joining the testing too.)

I am not sure if this has been brought up before, given how many years this
thread spans. But I think we need to consider preventing the
resolver process from waiting indefinitely in a loop while resolving a prepared
foreign transaction. Currently, when a crashed foreign server is recovered during
resolution retries, the information is recovered from WAL and files,
and the resolver process resumes the foreign transaction resolution.
However, what if we cannot (or intentionally do not want to) recover the
crashed server for a long time?

An idea is to make the resolver process stop automatically after some
maximum number of retries.
We can call the parameter foreign_transaction_resolution_max_retry_count.
There may be a better name, but I followed the pattern from your patch.

The server downtime can be estimated considering the proposed parameter
foreign_transaction_resolution_retry_interval (default 10s) from the
patch set.
In addition, according to docs, "a foreign server using the postgres_fdw
foreign data wrapper can have the same options that libpq accepts in
connection strings", so the connect_timeout set during CREATE SERVER can
also affect it.

Example:
CREATE SERVER's connect_timeout setting = 5s
foreign_transaction_resolution_retry_interval = 10s
foreign_transaction_resolution_max_retry_count = 3

Estimated total time before resolver stops:
= (5s) * (3 + 1) + (10s) * (3) = 50 s

00s: 1st connect start
05s: 1st connect timeout
(retry interval)
15s: 2nd connect start (1st retry)
20s: 2nd connect timeout
(retry interval)
30s: 3rd connect start (2nd retry)
35s: 3rd connect timeout
(retry interval)
45s: 4th connect start (3rd retry)
50s: 4th connect timeout
(resolver process stops)
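
The estimate and timeline above can be written as a small C sketch. It assumes, as the example does, that every connection attempt runs into connect_timeout and that the full retry interval is waited before each retry. The function names are invented for illustration, and foreign_transaction_resolution_max_retry_count is only the parameter proposed in this mail.

```c
#include <assert.h>

/*
 * Worst-case seconds until the resolver gives up: one initial attempt plus
 * max_retry_count retries, each timing out, with a wait before each retry.
 */
int
resolver_give_up_seconds(int connect_timeout, int retry_interval,
						 int max_retry_count)
{
	assert(connect_timeout >= 0 && retry_interval >= 0 && max_retry_count >= 0);
	return connect_timeout * (max_retry_count + 1) +
		retry_interval * max_retry_count;
}

/* Start time in seconds of the n-th connection attempt (n = 1 is the first). */
int
attempt_start_seconds(int connect_timeout, int retry_interval, int n)
{
	assert(n >= 1);
	return (n - 1) * (connect_timeout + retry_interval);
}
```

With connect_timeout = 5, retry interval = 10 and max retry count = 3, the first function returns 50, and the attempts start at 0, 15, 30 and 45 seconds, matching the timeline.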

Then the resolver process will not wait indefinitely and will stop after
some time depending on the setting of the above parameters.
This could be the automatic counterpart of pg_stop_foreign_xact_resolver.
If the resolver has stopped and it is later decided to restore
the crashed server, the user can then execute pg_resolve_foreign_xact().
Do you think the idea is feasible, and can we add it as part of the patch set?

Regards,
Kirk Jamison

#280Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#276)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/06/30 10:05, Masahiko Sawada wrote:

I've attached the new version patch that incorporates the comments
from Fujii-san and Ikeda-san I got so far.

Thanks for updating the patches!

I'm now reading the 0001 and 0002 patches and wondering if we can commit them
first, because they just provide an independent basic mechanism for
foreign transaction management.

One question regarding them: why did we add a new API only for the "top" foreign
transaction? Even with those patches, the old API (CallSubXactCallbacks) is still
being used for foreign subtransactions and xact_depth is still being managed
in the postgres_fdw layer (not PostgreSQL core). Is this intentional?
Sorry if this was already discussed before.

As far as I read the code, continuing to use the old API for foreign
subtransactions doesn't cause any actual bug. But it's just strange and
half-baked to manage top and sub transactions in different layers and to use
the old and new APIs for them.

OTOH, I'm afraid that adding a new (non-essential) API for foreign subtransactions
might increase the code complexity unnecessarily.

Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#281Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiro Ikeda (#277)
Re: Transactions involving multiple postgres foreign servers, take 2

Sorry for the late reply.

On Mon, Jul 5, 2021 at 3:29 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2021/06/30 10:05, Masahiko Sawada wrote:

On Fri, Jun 25, 2021 at 9:53 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

Hi Jamison-san, sawada-san,

Thanks for testing!

FWIW, I tested using pgbench with the "--rate=" option to check whether the
server can execute transactions with stable throughput. As sawada-san said,
the latest patch resolves the second phase of 2PC asynchronously, so
it's difficult to keep the throughput stable without the "--rate=" option.

I was also unsure what to do when the error happened, because increasing
"max_prepared_foreign_transactions" doesn't help. Since overloading the server
can trigger the error, would it be better to mention that case in the HINT message?

BTW, if sawada-san has already developed a version that runs the resolver
processes in parallel, why not measure the performance improvement? Although
Robert-san, Tunakawa-san and others are discussing which architecture is best,
one discussion point is that there is a performance risk in adopting the
asynchronous approach. If we have promising solutions, I think we can move the
discussion forward.

Yeah, if we can asynchronously resolve the distributed transactions
without worrying about the max_prepared_foreign_transactions error, it
would be good. But we will need synchronous resolution at some point.
I think we at least need to discuss it at this point.

I've attached the new version patch that incorporates the comments
from Fujii-san and Ikeda-san I got so far. We launch a resolver
process per foreign server, committing prepared foreign transactions
on foreign servers in parallel. To get a better performance based on
the current architecture, we can have multiple resolver processes per
foreign server, but that does not seem easy to tune in practice. Perhaps
it is better if we simply have a pool of resolver processes and
assign one resolver process to the resolution of each distributed
transaction? That way, we need to launch as many resolver processes
as there are concurrent backends using 2PC.

Thanks for updating the patches.

I tested on my local laptop; the summary is as follows.

Thank you for testing!

(1) The latest patch (v37) improves throughput by 1.5 times compared to v36.

I expected a 2.0 times improvement because in this workload each
transaction accesses two remote servers. I think the reason is that the disk
is the bottleneck and I couldn't prepare a separate disk for each PostgreSQL
server. If I could, I think the performance would improve by 2.0 times.

(2) With the latest patch (v37), throughput with foreign_twophase_commit =
required is about 36% of the throughput with foreign_twophase_commit = disabled.

Although the throughput is improved, the absolute performance is not good. It
may be the fate of 2PC. I think the reason is that the number of WAL writes
increases a lot and disk writes on my laptop are the bottleneck. I would like to
see results from testing in richer environments if someone can do so.

(3) The latest patch(v37) has no overhead if foreign_twophase_commit =
disabled. On the contrary, the performance improved by 3%. It may be within
the margin of error.

The test details are as follows.

# condition

* 1 coordinator and 3 foreign servers

* The 4 instances share one SSD.

* Each transaction updates two different foreign servers.

``` fxact_update.pgbench
\set id random(1, 1000000)

\set partnum 3
\set p1 random(1, :partnum)
\set p2 ((:p1 + 1) % :partnum) + 1

BEGIN;
UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
COMMIT;
```

* pgbench generates the load. I increased ${RATE} little by little until the
"maximum number of foreign transactions reached" error occurred.

```
pgbench -f fxact_update.pgbench -R ${RATE} -c 8 -j 8 -T 180
```

* parameters
max_prepared_transactions = 100
max_prepared_foreign_transactions = 200
max_foreign_transaction_resolvers = 4

# test source code patterns

1. 2pc patches(v36) based on 6d0eb385 (foreign_twophase_commit = required).
2. 2pc patches(v37) based on 2595e039 (foreign_twophase_commit = required).
3. 2pc patches(v37) based on 2595e039 (foreign_twophase_commit = disabled).
4. 2595e039 without 2pc patches(v37).

# results

1. tps = 241.8000
latency average = 10.413ms

2. tps = 359.017519 (1.5 times that of 1.; 0.36 times that of 3.)
latency average = 15.427ms

3. tps = 987.372220 (1.03 times that of 4.)
latency average = 8.102ms

4. tps = 955.984574
latency average = 8.368ms

The disk is the bottleneck in my environment because disk util is almost 100%
in every pattern. If disks for each instance can be prepared, I think we can
expect more performance improvements.

The performance still doesn't seem good. I'll also test using your script.

In my understanding, there are three improvement ideas. The first is to make
the resolver processes run in parallel. The second is to send "COMMIT/ABORT
PREPARED" to the remote servers in bulk. The third is to stop syncing the WAL in
remove_fdwxact() after resolution is done, which I raised in the mail sent
on June 3rd, 13:56. Since the third idea has not been discussed yet, I may
be misunderstanding something.

Yes, those optimizations are promising. On the other hand, they could
introduce complexity to the code and APIs. I'd like to keep the first
version simple. I think we need to discuss them at this stage but can
leave the implementation of both parallel execution and batch
execution as future improvements.

OK, I agree.

For the third idea, I think the implementation was wrong; it removes
the state file then flushes the WAL record. I think these should be
performed in the reverse order. Otherwise, FdwXactState entry could be
left on the standby if the server crashes between them. I might be
missing something though.

Oh, I see. I think you're right though what you wanted to say is that it
flushes the WAL records then removes the state file. If "COMMIT/ABORT
PREPARED" statements execute in bulk, it seems enough to sync the wal only
once, then remove all related state files.

BTW, I built the binary with -O2 and got the following warning.
It needs to be fixed.

```
fdwxact.c: In function 'PrepareAllFdwXacts':
fdwxact.c:897:13: warning: 'flush_lsn' may be used uninitialized in this function [-Wmaybe-uninitialized]
  897 |  canceled = SyncRepWaitForLSN(flush_lsn, false);
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

Thank you for the report. I'll fix it in the next version patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#282Masahiko Sawada
sawada.mshk@gmail.com
In reply to: r.takahashi_2@fujitsu.com (#278)
Re: Transactions involving multiple postgres foreign servers, take 2

Sorry for the late reply.

On Tue, Jul 6, 2021 at 3:15 PM r.takahashi_2@fujitsu.com
<r.takahashi_2@fujitsu.com> wrote:

Hi,

I'm interested in this patch and I also ran the same test with Ikeda-san's fxact_update.pgbench.

Thank you for testing!

In my environment (a low-spec VM), the results are as follows.

* foreign_twophase_commit = disabled
363tps

* foreign_twophase_commit = required (It is necessary to set -R ${RATE} as Ikeda-san said)
13tps

I analyzed the bottleneck using pstack and strace.
I noticed that the open() call during the "COMMIT PREPARED" command is very slow.

In my environment the latency of the "COMMIT PREPARED" is 16ms.
(On the other hand, the latency of "COMMIT" and "PREPARE TRANSACTION" is 1ms)
In the "COMMIT PREPARED" command, open() for wal segment file takes 14ms.
Therefore, open() is the bottleneck of "COMMIT PREPARED".
Furthermore, I noticed that the backend process almost always opens the same WAL segment file.

In the current patch, the backend process on the foreign server that is associated with the connection from the resolver process always runs the "COMMIT PREPARED" command.
Therefore, the WAL segment file used by the current "COMMIT PREPARED" command is probably the same as the one used by the previous "COMMIT PREPARED" command.

In order to improve the performance of the resolver process, I think it would be useful to skip closing the WAL segment file during "COMMIT PREPARED" and reuse the file descriptor.
Is that possible?

Not sure but it might be possible to keep holding an xlogreader for
reading PREPARE WAL records even after the transaction commit. But I
wonder how much open() for wal segment file accounts for the total
execution time of 2PC. 2PC requires 2 network round trips for each
participant. For example, if it took 500ms in total, we would not get
benefits much from the point of view of 2PC performance even if we
improved it from 14ms to 1ms.
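
The arithmetic behind this point can be sketched as follows. This is only an illustration: the 500ms total is the hypothetical figure from the paragraph above, the 16ms figure is the "COMMIT PREPARED" latency reported earlier in the thread, and `overall_speedup` is a hypothetical helper name.

```c
#include <assert.h>

/*
 * Overall speedup when one component of a total latency shrinks.
 * All arguments are in milliseconds.
 */
double
overall_speedup(double total_ms, double part_before_ms, double part_after_ms)
{
	double		new_total = total_ms - part_before_ms + part_after_ms;

	assert(total_ms > 0.0);
	assert(part_before_ms >= 0.0 && part_before_ms <= total_ms);
	assert(part_after_ms >= 0.0);
	assert(new_total > 0.0);

	return total_ms / new_total;
}
```

With a 500ms total, cutting open() from 14ms to 1ms gives 500 / 487, roughly a 1.03x speedup. If the 16ms "COMMIT PREPARED" latency is taken as the total instead, the same change gives 16 / 3, roughly 5.3x, so the benefit depends heavily on how much of the end-to-end time is spent outside open().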

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#283Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Fujii Masao (#280)
Re: Transactions involving multiple postgres foreign servers, take 2

On Fri, Jul 9, 2021 at 3:26 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/06/30 10:05, Masahiko Sawada wrote:

I've attached the new version patch that incorporates the comments
from Fujii-san and Ikeda-san I got so far.

Thanks for updating the patches!

I'm now reading 0001 and 0002 patches and wondering if we can commit them
at first because they just provide independent basic mechanism for
foreign transaction management.

One question regarding them is; Why did we add new API only for "top" foreign
transaction? Even with those patches, old API (CallSubXactCallbacks) is still
being used for foreign subtransaction and xact_depth is still being managed
in postgres_fdw layer (not PostgreSQL core). Is this intentional?

Yes, it's not needed for 2PC support, and I was also concerned about adding
complexity to the core by adding a new API for subtransactions, which is
not strictly necessary for 2PC.

As far as I read the code, continuing to use the old API for foreign subtransactions doesn't
cause any actual bug. But it's just strange and half-baked to manage top-level transactions and
subtransactions in different layers and to use the old and new APIs for them.

That's a valid concern. I'm really not sure what we should do here, but
I guess that even if we want to support subtransactions, we would have another
API dedicated to subtransaction commit and rollback.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#284r.takahashi_2@fujitsu.com
r.takahashi_2@fujitsu.com
In reply to: Masahiko Sawada (#282)
1 attachment(s)
RE: Transactions involving multiple postgres foreign servers, take 2

Hi Sawada-san,

Thank you for your reply.

Not sure but it might be possible to keep holding an xlogreader for
reading PREPARE WAL records even after the transaction commit. But I
wonder how much open() for wal segment file accounts for the total
execution time of 2PC. 2PC requires 2 network round trips for each
participant. For example, if it took 500ms in total, we would not get
benefits much from the point of view of 2PC performance even if we
improved it from 14ms to 1ms.

I made the patch based on your advice and re-ran the test on the new machine.
(The attached patch is for testing purposes only.)

* foreign_twophase_commit = disabled
2686tps

* foreign_twophase_commit = required (It is necessary to set -R ${RATE} as Ikeda-san said)
311tps

* foreign_twophase_commit = required with attached patch (It is not necessary to set -R ${RATE})
2057tps

This indicates that if we can reduce the number of open() calls on the WAL segment file during "COMMIT PREPARED", the performance can be improved.

This patch skips closing the WAL segment file, but I don't know when we should close it.
One idea is to close it when the WAL segment file is recycled, but that seems difficult for a backend process to do.

BTW, in the previous discussion, "Send COMMIT PREPARED to remote servers in bulk" was proposed.
I imagined a new SQL interface like "COMMIT PREPARED 'prep_1', 'prep_2', ... 'prep_n'".
If we can keep the WAL segment file open during a bulk COMMIT PREPARED, we can reduce not only the number of round trips but also the number of open() calls on the WAL segment file.

Regards,
Ryohei Takahashi

Attachments:

Hold_xlogreader.patch (application/octet-stream)
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 29980d56ac..f67244183b 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1355,15 +1355,25 @@ static void
 XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
 {
 	XLogRecord *record;
-	XLogReaderState *xlogreader;
+	static XLogReaderState *xlogreader;
 	char	   *errormsg;
 	TimeLineID	save_currtli = ThisTimeLineID;
 
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									NULL);
+	if (!xlogreader)
+	{
+		/*
+		 * Create xlogreader on TopMemoryContext to prevent open the same
+		 * wal segment many times.
+		 */
+		MemoryContext oldctx = MemoryContextSwitchTo(TopMemoryContext);
+		xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+										XL_ROUTINE(.page_read = &read_local_xlog_page,
+												   .segment_open = &wal_segment_open,
+												   .segment_close = &wal_segment_close),
+										NULL);
+		MemoryContextSwitchTo(oldctx);
+	}
+
 	if (!xlogreader)
 		ereport(ERROR,
 				(errcode(ERRCODE_OUT_OF_MEMORY),
@@ -1398,8 +1408,6 @@ XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
 
 	*buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader));
 	memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader));
-
-	XLogReaderFree(xlogreader);
 }
 
 
#285Ranier Vilela
ranier.vf@gmail.com
In reply to: r.takahashi_2@fujitsu.com (#284)
Re: Transactions involving multiple postgres foreign servers, take 2

Em ter., 13 de jul. de 2021 às 01:14, r.takahashi_2@fujitsu.com <
r.takahashi_2@fujitsu.com> escreveu:

Hi Sawada-san,

Thank you for your reply.

Not sure but it might be possible to keep holding an xlogreader for
reading PREPARE WAL records even after the transaction commit. But I
wonder how much open() for wal segment file accounts for the total
execution time of 2PC. 2PC requires 2 network round trips for each
participant. For example, if it took 500ms in total, we would not get
benefits much from the point of view of 2PC performance even if we
improved it from 14ms to 1ms.

I made the patch based on your advice and re-ran the test on the new
machine.
(The attached patch is for testing purposes only.)

Wouldn't it be better to explicitly initialize the pointer with NULL?
I think it's common in Postgres.

static XLogReaderState *xlogreader = NULL;

* foreign_twophase_commit = disabled
2686tps

* foreign_twophase_commit = required (It is necessary to set -R ${RATE} as
Ikeda-san said)
311tps

* foreign_twophase_commit = required with attached patch (It is not
necessary to set -R ${RATE})
2057tps

Nice results.

regards,
Ranier Vilela

#286r.takahashi_2@fujitsu.com
r.takahashi_2@fujitsu.com
In reply to: Ranier Vilela (#285)
RE: Transactions involving multiple postgres foreign servers, take 2

Hi,

Wouldn't it be better to explicitly initialize the pointer with NULL?

Thank you for your advice.
You are correct.

Anyway, I fixed it and re-ran the performance test; of course, it does not affect tps.

Regards,
Ryohei Takahashi

#287Masahiko Sawada
sawada.mshk@gmail.com
In reply to: r.takahashi_2@fujitsu.com (#284)
Re: Transactions involving multiple postgres foreign servers, take 2

On Tue, Jul 13, 2021 at 1:14 PM r.takahashi_2@fujitsu.com
<r.takahashi_2@fujitsu.com> wrote:

Hi Sawada-san,

Thank you for your reply.

Not sure but it might be possible to keep holding an xlogreader for
reading PREPARE WAL records even after the transaction commit. But I
wonder how much open() for wal segment file accounts for the total
execution time of 2PC. 2PC requires 2 network round trips for each
participant. For example, if it took 500ms in total, we would not get
benefits much from the point of view of 2PC performance even if we
improved it from 14ms to 1ms.

I made the patch based on your advice and re-ran the test on the new machine.
(The attached patch is for testing purposes only.)

Thank you for testing!

* foreign_twophase_commit = disabled
2686tps

* foreign_twophase_commit = required (It is necessary to set -R ${RATE} as Ikeda-san said)
311tps

* foreign_twophase_commit = required with attached patch (It is not necessary to set -R ${RATE})
2057tps

Nice improvement!

BTW, did you test locally? That is, are the foreign servers
located on the same machine?

This indicates that if we can reduce the number of open() calls on the WAL segment file during "COMMIT PREPARED", the performance can be improved.

This patch skips closing the WAL segment file, but I don't know when we should close it.
One idea is to close it when the WAL segment file is recycled, but that seems difficult for a backend process to do.

I guess it would be better to start a new thread for this improvement.
This idea not only helps the 2PC case but also improves the performance of
COMMIT/ROLLBACK PREPARED itself. Rather than tying it to this patch,
I think it would be good to discuss it separately so that it can be
committed on its own.

BTW, in the previous discussion, "Send COMMIT PREPARED to remote servers in bulk" was proposed.
I imagined a new SQL interface like "COMMIT PREPARED 'prep_1', 'prep_2', ... 'prep_n'".
If we can keep the WAL segment file open during a bulk COMMIT PREPARED, we can reduce not only the number of round trips but also the number of open() calls on the WAL segment file.

What if we successfully committed 'prep_1' but an error happened
while committing another one for some reason (e.g., a corrupted 2PC
state file, OOM, etc.)? We might return an error to the client but have
already committed 'prep_1'.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#288r.takahashi_2@fujitsu.com
r.takahashi_2@fujitsu.com
In reply to: Masahiko Sawada (#287)
RE: Transactions involving multiple postgres foreign servers, take 2

Hi Sawada-san,

Thank you for your reply.

BTW, did you test locally? That is, are the foreign servers
located on the same machine?

Yes, I tested locally since I cannot set up a proper network environment right now.

I guess it would be better to start a new thread for this improvement.

Thank you for your advice.
I started a new thread [1].

What if we successfully committed 'prep_1' but an error happened
while committing another one for some reason (e.g., a corrupted 2PC
state file, OOM, etc.)? We might return an error to the client but have
already committed 'prep_1'.

Sorry, I don't have a good idea now.
I imagined that the command would return the list of transaction IDs that ended with an error.
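
That behavior could be sketched roughly as below: try every identifier, do not stop at the first error, and report which ones failed. Everything here is hypothetical; no such bulk interface exists in the patch set, `commit_prepared_bulk` and `commit_one_fn` are made-up names, and `demo_commit_one` is only a stand-in that pretends 'prep_2' cannot be committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: commits one prepared transaction; true on success. */
typedef bool (*commit_one_fn) (const char *gid);

/*
 * Try every gid without stopping at the first error. The indexes of the
 * gids that failed are written to failed[], which must have room for n
 * entries. Returns the number of failures.
 */
size_t
commit_prepared_bulk(const char *const *gids, size_t n,
					 commit_one_fn commit_one, size_t *failed)
{
	size_t		nfailed = 0;

	assert(gids != NULL && commit_one != NULL && failed != NULL);

	for (size_t i = 0; i < n; i++)
	{
		if (!commit_one(gids[i]))
			failed[nfailed++] = i;	/* left for the caller to handle */
	}
	return nfailed;
}

/* Stand-in for illustration only: pretends "prep_2" cannot be committed. */
bool
demo_commit_one(const char *gid)
{
	return strcmp(gid, "prep_2") != 0;
}
```

Note that this only changes how the partial failure is reported. It does not remove the concern raised above: by the time the error is returned, the other transactions in the list have already been committed.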

[1]: /messages/by-id/OS0PR01MB56828019B25CD5190AB6093282129@OS0PR01MB5682.jpnprd01.prod.outlook.com

Regards,
Ryohei Takahashi

#289Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#283)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/07/09 22:44, Masahiko Sawada wrote:

On Fri, Jul 9, 2021 at 3:26 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

As far as I read the code, continuing to use the old API for foreign subtransactions doesn't
cause any actual bug. But it's just strange and half-baked to manage top-level transactions and
subtransactions in different layers and to use the old and new APIs for them.

That's a valid concern. I'm really not sure what we should do here, but
I guess that even if we want to support subtransactions, we would have another
API dedicated to subtransaction commit and rollback.

OK, so if possible I will write a POC patch for a new API for foreign subtransactions
and consider whether it's simple enough to commit into core.

+#define FDWXACT_FLAG_PARALLEL_WORKER 0x02 /* is parallel worker? */

This implies that parallel workers may execute PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED on the foreign server for atomic commit?
If so, what happens if the PREPARE TRANSACTION that one of the
parallel workers issues fails? In this case, not only that parallel worker
but also the other parallel workers and the leader should roll back the transaction
entirely. That is, they should issue ROLLBACK PREPARED to the foreign servers.
Has this issue already been handled in the patches?

This does not seem to be an actual issue if only postgres_fdw is used, because postgres_fdw
doesn't implement the IsForeignScanParallelSafe API. Right? But what about other FDWs?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#290k.jamison@fujitsu.com
k.jamison@fujitsu.com
In reply to: Fujii Masao (#289)
RE: Transactions involving multiple postgres foreign servers, take 2

Hi Sawada-san,

I noticed that this thread and its set of patches have been marked with "Returned with Feedback" by yourself.
I find the feature (atomic commit for foreign transactions) very useful
and it will pave the road for having a distributed transaction management in Postgres.
Although we have not arrived at consensus at which approach is best,
there were significant reviews and major patch changes in the past 2 years.
By any chance, do you have any plans to continue this from where you left off?

Regards,
Kirk Jamison

#291Masahiko Sawada
sawada.mshk@gmail.com
In reply to: k.jamison@fujitsu.com (#290)
Re: Transactions involving multiple postgres foreign servers, take 2

Hi,

On Tue, Oct 5, 2021 at 9:56 AM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:

Hi Sawada-san,

I noticed that this thread and its set of patches have been marked with "Returned with Feedback" by yourself.
I find the feature (atomic commit for foreign transactions) very useful
and it will pave the road for having a distributed transaction management in Postgres.
Although we have not arrived at consensus at which approach is best,
there were significant reviews and major patch changes in the past 2 years.
By any chance, do you have any plans to continue this from where you left off?

As I could not reply to the review comments from Fujii-san for almost
three months, I don't have enough time to move this project forward at
least for now. That's why I marked this patch as RWF. I’d like to
continue working on this project in my spare time but I know this is
not a project that can be completed by using only my spare time. If
someone wants to work on this project, I’d appreciate it and am happy
to help.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#292Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Masahiko Sawada (#291)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/10/05 10:38, Masahiko Sawada wrote:

Hi,

On Tue, Oct 5, 2021 at 9:56 AM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:

Hi Sawada-san,

I noticed that this thread and its set of patches have been marked with "Returned with Feedback" by yourself.
I find the feature (atomic commit for foreign transactions) very useful
and it will pave the road for having a distributed transaction management in Postgres.
Although we have not arrived at consensus at which approach is best,
there were significant reviews and major patch changes in the past 2 years.
By any chance, do you have any plans to continue this from where you left off?

As I could not reply to the review comments from Fujii-san for almost
three months, I don't have enough time to move this project forward at
least for now. That's why I marked this patch as RWF. I’d like to
continue working on this project in my spare time but I know this is
not a project that can be completed by using only my spare time. If
someone wants to work on this project, I’d appreciate it and am happy
to help.

Probably it's time to rethink the approach. The patch introduces a
foreign transaction manager into PostgreSQL core, but as far as
I have reviewed the patch, its changes look like overkill and too complicated.
This seems to be one of the reasons why we have not been able to commit
the feature even after several years.

Another concern about the approach of the patch is that it needs
to change a backend so that it additionally waits for replication
during the commit phase before executing PREPARE TRANSACTION
on the foreign servers, which would further decrease the performance
of the commit phase.

So I wonder if it's worth revisiting the original approach, i.e.,
adding atomic commit to postgres_fdw. One disadvantage of
this is that it supports atomic commit only between foreign
PostgreSQL servers, not other data sources like MySQL.
But I'm not sure if we really want to do atomic commit between
various FDWs. Maybe supporting only postgres_fdw is enough
for most users. Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#293k.jamison@fujitsu.com
k.jamison@fujitsu.com
In reply to: Fujii Masao (#292)
RE: Transactions involving multiple postgres foreign servers, take 2

Hi Fujii-san and Sawada-san,

Thank you very much for your replies.

I noticed that this thread and its set of patches have been marked with
"Returned with Feedback" by yourself.
I find the feature (atomic commit for foreign transactions) very
useful and it will pave the road for having a distributed transaction
management in Postgres.

Although we have not arrived at consensus at which approach is best,
there were significant reviews and major patch changes in the past 2 years.
By any chance, do you have any plans to continue this from where you left off?

As I could not reply to the review comments from Fujii-san for almost
three months, I don't have enough time to move this project forward at
least for now. That's why I marked this patch as RWF. I’d like to
continue working on this project in my spare time but I know this is
not a project that can be completed by using only my spare time. If
someone wants to work on this project, I’d appreciate it and am happy
to help.

Probably it's time to rethink the approach. The patch introduces a foreign
transaction manager into PostgreSQL core, but as far as I have reviewed the patch, its
changes look like overkill and too complicated.
This seems to be one of the reasons why we have not been able to commit the feature even
after several years.

Another concern about the approach of the patch is that it needs to change a
backend so that it additionally waits for replication during the commit phase before
executing PREPARE TRANSACTION on the foreign servers, which would further decrease the
performance of the commit phase.

So I wonder if it's worth revisiting the original approach, i.e., adding atomic
commit to postgres_fdw. One disadvantage of this is that it supports atomic
commit only between foreign PostgreSQL servers, not other data
sources like MySQL.
But I'm not sure if we really want to do atomic commit between various FDWs.
Maybe supporting only postgres_fdw is enough for most users. Thoughts?

Sawada-san's patch is ambitious, and it would be very helpful
because it accommodates possible future support of atomic commit for
various types of FDWs. However, it's difficult to reach agreement on it as a whole,
as other reviewers have also pointed out the commit performance. Another point is
how it should work when we also implement atomic visibility (which is another
topic for distributed transactions but worth considering).
That said, if we're going to initially support it in postgres_fdw, which is simpler
than the latest patches, we need to ensure that abnormalities and errors
are properly handled and prove that commit performance can be improved,
e.g., by committing in parallel rather than serially.
And if possible, although not necessary for the first step, it may put
the other reviewers at ease if we can also sketch out how to implement atomic
visibility in postgres_fdw.
Thoughts?

Regards,
Kirk Jamison

#294Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: k.jamison@fujitsu.com (#293)
Re: Transactions involving multiple postgres foreign servers, take 2

Hi,

On Thu, Oct 7, 2021 at 1:29 PM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:

That said, if we're going to initially support it in postgres_fdw, which is simpler
than the latest patches, we need to ensure that abnormalities and errors
are properly handled and prove that commit performance can be improved,
e.g., by committing in parallel rather than serially.

If it's ok with you, I'd like to work on the performance issue. What
I have in mind is to commit all remote transactions in parallel instead
of sequentially in the postgres_fdw transaction callback, as mentioned
above, but I think that would improve the performance even for
the one-phase commit that we already have. Maybe I'm missing something,
though.
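
As a rough, idealized model of why this helps: when the commits are issued one after another, the per-server latencies add up, whereas when all commits are sent first and the replies are collected afterwards, the elapsed time is bounded by the slowest server. The sketch below assumes the servers respond independently and ignores local processing time; both function names are hypothetical and no measurements from this thread are used.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential: each commit waits for the previous reply, so latencies add up. */
double
sequential_commit_ms(const double *rtt_ms, size_t n)
{
	double		total = 0.0;

	assert(rtt_ms != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		total += rtt_ms[i];
	return total;
}

/* Parallel (idealized): send everything, then wait for the slowest reply. */
double
parallel_commit_ms(const double *rtt_ms, size_t n)
{
	double		slowest = 0.0;

	assert(rtt_ms != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		if (rtt_ms[i] > slowest)
			slowest = rtt_ms[i];
	}
	return slowest;
}
```

For three servers with round trips of 1ms, 2ms, and 3ms, this model gives 6ms sequentially and 3ms in parallel. The same reasoning applies whether the command being sent is COMMIT or COMMIT PREPARED.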

Best regards,
Etsuro Fujita

#295Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Etsuro Fujita (#294)
Re: Transactions involving multiple postgres foreign servers, take 2

On 2021/10/07 19:47, Etsuro Fujita wrote:

Hi,

On Thu, Oct 7, 2021 at 1:29 PM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:

That said, if we're going to initially support it in postgres_fdw, which is simpler
than the latest patches, we need to ensure that abnormalities and errors
are properly handled

Yes. One idea for this is to include the information required to resolve
outstanding prepared transactions in the transaction identifier that
the PREPARE TRANSACTION command uses. For example, we can use the XID of
the local transaction and the cluster ID of the local server (e.g., a cluster_name
that users set to a unique value can be used for that) as that information.
If the cluster_name of the local server is "server1" and its XID is now 9999,
postgres_fdw issues "PREPARE TRANSACTION 'server1_9999'" and
"COMMIT PREPARED 'server1_9999'" to the foreign servers to end those
foreign transactions in a two-phase way.

If some trouble happens, the prepared transaction "server1_9999"
may remain unexpectedly on one foreign server. In this case we can
determine whether to commit or roll back that outstanding transaction
by checking whether the past transaction with XID 9999 was committed
or rolled back on the server "server1". If it was committed, the prepared
transaction should also be committed, so we should execute
"COMMIT PREPARED 'server1_9999'". If it was rolled back, the prepared
transaction should also be rolled back. If it's in progress, we should
do nothing for that transaction.
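
The naming scheme and the resolution rule above can be sketched as follows. This is only an illustration under stated assumptions: the helper names, the enums, and the choice to split on the last underscore (so that a cluster_name containing '_' still parses) are all hypothetical and not taken from any patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encodings of the local status and the resulting action. */
typedef enum { XACT_COMMITTED, XACT_ABORTED, XACT_IN_PROGRESS } XactStatus;
typedef enum { DO_COMMIT_PREPARED, DO_ROLLBACK_PREPARED, DO_NOTHING } Resolution;

/* Build "<cluster_name>_<xid>"; returns false if buf is too small. */
bool
build_gid(char *buf, size_t buflen, const char *cluster_name, uint32_t xid)
{
	int			n = snprintf(buf, buflen, "%s_%u", cluster_name, (unsigned) xid);

	return n >= 0 && (size_t) n < buflen;
}

/* Split on the LAST underscore so that cluster names may contain '_'. */
bool
parse_gid(const char *gid, char *cluster, size_t clen, uint32_t *xid)
{
	const char *sep = strrchr(gid, '_');
	char	   *end;
	unsigned long v;

	if (sep == NULL || sep == gid || !isdigit((unsigned char) sep[1]))
		return false;			/* no separator, empty name, or no digits */
	if ((size_t) (sep - gid) >= clen)
		return false;			/* cluster name does not fit */

	v = strtoul(sep + 1, &end, 10);
	if (*end != '\0' || v > UINT32_MAX)
		return false;			/* trailing junk or out of range */

	memcpy(cluster, gid, (size_t) (sep - gid));
	cluster[sep - gid] = '\0';
	*xid = (uint32_t) v;
	return true;
}

/* The rule from the text: follow the outcome of the local transaction. */
Resolution
resolve_prepared(XactStatus local)
{
	switch (local)
	{
		case XACT_COMMITTED:
			return DO_COMMIT_PREPARED;
		case XACT_ABORTED:
			return DO_ROLLBACK_PREPARED;
		case XACT_IN_PROGRESS:
			break;
	}
	return DO_NOTHING;			/* still in progress: leave it alone */
}
```

For example, `build_gid` with "server1" and 9999 produces "server1_9999", and `parse_gid` recovers both parts from it. How the local status is obtained is left out of the sketch on purpose.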

pg_xact_status() can be used to check whether the transaction with
the specified XID was committed or rolled back. But pg_xact_status()
can return an invalid result if the CLOG data for the specified XID has been
truncated by VACUUM FREEZE. To handle this case, we might need
a special table tracking the transaction status.

A DBA can use the above procedure to manually resolve the outstanding
prepared transactions on foreign servers. We can also probably implement
a function that performs the procedure. If so, it might be a good idea to have
a background worker or cron execute the function periodically.

and prove that commit performance can be improved,
e.g., by committing in parallel rather than serially.

If it's ok with you, I'd like to work on the performance issue. What
I have in mind is to commit all remote transactions in parallel instead
of sequentially in the postgres_fdw transaction callback, as mentioned
above, but I think that would improve the performance even for
the one-phase commit that we already have.

+100

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#296Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Fujii Masao (#295)
Re: Transactions involving multiple postgres foreign servers, take 2

Fujii-san,

On Thu, Oct 7, 2021 at 11:37 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2021/10/07 19:47, Etsuro Fujita wrote:

On Thu, Oct 7, 2021 at 1:29 PM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:

and prove that commit performance can be improved,
e.g., by committing in parallel rather than serially.

If it's ok with you, I'd like to work on the performance issue. What
I have in mind is to commit all remote transactions in parallel instead
of sequentially in the postgres_fdw transaction callback, as mentioned
above, but I think that would improve the performance even for
the one-phase commit that we already have.

+100

I’ve started working on this. Once I have a (POC) patch, I’ll post it
in a new thread, as I think it can be discussed separately.

Thanks!

Best regards,
Etsuro Fujita